Fast generalized DFT for finite groups

Almost 6 months ago, I came across an interesting update through the UT CS theory seminar group on the classic problem of computing the Discrete Fourier Transform (DFT) of finite groups. Chris Umans, from Caltech, made some pretty nice improvements to the complexity of the general DFT through some slick combinatorics, which I'll shortly explain. The paper is here.

Introduction

Before I even start, I should make some kind of comment to motivate the problem at hand. A sequence of $n$ complex numbers $(x_{i})$ can be represented by a function $f:\mathbb{Z}_{n} \to \mathbb{C}$.

The Discrete Fourier Transform of $(x_{j})$ sends it to the sequence $(F_{k})$ where $F_{k}=\sum\limits_{j=0}^{n-1}x_{j}e^{\frac{-2\pi i k j}{n}}$.
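A minimal sketch of this definition in code (the function name is mine), computed directly from the sum above in $O(n^{2})$ operations:

```python
import numpy as np

def naive_dft(x):
    """Compute F_k = sum_j x_j * exp(-2*pi*i*k*j/n) directly: O(n^2) operations."""
    n = len(x)
    return np.array([
        sum(x[j] * np.exp(-2j * np.pi * k * j / n) for j in range(n))
        for k in range(n)
    ])
```

This agrees with `np.fft.fft`, which computes the same transform in $O(n \log n)$.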

This motivates the definition of the Fourier transform $\hat{f}$ of a function $f : G \to \mathbb{C}$ on a finite group $G$, which takes as input a representation $\rho:G \to GL(d_{\rho},\mathbb{C})$, i.e.

$\hat{f}(\rho)=\sum\limits_{g \in G} f(g)\rho(g)$

where one must note that here, more generally, $\rho(g)$ can be matrix-valued. From Maschke's theorem, the DFT of a finite group reduces to a map which takes as input an element $\alpha \in \mathbb{C}[G]$ (the group ring, consisting of all functions from $G$ to $\mathbb{C}$ under convolution) and sends it to $\sum\limits_{g} \alpha(g) \bigoplus_{\rho \in Irr(G)}\rho(g)$. Throughout, one assumes that we already have the $\rho$ and have chosen a basis for each $\rho$.

Quite simply, one gets that this requires $O(|G|^{2})$ operations. Exponent-one algorithms exist for abelian groups (as one would expect, using the decomposition into cyclic groups), supersolvable groups, and certain classes of symmetric and alternating groups.
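The abelian case is essentially the classical FFT; here is a minimal sketch of the radix-2 Cooley-Tukey recursion for $\mathbb{Z}_{2^{k}}$ (the function name is mine), which brings the cost down from $O(n^{2})$ to $O(n \log n)$:

```python
import numpy as np

def fft2(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return np.asarray(x, dtype=complex)
    # Split into DFTs of the even- and odd-indexed subsequences
    # (the two cosets of the index-2 subgroup of Z_n), then recombine.
    even, odd = fft2(x[0::2]), fft2(x[1::2])
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])
```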

Throughout the post, let $\omega$ be the exponent of matrix multiplication (currently known to be around $2.373$).

The main result is that Chris Umans achieves an operation complexity of $O(|G|^{\frac{\omega}{2}+\epsilon})$ for every $\epsilon > 0$. Some basic facts from representation theory will be used throughout, though I guess most readers will already be familiar with them. For example, it is easy to prove through the Schur orthogonality relations that $\sum\limits_{i} \dim(V_{i})^{2}=|G|$ where $\{V_{i}\}$ is the set of all irreducible representations of $G$. This gives us the following useful lemma.
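As a quick sanity check on the sum-of-squares identity: the irreducible representations of $S_{3}$ are the trivial, sign, and standard representations, of dimensions $1, 1, 2$, and indeed

```latex
1^{2} + 1^{2} + 2^{2} = 6 = |S_{3}|.
```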

Lemma 1:

For every real number $\alpha \geq 2$, $\sum\limits_{\rho \in Irr(G)} \dim(\rho)^{\alpha} \leq |G|^{\frac{\alpha}{2}}$. (Proof omitted.)
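Though the proof is omitted, it falls out of the sum-of-squares identity in one line: since $\dim(\rho)^{2} \leq |G|$ for every irreducible $\rho$, we have $\dim(\rho)^{\alpha - 2} \leq |G|^{\frac{\alpha-2}{2}}$, so

```latex
\sum_{\rho \in Irr(G)} \dim(\rho)^{\alpha}
  = \sum_{\rho} \dim(\rho)^{\alpha-2}\,\dim(\rho)^{2}
  \leq |G|^{\frac{\alpha-2}{2}} \sum_{\rho} \dim(\rho)^{2}
  = |G|^{\frac{\alpha-2}{2}} \cdot |G|
  = |G|^{\frac{\alpha}{2}}.
```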

Some facts about the induced representation and Clifford theory are used but it shouldn’t be too much of a hassle to discuss it on-the-fly.

The Support Trick

Let $G$ be a group and $S$ a subset. Let's assume that we can calculate the generalized DFT with respect to $G$ for all inputs $\alpha \in \mathbb{C}[G]$ supported on $S$ (i.e. components $\alpha_{g}=0$ for $g \not\in S$) in $m$ operations. With some extra computational cost (involving the number of translates of $S$ needed to cover $G$), one can extend this to all inputs: if we can compute the DFT for inputs supported on $S$, then we can compute it for inputs supported on $Sg'$ by multiplying $\sum\limits_{g} \alpha(g) \bigoplus_{\rho \in Irr(G)}\rho(g)$ by $\bigoplus_{\rho \in Irr(G)} \rho(g')$, which has an additional operation cost of $\sum\limits_{\rho \in Irr(G)} O(\dim(\rho)^{\omega+\epsilon})$. We can apply Lemma 1 and take an appropriate number of 'shifts/translates' by elements of $G$ to get the following result (see Section 2.3):
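Concretely, if $\alpha$ is supported on the translate $Sg'$, set $\beta(h) = \alpha(hg')$, which is supported on $S$; then for every $\rho$,

```latex
\hat{\alpha}(\rho)
  = \sum_{g \in Sg'} \alpha(g)\,\rho(g)
  = \sum_{h \in S} \beta(h)\,\rho(h)\,\rho(g')
  = \hat{\beta}(\rho)\,\rho(g'),
```

so one DFT supported on $S$ plus one block-diagonal multiplication by $\bigoplus_{\rho} \rho(g')$ recovers the DFT of an input supported on $Sg'$; since the DFT is linear, covering $G$ by translates of $S$ and summing handles arbitrary inputs.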

This series (probably three posts) will be on the monad. The first part will deal with an introduction to monads, algebras over monads, and the adjunction problem. The second part will deal with coequalizers and Beck's theorem in connection with Kleisli categories. In keeping with this blog's slight CS tilt, the third part will deal with examples of monads from functional programming, which I think are crucial to wholly understand the monad. I've noticed that I am running too many 'series posts' simultaneously. I am still working on the third part of the Grothendieck Spectral Sequence series and will continue the Hopf fibration post, which I haven't updated for a year!

I think that before I introduce the main players of today's post, I should review the definition of an adjoint functor, as it'll be quite useful as a tool to see the power of monads.

Consider two categories $C,D$ and functors $F:D \to C$ and $G:C \to D$. $F,G$ are said to be adjoint, i.e. $F \dashv G$, if there exists a natural isomorphism of Hom-sets $Hom_{C}(Fd,c) \simeq Hom_{D}(d,Gc)$ for all $d \in D,c \in C$. Equivalently, by the unit-counit definition, $F \dashv G$ if there exists a pair of natural transformations $\eta:1_{D} \Rightarrow GF$ (unit) and $\epsilon:FG \Rightarrow 1_{C}$ (counit) which satisfy the following commutative diagrams:
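(The diagrams don't render here; in equations, they are the two triangle identities, stated with the whiskering notation explained below:)

```latex
(\epsilon F) \circ (F\eta) = 1_{F},
\qquad
(G\epsilon) \circ (\eta G) = 1_{G}.
```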

Throughout the post, I'll represent natural transformations by the symbol $\Rightarrow$. I recommend the confused reader refer to the section on category theory in Aluffi's Algebra: Chapter 0, and also my post on the importance of adjoint functors.

Let $\eta:F \Rightarrow G$ be a natural transformation of functors $F,G: A \to B$, whose components are given by $\eta_{X}:F(X) \to G(X)$ for an object $X$ in $A$. If $H:B \to C$ is another functor, then I'll represent the components of the induced natural transformation $H\eta:HF \Rightarrow HG$ by $(H\eta)_{X}=H(\eta_{X})$.

If instead there is a functor $K:Z \to A$, then I'll represent the components of the natural transformation $\eta K:FK \Rightarrow GK$ by $(\eta K)_{X}=\eta_{K(X)}$ where $X$ is an object in $Z$.

Let's say that $T:C \to C$ is an endofunctor equipped with two natural transformations $\eta:1_{C} \Rightarrow T$ (unit) and $\mu:TT \Rightarrow T$ (multiplication) such that the following diagrams commute:
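(In equations, the commuting diagrams say that $\mu$ is associative and that $\eta$ is a two-sided unit for it:)

```latex
\mu \circ T\mu = \mu \circ \mu T,
\qquad
\mu \circ T\eta = \mu \circ \eta T = 1_{T}.
```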

Proof of the Sensitivity Conjecture without Cauchy’s Interlacing Theorem

If you aren't aware, one month ago the Sensitivity Conjecture, a 30-year-old problem, was proven by Hao Huang, an assistant professor at Emory University, in just a little more than 2 pages, using particularly simple methods, albeit cleverly implemented. The proof is very easy to understand and requires nothing more than basic linear algebra. You can find it here.

A few days ago, Donald Knuth simplified the already simple proof of H. Huang and could fit all his arguments in half a page. What really shocked me when I heard the news is that Knuth is still alive. I could have sworn that it was only a few days ago that I was skimming through a textbook on Discrete Mathematics and saw a bio with a black-and-white picture of him in one of the chapters related to computational complexity, in the same manner that I'd see pictures and mini-biographies of other 20th century giants such as Grothendieck and Nash relegated to the margins of math textbooks and online articles romantically detailing the course of their fabled lives.

Now that I’ve realized that he’s well and alive, it shocks me equally to learn that at the age of 81, he is still able to contribute to research.

I'm not exactly going to regurgitate Knuth's proof, but what I'll present certainly uses the same ideas. I must note here that Knuth's proof itself is inspired by a comment on Scott Aaronson's blog which provided a method to prove the conjecture without using Cauchy's Interlacing Theorem.

If you don't know what the Sensitivity Conjecture is, I'll state it below.

Let $f:\{0,1\}^{n} \to \{0,1 \}$ be a boolean function whose domain is the vertex set of the $n$-dimensional cube, denoted by $Q^{n}=\{0,1 \}^{n}$. So, a particular input $x$ is just a string of 0s and 1s. The local sensitivity of $f$ at an input $x$ is the number of indices in the 'string' of $x$ that you can flip and change the output. The local block sensitivity of $f$ at input $x$ is the maximum number of disjoint subsets ('blocks') of the set of indices $\{1,2, \cdots, n \}$ such that the output changes when every index in a block is flipped in the input string of $x$. The sensitivity and block sensitivity are defined to be the maximum of these corresponding measures over all the inputs.
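A minimal brute-force sketch of these definitions (the function names are mine; inputs are 0/1 tuples):

```python
from itertools import product

def local_sensitivity(f, x):
    """Count the single-bit flips of input x that change f's output."""
    return sum(
        f(x[:i] + (1 - x[i],) + x[i + 1:]) != f(x)
        for i in range(len(x))
    )

def sensitivity(f, n):
    """Sensitivity of f: the maximum local sensitivity over all 2^n inputs."""
    return max(local_sensitivity(f, x) for x in product((0, 1), repeat=n))
```

For example, OR on 3 bits has sensitivity 3 (at the all-zeros input, flipping any bit changes the output). Block sensitivity can be brute-forced the same way by maximizing over families of disjoint blocks instead of single indices.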

Topological Constraints on the Universal Approximation of Neural Networks

The title may seem like a contradiction, given that there is such a thing as the Universal Approximation Theorem, which states that a neural network with a single hidden layer of finite width (i.e. a finite number of neurons) can approximate any continuous function on a compact subset of $\mathbb{R}^{n}$, given that the activation function is non-constant, bounded, and continuous.
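As a quick numerical illustration (not a proof, and all the choices here — target function, width, weight scale — are mine), a single hidden layer of tanh units with random weights, and only the linear output layer fit by least squares, already drives the error on a compact interval very low:

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(-np.pi, np.pi, 400)      # compact domain
target = np.sin(3 * xs)                    # continuous function to approximate

# One hidden layer of 300 tanh neurons with random weights and biases;
# only the output layer is trained, via least squares.
w = rng.normal(scale=3.0, size=300)
b = rng.uniform(-np.pi, np.pi, size=300)
hidden = np.tanh(np.outer(xs, w) + b)      # shape (400, 300)
coef, *_ = np.linalg.lstsq(hidden, target, rcond=None)

max_err = np.max(np.abs(hidden @ coef - target))
```

The point of the theorem is that for any continuous target and any tolerance, *some* finite width suffices; the question below is what happens when the width is capped and only the depth may grow.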

Needless to say, I haven't found any kind of flaw in the existing proofs (see Kolmogorov or Cybenko). However, I thought of something a little simpler and scoured the internet for an answer.

What if we allow an arbitrary number of hidden layers but bound the width of each hidden layer, making them 'deep, skinny neural networks'? Would that be a universal approximator?