Almost 6 months ago, I came across an interesting update through the UT CS theory seminar group on the classic problem of computing the Discrete Fourier Transform (DFT) of finite groups. Chris Umans, from Caltech, made some pretty nice improvements to the complexity of the general DFT through some slick combinatorics, which I'll shortly explain. The paper is here.
Introduction
Well, before I even start, perhaps I should make some kind of a comment to motivate the problem at hand. A sequence of complex numbers $(a_0, a_1, \ldots, a_{n-1})$ can be represented by a function

$f : \mathbb{Z}/n\mathbb{Z} \to \mathbb{C}, \quad f(k) = a_k$.

A Discrete Fourier Transform of $f$ sends it to the sequence $(\hat{f}(0), \ldots, \hat{f}(n-1))$ where

$\hat{f}(j) = \sum_{k=0}^{n-1} a_k\, e^{2\pi i jk/n}$.
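To make this concrete, here is a minimal numpy sketch of this definition (the function name is mine), using the $e^{2\pi i jk/n}$ sign convention above; numpy's `ifft` uses the same kernel scaled by $1/n$, which gives an easy check:

```python
import numpy as np

def dft(a):
    """Naive O(n^2) DFT: f_hat(j) = sum_k a_k * exp(2*pi*i*j*k/n)."""
    n = len(a)
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(2j * np.pi * j * k / n) @ a

a = np.random.randn(8) + 1j * np.random.randn(8)
assert np.allclose(dft(a), len(a) * np.fft.ifft(a))  # same kernel, scaled by n
```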
This motivates the definition of the Fourier transform of a function on a finite group $G$, which takes as input a representation $\rho : G \to GL(d_\rho, \mathbb{C})$, i.e.

$\hat{f}(\rho) = \sum_{g \in G} f(g)\, \rho(g)$,

where one must note that here, more generally, $\hat{f}(\rho)$ can be matrix-valued. From Maschke's theorem, the DFT of a finite group $G$ reduces to a map which takes as input an element $\alpha \in \mathbb{C}[G]$ (the group ring, consisting of all functions from $G$ to $\mathbb{C}$ under convolution) and sends it to

$\bigoplus_{\rho \in \operatorname{Irr}(G)} \sum_{g \in G} \alpha_g\, \rho(g)$.

Throughout, one assumes that we already have the irreducible representations $\operatorname{Irr}(G)$ and have chosen a basis for each $\rho \in \operatorname{Irr}(G)$.
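As a toy sanity check (the code and names are my own): for the cyclic group $\mathbb{Z}/n\mathbb{Z}$, every irreducible representation is a $1 \times 1$ character $\rho_j(g) = e^{2\pi i jg/n}$, so the generalized DFT collapses to the classical one:

```python
import numpy as np

def generalized_dft_cyclic(alpha):
    """For G = Z/nZ the irreps are the characters rho_j(g) = exp(2*pi*i*j*g/n),
    so each 'matrix' sum_g alpha_g * rho_j(g) is 1x1."""
    n = len(alpha)
    return np.array([sum(alpha[g] * np.exp(2j * np.pi * j * g / n)
                         for g in range(n)) for j in range(n)])

alpha = np.random.randn(6) + 1j * np.random.randn(6)
assert np.allclose(generalized_dft_cyclic(alpha), 6 * np.fft.ifft(alpha))
```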
Quite simply, one trivially gets that this requires $O(|G|^2)$ operations. Exponent-one algorithms (i.e. $O(|G|^{1+\epsilon})$ for every $\epsilon > 0$) exist for abelian groups (as one would expect, using the decomposition into cyclic groups), supersolvable groups, and certain classes of symmetric and alternating groups.
Throughout the post, let $\omega$ be the exponent of matrix multiplication (currently it is around 2.38, I guess).
The main result is that Chris Umans achieves an operation complexity of $O(|G|^{\omega/2 + \epsilon})$ for every $\epsilon > 0$. Some basic facts from representation theory will be used throughout, though I guess most readers would already be familiar with them. For example, it is easy to prove through the Schur orthogonality relations that

$\sum_{\rho \in \operatorname{Irr}(G)} \dim(\rho)^2 = |G|$,

where $\operatorname{Irr}(G)$ is the set of all irreducible representations of $G$. This gives us the following useful lemma.
Lemma 1:
For every real number $\alpha \geq 2$,

$\sum_{\rho \in \operatorname{Irr}(G)} \dim(\rho)^{\alpha} \leq |G|^{\alpha/2}$.

(Proof omitted)
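As a quick numerical illustration (mine, not from the paper), both the sum-of-squares identity and Lemma 1 are easy to check on $S_3$, whose irreducible representations have dimensions $1, 1, 2$:

```python
dims, order = [1, 1, 2], 6      # irrep dimensions and order of S3
assert sum(d ** 2 for d in dims) == order          # Schur orthogonality
for alpha in [2, 2.38, 3]:                         # Lemma 1 for alpha >= 2
    assert sum(d ** alpha for d in dims) <= order ** (alpha / 2)
```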
Some facts about the induced representation and Clifford theory are used, but it shouldn't be too much of a hassle to discuss them on the fly.
The Support Trick
Let $G$ be a group and $S \subseteq G$ a subset. Let's assume that we can calculate the generalized DFT with respect to $G$ for all inputs $\alpha \in \mathbb{C}[G]$ supported on $S$ (i.e. components $\alpha_g = 0$ for $g \notin S$) in $m$ operations. Surely, with some extra computational cost (involving the ratio $|G|/|S|$), one can extend this to all inputs, because if we can compute the DFT for inputs supported on $S$ then we can compute the DFT for inputs supported on the translate $Ss$ by multiplying each $\hat{\alpha}(\rho)$ by $\rho(s)$, which has an additional operation cost of $\sum_{\rho \in \operatorname{Irr}(G)} O(\dim(\rho)^{\omega}) \leq O(|G|^{\omega/2 + \epsilon})$. We can apply Lemma 1 and take an appropriate number of 'shifts/translates' by elements of $G$ to get the following result (see Section 2.3):
Theorem 1
If a generalized DFT can be computed with respect to $G$ for all inputs supported on $S \subseteq G$ in $m$ operations, then the generalized DFT with respect to $G$ can be computed for all inputs with operation count

$O\left(\frac{|G|}{|S|} \left(m + |G|^{\omega/2 + \epsilon}\right) \log|G|\right).$
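For cyclic groups, this translation trick is just the classical shift theorem: translating the support of the input multiplies the $j$-th Fourier coefficient by the character value $\rho_j(s) = e^{2\pi i js/n}$. A toy illustration (my own, for intuition only):

```python
import numpy as np

n, s = 8, 3
a = np.random.randn(n) + 1j * np.random.randn(n)
dft = lambda x: len(x) * np.fft.ifft(x)  # sign convention from the intro

# Shifting the input by s multiplies the j-th coefficient by rho_j(s):
phases = np.exp(2j * np.pi * np.arange(n) * s / n)
assert np.allclose(dft(np.roll(a, s)), phases * dft(a))
```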
The aim is to use the subgroup structure in an attempt to reduce the operation complexity, which brings us to the main idea.
Subgroup reduction
The initial results by Beth and Clausen on this topic dealt with the technique of single subgroup reduction. Let $H \leq G$ be a subgroup. The main idea is that we calculate $[G:H]$ many DFTs with respect to $H$ and lift all these to $G$, recursively. Note that any irreducible representation $\sigma \in \operatorname{Irr}(H)$ appears in the restriction to $H$ of some irreducible representation $\rho \in \operatorname{Irr}(G)$.
To elaborate on this, let $x_1, \ldots, x_{[G:H]}$ be the right coset representatives of $H$ in $G$. For any $\alpha \in \mathbb{C}[G]$, we can write $\alpha$ as

$\alpha = \sum_{i=1}^{[G:H]} \alpha_i\, x_i$

where $\alpha_i \in \mathbb{C}[H]$. Next, we perform the DFT with respect to $H$ for each $\alpha_i$ to get the quantities $\hat{\alpha_i}(\sigma)$ for $\sigma \in \operatorname{Irr}(H)$. We can lift these to $G$, as $\operatorname{Res}_H \rho$ decomposes into a sum of elements of $\operatorname{Irr}(H)$. There is no extra computational cost (copying is not counted, for some technical reasons) if we assume that we have an $H$-adapted basis, i.e. a choice of basis for each $\rho \in \operatorname{Irr}(G)$ such that $\rho(h)$ is a block diagonal matrix with blocks of sizes equal to the dimensions of the irreducible representations of $H$, for all $h \in H$.
We wish to calculate $\hat{\alpha}(\rho)$, which is simply

$\hat{\alpha}(\rho) = \sum_{i=1}^{[G:H]} \hat{\alpha_i}(\operatorname{Res}_H \rho)\, \rho(x_i)$.
There are $[G:H]$ matrix multiplications of block-diagonal matrices (one for each $\rho(x_i)$), and hence if we denote by $m_{\sigma,\rho}$ the multiplicity of the irrep $\sigma \in \operatorname{Irr}(H)$ in $\operatorname{Res}_H \rho$ for $\rho \in \operatorname{Irr}(G)$, we get the count of performing one such multiplication to be:

$\sum_{\rho \in \operatorname{Irr}(G)} \sum_{\sigma \in \operatorname{Irr}(H)} m_{\sigma,\rho}\, \dim(\sigma)^{\omega - 1} \dim(\rho),$

which evaluates to $[G:H] \sum_{\sigma \in \operatorname{Irr}(H)} \dim(\sigma)^{\omega}$ through Frobenius reciprocity; specifically, we see that $\sum_{\rho} m_{\sigma,\rho} \dim(\rho)$ is the dimension of $\operatorname{Ind}_H^G \sigma$, namely $[G:H] \dim(\sigma)$. See Theorem 3 of this paper.
Now, we do $[G:H]$ of these multiplications, followed by taking the sum, which involves $O([G:H] \cdot |G|)$ operations since there are at most $|G|$ non-zero entries in the block diagonals (note that this follows from the sum-of-squares equation in the introduction). So, the total cost is

$O\Big([G:H]^2 \sum_{\sigma \in \operatorname{Irr}(H)} \dim(\sigma)^{\omega} + [G:H]\,|G|\Big).$

Using Lemma 1 (see proof below), we get the following theorem.
Theorem 2
Let $H \leq G$ be a subgroup; then one can compute a DFT with respect to $G$ at an operation count of $[G:H]$ DFTs with respect to $H$, plus $O([G:H] \cdot |G|^{\omega/2 + \epsilon})$ operations.
Proof of Theorem 2:
We've got most of it down. Since $|G| = [G:H] \cdot |H|$, we can use Lemma 1 to get the upper bound $\sum_{\sigma \in \operatorname{Irr}(H)} \dim(\sigma)^{\omega} \leq |H|^{\omega/2}$. Now $\frac{\omega}{2} > 1$, so we get the upper bound $[G:H] \leq [G:H]^{\omega/2}$; this gives an $O([G:H] \cdot |G|^{\omega/2})$ upper bound on the total cost calculated above.
The remaining problem is that we have not stipulated an $H$-adapted basis. We can convert an arbitrary basis to an $H$-adapted basis at a cost of $\sum_{\rho \in \operatorname{Irr}(G)} O(\dim(\rho)^{\omega+\epsilon}) \leq O(|G|^{\frac{\omega}{2}+\epsilon})$ using Lemma 1. This is the same bound as the one above for the total cost, which proves the theorem.
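For intuition, in the cyclic case this single subgroup reduction is exactly one Cooley-Tukey step: take $G = \mathbb{Z}/n\mathbb{Z}$ and the index-$m$ subgroup $H = \langle m \rangle$; the $\alpha_i$ live on the cosets $i + H$, and the lifting step multiplies by 'twiddle factors'. A sketch of this special case (function name mine):

```python
import numpy as np

def dft_via_subgroup(a, m):
    """DFT over Z/nZ via the index-m subgroup H = <m> (Cooley-Tukey):
    m DFTs with respect to H, then a lifting step with twiddle factors."""
    n = len(a)
    assert n % m == 0
    # alpha_i is supported on the coset i + H: the subsequence a[i::m].
    sub = [(n // m) * np.fft.ifft(a[i::m]) for i in range(m)]  # H-DFTs
    j = np.arange(n)
    # Lift: f_hat(j) = sum_i exp(2*pi*1j*i*j/n) * alpha_i_hat(j mod n/m).
    return sum(np.exp(2j * np.pi * j * i / n) * sub[i][j % (n // m)]
               for i in range(m))

a = np.random.randn(12) + 1j * np.random.randn(12)
assert np.allclose(dft_via_subgroup(a, 3), 12 * np.fft.ifft(a))
```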
Using this result, it is not too hard (again, see Theorem 5 of the paper) to get a weaker bound for the generalized DFT, using induction and Lev's theorem (which guarantees the existence of a suitably large proper subgroup in any finite group that is not cyclic of prime order).
Double Subgroup Reduction
If $H, K \leq G$ are two subgroups, we reduce the problem of computing the DFT with respect to $G$ to computing it with respect to $H$ and $K$.
It is quite easy to see that if $H, K$ are subgroups of $G$ and $\alpha \in \mathbb{C}[G]$ is supported on $HK$, and given a fixed way of expressing every $g \in HK$ as $g = hk$ (not necessarily unique), then one can compute:

$\hat{\alpha}(\rho) = \sum_{g = hk} \alpha_g\, \rho(h)\, \rho(k)$ for all $\rho \in \operatorname{Irr}(G)$

with $|H|$ K-DFTs and $|K|$ H-DFTs.
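A toy abelian instance of this (entirely mine): $\mathbb{Z}/6\mathbb{Z} = H + K$ with $H = \{0, 3\}$ and $K = \{0, 2, 4\}$, where every group element decomposes uniquely as $h + k$. Collected over all $j$, the inner sums below assemble into $|H|$ K-DFTs, and the outer combinations into H-DFTs:

```python
import numpy as np

n = 6
H, K = [0, 3], [0, 2, 4]            # Z/6 = H + K, uniquely
alpha = np.random.randn(n) + 1j * np.random.randn(n)
chi = lambda j, g: np.exp(2j * np.pi * j * g / n)   # characters of Z/6

out = []
for j in range(n):
    inner = {h: sum(alpha[(h + k) % n] * chi(j, k) for k in K) for h in H}
    out.append(sum(chi(j, h) * inner[h] for h in H))

assert np.allclose(out, n * np.fft.ifft(alpha))     # matches the direct DFT
```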
The extra computational cost is proportional to $|HK|$ and $|G|^{\omega/2+\epsilon}$ (roughly $|G|^{\omega/2+\epsilon}$ if one assumes $|HK|$ is close to $|G|$). The details can be found in either of the papers. This can be translated appropriately (through a probabilistic argument) to the general-support case to get the overhead cost (which will naturally include a $\log|G|$ factor). This prevents the pure $\frac{\omega}{2}$ exponent that is the ultimate goal. This is dealt with using the new technique of 'triple subgroup reduction', which I'll discuss shortly. I'll state the main theorem for the double subgroup reduction method below:
Theorem 3:
If $G = HK$, then the total cost of computing a DFT with respect to $G$ is

$O\big(\,|H| \text{ K-DFTs} + |K| \text{ H-DFTs} + |G|^{\omega/2 + \epsilon}\,\big) \cdot \log|G|.$

Proof omitted. See Section 3 (Theorem 12) of this paper by Umans.
So, to reiterate, the $|H|$ K-DFTs $+\ |K|$ H-DFTs part is obtained through the simple procedure sketched out at the start of this section. The outer $\log|G|$ factor is obtained through a simple probabilistic argument of sliding DFTs supported on $HK$ along coset representatives. The remaining quantity comes from a series of matrix multiplications and changes of basis.
Umans obtains the golden exponent $\frac{\omega}{2}$ for solvable groups and finite groups of Lie type through this technique.
Triple subgroup reduction
This bottleneck is overcome by considering the case when $H$ is normal in $G$. Some structural group-theoretic theorems that I'll introduce later ensure that this choice is possible. This technique will allow us to get rid of the overhead cost. In the calculation of the intermediate quantities needed for the double subgroup reduction method, for the first step, we calculate the $H$-DFTs $\hat{\alpha_x}$ for $x \in X$, the coset representatives of the subgroup $H$ in $G$. The idea of the triple subgroup reduction is that we can write these $\hat{\alpha_x}$ as a sum of a small number of matrices with the kind of structure that we'll describe in the main theorem. But before that, I must segue into some facts from representation theory, since when we're dealing with representations involving normal subgroups, Clifford theory is important.
A Short Interlude
Consider a normal subgroup $H \trianglelefteq G$. Consider $\sigma \in \operatorname{Irr}(H)$ and some $g \in G$; then one can obtain a representation $\sigma^g$ on $H$ by conjugating the input, more precisely, $\sigma^g(h) = \sigma(g h g^{-1})$. The same could naturally be said of the irreducible character. Let $O_1, \ldots, O_m$ be the orbits of this $G$-action on $\operatorname{Irr}(H)$. If $\sigma \in \operatorname{Irr}(H)$, then we can consider $\operatorname{Ind}_H^G \sigma$, which breaks up into a sum of irreducibles in $\operatorname{Irr}(G)$.
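A concrete toy example (mine, not from the paper): take $G = S_3$ and the normal subgroup $H = A_3 \cong \mathbb{Z}/3\mathbb{Z}$. Conjugation by a transposition inverts the generating 3-cycle, so it swaps the two nontrivial characters of $H$, giving orbits $\{\chi_0\}$ and $\{\chi_1, \chi_2\}$:

```python
# Characters of H = A3 = <c>: chi_j(c^t) = w^(j*t) with w = exp(2*pi*i/3).
# Conjugating by a transposition g sends c^t to c^(-t), so
# chi_j^g(c^t) = chi_j(c^(-t)) = chi_{(-j) mod 3}(c^t).
conj = lambda j: (-j) % 3
orbits = {frozenset({j, conj(j)}) for j in range(3)}
assert orbits == {frozenset({0}), frozenset({1, 2})}
```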
Theorem 4 (Clifford's Theorem):
Let $\rho \in \operatorname{Irr}(G)$. Then there is an orbit $O_i$ and a positive integer $e_\rho$ such that

$\operatorname{Res}_H \rho = e_\rho \bigoplus_{\sigma \in O_i} \sigma,$

where $e_\rho$ and $|O_i|$ divide $[G:H]$. Additionally, note that all the irreducible representations in the orbit have the same dimension $\dim(\sigma)$, and hence $\dim(\rho) = e_\rho\, |O_i| \dim(\sigma)$.
Proof omitted (see wiki)
Some stronger divisibility statements could be made about $e_\rho$ and the orbit size $|O_i|$, but it doesn't really matter in our case.
What is really important is that $\operatorname{Res}_H \rho$, for $\rho \in \operatorname{Irr}(G)$, is completely determined (up to the constant $e_\rho$) by the orbit $O_i$. Thus, we can partition $\operatorname{Irr}(G)$ into sets $S_1, \ldots, S_m$, where $S_i$ corresponds to $O_i$, i.e. $S_i$ consists of those $\rho$ whose restriction to $H$ is supported on the orbit $O_i$. Through a simple calculation (see below), we get the following useful lemma.
Lemma 2:
For each orbit $O_i$ and any $\sigma \in O_i$,

$\sum_{\rho \in S_i} e_\rho \dim(\rho) = [G:H]\, \dim(\sigma).$
Proof of Lemma 2:
By definition of the induced representation, $\dim(\operatorname{Ind}_H^G \sigma) = [G:H] \dim(\sigma)$. For $\rho \in \operatorname{Irr}(G)$, let $m_\rho$ be the number of times $\rho$ appears in $\operatorname{Ind}_H^G \sigma$; then by decomposition into irreducibles, we get:

$[G:H] \dim(\sigma) = \sum_{\rho \in \operatorname{Irr}(G)} m_\rho \dim(\rho).$

Using Frobenius reciprocity, $m_\rho$ = (number of times $\sigma$ appears in $\operatorname{Res}_H \rho$). Notice that this $m_\rho$ is exactly $e_\rho$ when $\rho \in S_i$. The terms in the sum not in the associated orbit become zero and the result follows.
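Continuing the $S_3$ toy example (again mine): for the orbit $\{\chi_0\}$ we have $S_i = \{\text{trivial}, \text{sign}\}$ with $e_\rho = 1$ each, and for $\{\chi_1, \chi_2\}$ we have $S_i = \{\text{the 2-dimensional irrep}\}$ with $e_\rho = 1$; both sides of Lemma 2 come out to $[G:H] \dim(\sigma) = 2$:

```python
# (e_rho, dim_rho) for each rho in S_i, per orbit of Irr(A3) under S3:
S = {"orbit {chi_0}": [(1, 1), (1, 1)],   # trivial and sign irreps of S3
     "orbit {chi_1, chi_2}": [(1, 2)]}    # the 2-dimensional irrep of S3
index, dim_sigma = 2, 1                   # [G:H] = 2, dim(chi_j) = 1
for irreps in S.values():
    assert sum(e * d for e, d in irreps) == index * dim_sigma
```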
Inverse DFT
In the notation of Umans, we'll define the inverse DFT when we're given the representations $\operatorname{Irr}(G)$. It will take as input a square matrix $M_\rho$ of order $\dim(\rho)$ for each $\rho \in \operatorname{Irr}(G)$, and return an element $\alpha \in \mathbb{C}[G]$ such that:

$\hat{\alpha}(\rho) = M_\rho$ for all $\rho \in \operatorname{Irr}(G)$.

Thankfully, the inverse DFT calculation has a relatively easy computational count, from this theorem of Beth-Clausen.
Lemma 3 (Inverse DFT theorem)
If the DFT with respect to $G$ can be computed in $m$ operations, then the inverse DFT with respect to $G$, in the same basis, can be computed in at most $O(m + |G| \log|G|)$ operations.
Proof omitted (see Theorem 2.6 from the paper)
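For cyclic groups, the inversion formula $\alpha_g = \frac{1}{|G|} \sum_{\rho} \dim(\rho) \operatorname{tr}\big(\rho(g^{-1})\, \hat{\alpha}(\rho)\big)$ is just the usual inverse DFT; a quick check (my own illustration):

```python
import numpy as np

n = 5
alpha = np.random.randn(n) + 1j * np.random.randn(n)
hat = [sum(alpha[g] * np.exp(2j * np.pi * j * g / n) for g in range(n))
       for j in range(n)]                       # forward DFT (1x1 irreps)

# Inversion: alpha_g = (1/|G|) * sum_j rho_j(-g) * hat_j
rec = [sum(np.exp(-2j * np.pi * j * g / n) * hat[j] for j in range(n)) / n
       for g in range(n)]
assert np.allclose(rec, alpha)
```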
Next, we need a technical lemma regarding matrix multiplication.
Lemma 4:
Let $A$ be an $m \times n$ matrix, $B$ an $n \times p$ matrix, and $C$ a $p \times q$ matrix. Then,

$ABC = \operatorname{vec}^{-1}\big((A \otimes C^T)\, \operatorname{vec}(B)\big),$

where $\operatorname{vec}(B)$ is the $np$-size column vector obtained by traversing the rows of $B$, and $\operatorname{vec}^{-1}$ converts the $mq$-size column into an $m \times q$ matrix.
Proof is omitted. It is slightly fun to work it out though. See Lemma 7 from the paper.
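numpy's default flattening is row-major, which matches the row-traversal $\operatorname{vec}$ used here, so the lemma is a one-liner to check:

```python
import numpy as np

m, n, p, q = 2, 3, 4, 5
A, B, C = (np.random.randn(*s) for s in [(m, n), (n, p), (p, q)])

# vec is row-major flattening; vec^{-1} is reshape back to (m, q).
assert np.allclose((A @ B @ C).reshape(-1), np.kron(A, C.T) @ B.reshape(-1))
```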
Theorem 5:
Let $H \trianglelefteq G$, and for any $\rho \in \operatorname{Irr}(G)$, let $e_\rho$ and the orbit $O_i$ be the associated quantities obtained from Clifford's theorem. For any input $\{M_\sigma\}_{\sigma \in \operatorname{Irr}(H)}$ (where each $M_\sigma$ is a $\dim(\sigma)$-order square matrix), there exist, with respect to an $H$-adapted basis, matrices $\hat{\beta_1}, \ldots, \hat{\beta_t}$, block diagonal of the same shape as the $M_\sigma$'s, which satisfy:

$\bigoplus_{\sigma \in \operatorname{Irr}(H)} M_\sigma = \sum_{j=1}^{t} \hat{\beta_j}$

(note that everything here is in row notation, i.e. entries are traversed along rows), where the parent matrix defined below has the block structure of a $G$-DFT output: square blocks of order $\dim(\rho)$ for $\rho \in \operatorname{Irr}(G)$.
Moreover, these can be chosen such that there exist labellings $L_j$, one for each $j \in [t]$, of the entries of the set of matrices $\hat{\beta_j}$, such that if $L_j(a, b) = L_{j'}(a', b')$ for $j \neq j'$ (here, $(a, b)$ represents the $(a, b)$ entry of the matrix $\hat{\beta_j}$, where both entries are non-zero), then the two entries occupy distinct positions in the parent matrix;
the number of labels (unique by definition) is $t$. Additionally, these $\beta_j$ can be obtained with $t$ inverse DFTs with respect to $H$.
Proof of Theorem 5:
Phew! That's quite a long statement. The last paragraph of the statement is a kind-of convoluted matrix combinatorics game where we are trying to convert the block-diagonal matrix with blocks indexed by $\operatorname{Irr}(H)$ into something of a simpler form involving a block-diagonal matrix where the blocks are of size $\dim(\rho)$ for $\rho \in \operatorname{Irr}(G)$ (I'll explain this soon). This will enable us to proceed with lifting the intermediate DFT to a full DFT. Now, to get back to the proof.
Since we have an $H$-adapted basis, each $\rho(h)$ is of the form $\bigoplus_{\sigma \in O_i} \sigma(h)^{\oplus e_\rho}$ for $h \in H$. We'll solve for the $\beta_j$ by a sequence of steps:
First, we have

$\hat{\beta}(\sigma) = \sum_{h \in H} \beta_h\, \sigma(h).$

Note that the matrices $\sigma(h)$, $h \in H$, span the entire vector space of $\dim(\sigma) \times \dim(\sigma)$ matrices. I leave it to the reader to verify that the equation

$\hat{\beta}(\sigma) = M_\sigma$ (*)

can be solved for $\beta \in \mathbb{C}[H]$ for all $\sigma \in \operatorname{Irr}(H)$ simultaneously (note that the above hint only gives a solution for a single $\sigma$; one just combines the solutions simultaneously) by an inverse DFT with respect to $H$, where the input is the set of matrices $\{M_\sigma\}$ given to us.
Now, to deal with the last paragraph of the statement. The important thing is the number of labels $t$. Assume for now that such a labelling exists (we'll show what it is after that). I'd like to pass off a few comments as to what is going on before proceeding.
Trivially, we can take the labelling to assign a unique label to every entry, and obviously easily ensure that every label is completely distinct across all the matrices. This would give a trivial bound on the number of labels from Lemma 2. The point is that we have to minimize this $t$, as we have to perform $t$ inverse DFTs (computationally almost as easy as DFTs) with respect to $H$. Getting back into the proof:
It is actually simple: if two labels are equal, say $L_j(a, b) = L_{j'}(a', b') = \ell$ (by the condition on the labellings, this means that the $(a, b)$ entry of $\hat{\beta_j}$ and the $(a', b')$ entry of $\hat{\beta_{j'}}$ are sent to distinct positions of the parent matrix, where $j \neq j'$), then the associated equations of the form (*) above are distinct! Replace the two entries by the label $\ell$, simply append the resulting systems together, and obtain the $\beta$'s by an inverse DFT with respect to $H$.
Hence, we need only $t$ inverse DFTs with respect to $H$ to get the matrices $\hat{\beta_j}$ (equivalently, the elements $\beta_j$) for all $j \in [t]$, if we have such a labelling.
What’s the labelling:
Consider any block-diagonal matrix of the form: $m_1$ block matrices of order $d_1$, $m_2$ block matrices of order $d_2$, and so on down the diagonal.
You can also let the non-zero entries of the matrix be distinct, if you want to equivalently think of a mapping to actual values instead of positions in a matrix.
Let’s say that every column in this matrix is ‘unoccupied’.
The domain of the labelling is the block-diagonal matrix $\bigoplus_{\sigma \in \operatorname{Irr}(H)} M_\sigma$. We define the labelling as follows: for every column in the domain of height $d$ (the number of non-zero entries), we find a block in the target matrix of order at least $d$, and associate to this column of height $d$ the first unoccupied column of height at least $d$, top-down; once assigned, call that column of the target occupied. Of course, as one can easily see, there might be some elements in the target matrix not associated to anything in the domain. How do we know we have enough columns?
There can be at most $|H|/d$ columns of height $d$ in the domain, since there are $|H|$ entries in total (see Lemma 2). Now, it is easy to check that there are at least that many columns of height at least $d$ in the target matrix.
This entire procedure is just a kind of rearrangement of the domain for efficient lifting to a DFT with respect to $G$. This construction allows us to define the parent matrix (or rearranged matrix) of $\hat{\beta}$ with respect to the labelling as the matrix having the same size as the target matrix above, but with entries replaced by the respective entries of $\hat{\beta}$ if there is an associated label for that entry, and zero otherwise.
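Here is a small sketch (entirely my own toy model of this step, not the paper's actual construction) of the greedy 'first unoccupied column, top-down' assignment: given the heights of the domain columns and the block orders of the target matrix, it assigns each domain column to a target column of sufficient height:

```python
def pack_columns(domain_heights, target_block_orders):
    """Greedily assign each domain column (tallest first) to the first
    unoccupied target column whose block order is at least its height.
    Returns {domain_index: (block_index, column_within_block)}."""
    used = [0] * len(target_block_orders)   # columns used per target block
    assignment = {}
    for i, h in sorted(enumerate(domain_heights), key=lambda t: -t[1]):
        for b, order in enumerate(target_block_orders):
            if order >= h and used[b] < order:   # tall enough, has room
                assignment[i] = (b, used[b])
                used[b] += 1
                break
        else:
            raise ValueError("out of columns; a new parent matrix is needed")
    return assignment

# Pack columns of heights [1, 1, 2, 2] into blocks of orders [1, 1, 2]:
print(pack_columns([1, 1, 2, 2], [1, 1, 2]))
```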
The Finale
Everything is essentially set up now. Don't worry, we've already played the game; the rules have just changed a bit. After some combinatorial hacks, we got what is needed: Theorem 5. Though here, we'll be replacing that single normal subgroup by the following configuration: $H \trianglelefteq G$ is a normal subgroup, and $K_1, K_2 \leq G$ are subgroups. $X$ is the set of coset representatives of $H$ in $K_1 H$, and $Y$ is the set of coset representatives of $H$ in $H K_2$. Again, through some structural group-theoretic results, we'll later see that the part of $G$ not covered by the products $K_1 H K_2$ is negligible in a certain sense.
Through some simple arguments (I strongly recommend seeing Lemma 3.7 in the paper), we see that if $\alpha$ is supported on $K_1 H K_2$, we may consider the calculation

$\hat{\alpha}(\rho) = \sum_{x \in X,\ y \in Y} \rho(x)\, \hat{\alpha}_{x,y}(\operatorname{Res}_H \rho)\, \rho(y)$ for all $\rho \in \operatorname{Irr}(G)$,

where each $\hat{\alpha}_{x,y}$ is to be replaced by the parent matrix associated to it, which satisfies Theorem 5 with respect to an $H$-adapted basis.
Then, the above calculation can be computed with DFTs with respect to $H$, plus inverse DFTs with respect to $H$, plus DFTs with respect to $K_1$ and $K_2$.
Now, it remains to see how one can lift this to a DFT with respect to $G$. Let $U_1$, $U_2$ be the change of basis matrices corresponding to $K_1$, $K_2$, i.e.

$\bigoplus_{\rho \in \operatorname{Irr}(G)} \rho(k) = U_1 \Big(\bigoplus_{\mu \in M_1} \mu(k)\Big) U_1^{-1}$ for all $k \in K_1$,

where $M_1$ is the set of irreducible representations of $K_1$ that appear in the restrictions of the irreducible representations of $G$, counted with multiplicity.
Similarly for $U_2$ and $K_2$. We also require that the basis change is one which is adapted with respect to $H$. Also, for $h \in H$, we have that the blocks on the right-hand side are themselves block diagonal, with blocks drawn from $\operatorname{Irr}(H)$, for all such $h$. (1)
Consider $\alpha$ supported on $K_1 H K_2$; then we can split up the DFT on $G$ as follows:

$\bigoplus_{\rho} \hat{\alpha}(\rho) = \sum_{x \in X,\ y \in Y} \Big(\bigoplus_{\rho} \rho(x)\Big) \Big(\bigoplus_{\rho} \hat{\alpha}_{x,y}(\operatorname{Res}_H \rho)\Big) \Big(\bigoplus_{\rho} \rho(y)\Big),$

which can be rewritten, using the change of basis matrices, in terms of the block-diagonal matrices $\bigoplus_{\mu \in M_1} \mu(x)$ and $\bigoplus_{\nu \in M_2} \nu(y)$, conjugated by $U_1$ and $U_2$. By the discussion above and using equation (1), we can further rewrite this as a sum of products of block-diagonal matrices. Consider the quantity $\rho(x)\, \hat{\alpha}_{x,y}(\operatorname{Res}_H \rho)\, \rho(y)$ which is in the sum above. Using Lemma 4, calculating it is equivalent to calculating:

$\big(\rho(x) \otimes \rho(y)^T\big)\, \operatorname{vec}\big(\hat{\alpha}_{x,y}(\operatorname{Res}_H \rho)\big).$

There are repetitions in the tensor product structure (on both sides) of the Kronecker product on the left-hand side, and this can be appropriately exploited by shifting the entries of the vectorized matrix into a block structure (for more info, see Section 3.3 of the paper). So, finally, calculating the full sum is equivalent to calculating a product of the form $M \cdot \operatorname{vec}(D)$,
where $D$ is a block-diagonal matrix whose entries are sums of entries in the $\hat{\alpha}_{x,y}$, and $M$ collects the Kronecker factors above.
Let $P$ be the parent matrix of $D$. So, in the expression above, we can replace $D$ by $P$ and see that we have to calculate:

$M' \cdot \operatorname{vec}(P),$

where $M'$ has the 'same data' (see the paper for more info; honestly, I am not sure how to easily express it; the less one says of these matrix combinatorics, maybe the better) as $M$. We've rearranged all the entries of $D$ above into the parent matrix, and so we'd have to appropriately move around the elements of $M$ as well. Using the earlier facts of representation theory, namely Clifford's theorem and the fact that the blocks in the parent matrix have orders of the form $e_\rho |O_i| \dim(\sigma) = \dim(\rho)$, the sum of squares of the block sizes (not necessarily the total number of non-zero elements) is $|G|$. So, the square blocks of $P$ have dimensions $\dim(\rho)$, $\rho \in \operatorname{Irr}(G)$, which satisfy

$\sum_{\rho \in \operatorname{Irr}(G)} \dim(\rho)^2 = |G|.$