Almost 6 months ago, I came across an interesting update through the UT CS theory seminar group on the classic problem of computing the Discrete Fourier Transform (DFT) of finite groups. Chris Umans, from Caltech, made some pretty nice improvements to the complexity of the general DFT through some slick combinatorics, which I’ll shortly explain. The paper is here.

# Introduction

Well, before I even start, perhaps I should make some kind of a comment to motivate the problem at hand. A sequence of complex numbers $(a_0, \ldots, a_{n-1})$ can be represented by a function $f : \mathbb{Z}/n\mathbb{Z} \to \mathbb{C}$.

The Discrete Fourier Transform of $f$ sends it to the sequence $\hat{f}$, where $\hat{f}(j) = \sum_{k=0}^{n-1} f(k)\, e^{-2\pi i jk/n}$.
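As a quick sanity check, here is a minimal sketch in Python/numpy (the function name `naive_dft` is my own) comparing the $O(n^2)$ definition above against numpy's FFT:

```python
import numpy as np

def naive_dft(f):
    """O(n^2) DFT straight from the definition: f_hat[j] = sum_k f[k] e^{-2 pi i jk/n}."""
    n = len(f)
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n) @ f

x = np.array([1.0, 2.0, 3.0, 4.0])
print(np.allclose(naive_dft(x), np.fft.fft(x)))  # True
```

The whole game, of course, is to beat this quadratic count.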

This motivates the definition of the Fourier transform of a function $f : G \to \mathbb{C}$ on a finite group $G$, which takes as input a representation $\rho$ of $G$, i.e.

$\hat{f}(\rho) = \sum_{g \in G} f(g)\,\rho(g)$

where one must note that here, more generally, $\hat{f}(\rho)$ can be matrix-valued. From Maschke’s theorem, the DFT of a finite group $G$ reduces to a map which takes as input an element $\alpha \in \mathbb{C}[G]$ (the group ring, consisting of all functions from $G$ to $\mathbb{C}$ under convolution) and sends it to $\bigoplus_{\rho \in Irr(G)} \hat{\alpha}(\rho)$. Throughout, one assumes that we already have the irreducible representations $Irr(G)$ and have chosen a basis for each $\rho \in Irr(G)$.
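To make the definition concrete, here is a small sketch in Python (the setup is my own choice: $S_3$ with its 3-dimensional permutation representation, which is reducible but suffices for the check) verifying the convolution property $\widehat{f * h}(\rho) = \hat{f}(\rho)\,\hat{h}(\rho)$, which is what makes this transform a "Fourier" transform:

```python
import itertools
import numpy as np

# S_3 as permutation tuples; the group law is composition: (p*q)(i) = p(q(i)).
G = list(itertools.permutations(range(3)))

def mul(p, q):
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    r = [0] * 3
    for i in range(3):
        r[p[i]] = i
    return tuple(r)

def rho(p):
    """Permutation representation: rho(p) sends e_i to e_{p(i)}."""
    M = np.zeros((3, 3))
    for i in range(3):
        M[p[i], i] = 1.0
    return M

def dft(f):
    """f_hat(rho) = sum over g of f(g) rho(g)."""
    return sum(f[g] * rho(g) for g in G)

def convolve(f, h):
    """(f*h)(x) = sum over y of f(y) h(y^{-1} x)."""
    return {x: sum(f[y] * h[mul(inv(y), x)] for y in G) for x in G}

rng = np.random.default_rng(0)
f = {g: rng.standard_normal() for g in G}
h = {g: rng.standard_normal() for g in G}
print(np.allclose(dft(convolve(f, h)), dft(f) @ dft(h)))  # True
```

The same identity holds block-by-block for the irreducible representations, which is why computing the DFT quickly lets one convolve on $G$ quickly.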

Quite simply, one trivially gets that this requires $O(|G|^2)$ operations. Exponent-one algorithms (operation count $O(|G|^{1+\epsilon})$) exist for abelian groups (as one would expect, using the decomposition into cyclic groups), supersolvable groups, and certain classes of symmetric and alternating groups.

Throughout the post, let $\omega$ be the exponent of matrix multiplication (currently it is around 2.38, I guess).

The main result is that Chris Umans achieves an operation complexity of $O(|G|^{\omega/2+\epsilon})$ for every $\epsilon > 0$. Some basic facts from representation theory will be used throughout, though I guess most readers would already be familiar with them. For example, it is easy to prove through the Schur orthogonality relations that $\sum_{\rho \in Irr(G)} \dim(\rho)^2 = |G|$, where $Irr(G)$ is the set of all irreducible representations of $G$. This gives us the useful lemma.

## Lemma 1:

For every real number $\alpha \geq 2$, $\sum_{\rho \in Irr(G)} \dim(\rho)^{\alpha} \leq |G|^{\alpha/2}$. (Proof omitted)
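A quick numerical check of the sum-of-squares identity and of Lemma 1, using the (well-known, hardcoded) irrep dimensions of $S_4$:

```python
# Irrep dimensions of S_4, hardcoded from its character table.
dims = [1, 1, 2, 3, 3]
order = 24  # |S_4|

assert sum(d ** 2 for d in dims) == order              # Schur: sum of dim^2 = |G|
for alpha in (2.0, 2.38, 3.0):                         # Lemma 1 for a few alpha >= 2
    assert sum(d ** alpha for d in dims) <= order ** (alpha / 2)
print("checks pass")
```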

Some facts about the induced representation and Clifford theory are used, but it shouldn’t be too much of a hassle to discuss them on-the-fly.

# The Support Trick

Let $G$ be a group and $S \subseteq G$ a subset. Let’s assume that we can calculate the generalized DFT with respect to $G$ for all inputs supported on $S$ (i.e. the components $\alpha_g$ vanish for $g \notin S$) in $m$ operations. Surely, with some extra computational cost (involving the ratio $|G|/|S|$), one can extend this to all inputs, because if we can compute the DFT for inputs supported on $S$, then we can compute the DFT for inputs supported on $Ss$ by multiplying by $\rho(s)$ in each block, which has an additional operation cost of $O(|G|^{\omega/2+\epsilon})$ by Lemma 1. We can take an appropriate number of ‘shifts/translates’ by elements of $G$ to cover all of $G$ and get the following result (see Section 2.3):

## Theorem 1

If a generalized DFT with respect to $G$ can be computed for all inputs supported on $S$ in $m$ operations, then the generalized DFT with respect to $G$ can be computed for all inputs with operation count $O\big((|G|/|S|)\log|G| \cdot (m + |G|^{\omega/2+\epsilon})\big)$.
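For the cyclic group $\mathbb{Z}/n$ the ‘shift’ step is easy to see concretely: translating the input by $s$ just multiplies the $k$-th Fourier coefficient by the phase $e^{-2\pi i ks/n}$, a cheap per-coefficient correction (sketch in numpy; the choice of $n$ and $s$ is mine):

```python
import numpy as np

n, s = 8, 3
x = np.random.default_rng(1).standard_normal(n)

# DFT of the translated input equals the DFT of the original times a phase,
# i.e. "moving the support" costs only one cheap correction per coefficient.
shifted = np.fft.fft(np.roll(x, s))
phases = np.exp(-2j * np.pi * np.arange(n) * s / n)
print(np.allclose(shifted, np.fft.fft(x) * phases))  # True
```

For nonabelian $G$, the phase becomes the matrix $\rho(s)$ in each block, which is where the $|G|^{\omega/2+\epsilon}$ correction cost comes from.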

The aim is to use the subgroup structure in an attempt to reduce the operation complexity, which brings us to the main idea.

# Subgroup reduction

The initial results by Beth and Clausen on this topic dealt with the technique of single subgroup reduction. Let $H \leq G$ be a subgroup. The main idea is that we calculate many DFTs with respect to $H$ and lift all of these to $G$, recursively. Note that any irreducible representation of $H$ appears in the restriction to $H$ of some irreducible representation of $G$.

To elaborate on this, let $g_1, \ldots, g_k$ be the right coset representatives of $H$ in $G$ (so $k = [G:H]$). For any $\alpha \in \mathbb{C}[G]$, we can write $\alpha$ as

$\alpha = \sum_{i=1}^{k} \alpha_i\, g_i$

where $\alpha_i \in \mathbb{C}[H]$. Next, we perform a DFT with respect to $H$ for each $\alpha_i$ to get the quantities:

$\hat{\alpha_i}(\sigma)$ for all $\sigma \in Irr(H)$. We can lift these to $G$, as the restriction of any $\rho \in Irr(G)$ to $H$ decomposes into a sum of elements of $Irr(H)$. There is no **extra computational cost** (copying is not counted for some technical reasons) if we **assume** that we have an **adapted basis, i.e. a choice of basis** for each $\rho \in Irr(G)$ such that $\rho(h)$ is a block-diagonal matrix, with blocks of sizes equal to the dimensions of the irreducible representations of $H$ appearing in the restriction, for all $h \in H$.

We wish to calculate $\hat{\alpha}(\rho)$, which is simply $\sum_{i=1}^{k} \hat{\alpha_i}\big(Res_H^G\,\rho\big)\,\rho(g_i)$.
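For cyclic groups this single-subgroup recursion is exactly the Cooley–Tukey FFT. A sketch (the example $G = \mathbb{Z}/6$, $H = \{0,2,4\} \cong \mathbb{Z}/3$ is my choice): the DFT of $G$ is assembled from $[G:H] = 2$ DFTs with respect to $H$ followed by the multiplications by $\rho(g_i)$, the ‘twiddle factors’.

```python
import numpy as np

x = np.random.default_rng(2).standard_normal(6)

# Two DFTs with respect to H = {0, 2, 4} ≅ Z/3 (the alpha_i of the reduction) ...
E, O = np.fft.fft(x[0::2]), np.fft.fft(x[1::2])

# ... lifted to G = Z/6: the characters restricted to H repeat mod 3, and the
# coset representative g_1 = 1 contributes the twiddle factor e^{-2 pi i k/6}.
k = np.arange(6)
X = E[k % 3] + np.exp(-2j * np.pi * k / 6) * O[k % 3]
print(np.allclose(X, np.fft.fft(x)))  # True
```

In the nonabelian case the twiddle multiplications become the block-diagonal-times-$\rho(g_i)$ products counted below.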

There are $[G:H]$ multiplications of block-diagonal matrices by the $\rho(g_i)$, and hence, if we denote by $m_{\sigma,\rho}$ the multiplicity of the irrep $\sigma \in Irr(H)$ in $Res_H^G\,\rho$ for $\rho \in Irr(G)$, we get the count of performing one such multiplication (over all $\rho \in Irr(G)$) to be:

$\sum_{\rho \in Irr(G)} \sum_{\sigma \in Irr(H)} m_{\sigma,\rho}\, \dim(\sigma)^{\omega-1} \dim(\rho)$

which evaluates to $[G:H] \sum_{\sigma \in Irr(H)} \dim(\sigma)^{\omega}$ through Frobenius reciprocity; specifically, we see that $\sum_{\rho \in Irr(G)} m_{\sigma,\rho} \dim(\rho)$ is the dimension of $Ind_H^G\,\sigma$, which is $[G:H]\,\dim(\sigma)$. See Theorem 3 of this paper.

Now, we do $[G:H]$ of these multiplications, followed by taking the sum, which involves $O([G:H]\cdot|G|)$ operations since there are at most $|G|$ non-zero entries across the block diagonals (note that this follows from the sum-of-squares equation in the introduction). So, the total cost is

$[G:H]^2 \sum_{\sigma \in Irr(H)} \dim(\sigma)^{\omega} + O([G:H]\cdot|G|)$.

Using Lemma 1 (see proof below), we get the following theorem.

## Theorem 2

Let $H \leq G$ be a subgroup; then one can compute a DFT with respect to $G$ at an operation count of $[G:H]$ DFTs with respect to $H$ plus $O([G:H]\,|G|^{\omega/2+\epsilon})$.

### Proof of Theorem 2:

We’ve got most of it down. Since $\sum_{\sigma \in Irr(H)} \dim(\sigma)^2 = |H|$, we can use Lemma 1 to get the upper bound $\sum_{\sigma \in Irr(H)} \dim(\sigma)^{\omega} \leq |H|^{\omega/2}$. Now $\frac{\omega}{2} > 1$, so we get the upper bound $[G:H]^2\,|H|^{\omega/2} \leq [G:H]\,|G|^{\omega/2}$; this gives $O([G:H]\,|G|^{\omega/2+\epsilon})$ as an upper bound on the total cost calculated above.

The remaining problem is that we have not stipulated an adapted basis. We can convert an arbitrary basis to an adapted basis at a cost of $\sum_{\rho \in Irr(G)} O(\dim(\rho)^{\omega+\epsilon}) \leq O(|G|^{\frac{\omega}{2}+\epsilon})$ using Lemma 1. This is the same bound as the one above for the total cost, which proves the theorem.

Using this result, it is not too hard (again, see Theorem 5 of the paper) to get a weaker bound for the generalized DFT, using induction and Lev’s theorem (which guarantees the existence of a subgroup of order at least a fixed power of $|G|$).

# Double Subgroup Reduction

If $H, K \leq G$ are two subgroups, we reduce the problem of computing the DFT with respect to $G$ to computing it with respect to $H$ and $K$.

It is quite easy to see that if $H$ and $K$ are subgroups of $G$ and the input $\alpha \in \mathbb{C}[G]$ is supported on $HK$, then, given a fixed way of expressing every element of $HK$ as $hk$ (not necessarily unique), one can compute $\hat{\alpha}$ with $|H|$ K-DFTs and $|K|$ H-DFTs.
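As a tiny example of the $G = HK$ setup (my choice: $G = S_3$, $H = \langle(0\,1)\rangle$, $K = A_3$), every element of $G$ factors, here even uniquely, as $hk$:

```python
import itertools

G = set(itertools.permutations(range(3)))

def mul(p, q):
    return tuple(p[q[i]] for i in range(3))

H = {(0, 1, 2), (1, 0, 2)}             # subgroup generated by the transposition (0 1)
K = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}  # A_3: the identity and the two 3-cycles

products = [mul(h, k) for h in H for k in K]
print(len(set(products)) == len(products) == len(G))  # True: S_3 = HK, uniquely
```

In general $HK$ need not be a subgroup, which is exactly why the support trick above is needed to go from inputs supported on $HK$ to all of $G$.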

The extra computational cost is proportional to $|G|^{\omega/2+\epsilon}$ (roughly, if one assumes $|HK|$ is close to $|G|$). The details can be found in either of the papers. This can be translated appropriately (through a probabilistic argument) to get the overhead cost (which will naturally include a $\log|G|$ factor). This prevents the pure exponent $\frac{\omega}{2}$ that is the ultimate goal. This is dealt with using the new technique of ‘triple subgroup reduction’, which I’ll discuss shortly. I’ll state the main theorem for the double subgroup reduction method below:

## Theorem 3:

If $G = HK$, then the total cost of computing a DFT with respect to $G$ is

$O(\log|G|) \cdot \big(\,|H|$ K-DFTs $+\ |K|$ H-DFTs $+\ O(|G|^{\omega/2+\epsilon})$ operations$\big)$

Proof omitted. See Section 3 (Theorem 12) of this paper by Umans.

So, to reiterate, the ‘$|H|$ K-DFTs $+\ |K|$ H-DFTs’ part is obtained through the simple procedure sketched out at the start of this section. The outer $O(\log|G|)$ factor is obtained through a simple probabilistic argument of sliding DFTs supported on $HK$ across coset representatives. The remaining quantity comes from a series of matrix multiplications and change-of-basis matrices.

Umans obtains the golden exponent $\frac{\omega}{2}$ for solvable groups and finite groups of Lie type through this technique.

# Triple subgroup reduction

This bottleneck is overcome by considering the case when $H$ is normal in $G$. Some structural group-theoretic theorems that I’ll introduce later ensure that this choice is possible. This technique will allow us to get rid of the logarithmic overhead cost. In the calculation of the intermediate representation needed for the double subgroup reduction method, for the first step, we calculate

$\hat{\alpha_i}(\sigma)$ for all $\sigma \in Irr(H)$,

for $g_i$, the coset representatives of the subgroup $H$ in $G$. The idea of the triple subgroup reduction is that we can write these as a sum of matrices with the kind of structure that we’ll describe in the main theorem. But before that, I must segue into some facts from representation theory, since, when we’re dealing with representations involving normal subgroups, Clifford theory is important.

## A Short Interlude

Consider a normal subgroup $H \trianglelefteq G$. Consider $\sigma \in Irr(H)$ and some $g \in G$; then one can obtain a representation $\sigma^g$ of $H$ by conjugating the input; more precisely,

$\sigma^g(h) = \sigma(g h g^{-1})$.

The same thing could naturally be said of the irreducible character. Let $O_1, \ldots, O_r$ be the orbits of this $G$-action on $Irr(H)$. If $\rho \in Irr(G)$, then we can consider $Res_H^G\,\rho$, which breaks up into a sum of irreducibles in $Irr(H)$.

## Theorem 4(Clifford’s Theorem):

Let $\rho \in Irr(G)$. Then, there is an orbit $O_i$ and an integer $e \geq 1$ such that

$Res_H^G\,\rho = e \bigoplus_{\sigma \in O_i} \sigma$

where $e$ and $|O_i|$ divide $[G:H]$. Additionally, note that all the irreducible representations in the orbit $O_i$ have the same dimension, and hence $\dim(\rho) = e\,|O_i|\,\dim(\sigma)$ for any $\sigma \in O_i$.

Proof omitted (see wiki).
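A small worked example (standard; my choice of $G = S_3$, $H = A_3$): the action of $G$ on $Irr(A_3) = \{1, \omega, \omega^2\}$ (the three characters of $\mathbb{Z}/3$) fixes the trivial character and swaps $\omega \leftrightarrow \omega^2$, so the orbits are $\{1\}$ and $\{\omega, \omega^2\}$. Restricting the irreps of $S_3$:

```latex
\mathrm{Res}^{S_3}_{A_3}(\mathrm{triv}) = 1, \qquad
\mathrm{Res}^{S_3}_{A_3}(\mathrm{sgn}) = 1, \qquad
\mathrm{Res}^{S_3}_{A_3}(\mathrm{std}) = \omega \oplus \omega^{2}
```

So the two 1-dimensional irreps of $S_3$ lie over the orbit $\{1\}$ and the 2-dimensional one lies over $\{\omega, \omega^2\}$, each with $e = 1$; in each case $\dim(\rho) = e\,|O_i|\,\dim(\sigma)$, as the theorem says.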

Some stronger divisibility statements could be made on $e$ and the orbit size, but it doesn’t really matter in our case.

What is really important is that $Res_H^G\,\rho$, for $\rho \in Irr(G)$, is completely determined (up to a constant) by the associated orbit. Thus, we can partition $Irr(G)$ into classes $T_1, \ldots, T_r$, where $T_i$ corresponds to the orbit $O_i$. Through a simple calculation (see below), we get the following useful lemma.

## Lemma 2:

For any $\sigma \in O_i$, $\sum_{\rho \in T_i} e_\rho \dim(\rho) = [G:H]\,\dim(\sigma)$, where $e_\rho$ denotes the multiplicity of $\sigma$ in $Res_H^G\,\rho$.

### Proof of Lemma 2:

By definition of the induced representation, $\dim(Ind_H^G\,\sigma) = [G:H]\,\dim(\sigma)$. For $\rho \in Irr(G)$, let $e_\rho$ be the number of times $\rho$ appears in $Ind_H^G\,\sigma$; then, by decomposition into irreducibles, we get:

$[G:H]\,\dim(\sigma) = \sum_{\rho \in Irr(G)} e_\rho \dim(\rho)$

Using Frobenius reciprocity, $e_\rho$ is the number of times $\sigma$ appears in $Res_H^G\,\rho$. Notice that, for $\rho \in T_i$, this is exactly the $e$ from Clifford’s theorem. The terms in the sum not in the associated orbit become zero, and the result follows.

## Inverse DFT

In the notation of Umans, we’ll define the inverse DFT when we’re given a representation $\rho$ of $G$. It will take as input a square matrix $M$ of order $\dim(\rho)$ and return an element $\alpha$ such that:

$\hat{\alpha}(\rho) = M$

Thankfully, the inverse DFT calculation has a relatively easy operation count, from this theorem of Beth–Clausen.

## Lemma 3(Inverse DFT theorem)

If the DFT with respect to $G$ can be computed in $m$ operations, then the inverse DFT with respect to $G$, in the same basis, can be computed in at most $O(m)$ operations.

Proof omitted (see Theorem 2.6 from the paper).
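In the cyclic case the lemma is easy to believe: an inverse DFT is just a forward DFT with conjugated inputs and outputs plus a scaling, so it costs asymptotically the same. A numpy sketch (the conjugation trick is standard, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.fft.fft(rng.standard_normal(8) + 1j * rng.standard_normal(8))

# Inverse DFT via the forward DFT: ifft(X) = conj(fft(conj(X))) / n.
alpha = np.conj(np.fft.fft(np.conj(X))) / len(X)
print(np.allclose(np.fft.fft(alpha), X))  # True: alpha's DFT recovers X
```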

Next, we need a technical lemma regarding matrix multiplication.

## Lemma 4:

Let $A$ be an $a \times b$ matrix, $B$ a $b \times c$ matrix, and $C$ a $c \times d$ matrix. Then,

$ABC = \mathrm{mat}\big((A \otimes C^{T})\,\mathrm{vec}(B)\big)$

where $\mathrm{vec}(B)$ is the $bc$-size column vector obtained by traversing the rows of $B$, and $\mathrm{mat}$ converts the $ad$-size column into an $a \times d$ matrix.

Proof is omitted. It is slightly fun to work it out, though. See Lemma 7 from the paper.
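A numerical sketch of the identity (with row-major vectorization, which matches numpy's default `reshape` order; the shapes are my own choice):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

# vec traverses the rows of B (numpy's row-major flatten), and
# ABC = mat((A kron C^T) vec(B)), with mat reshaping back row-major.
out = (np.kron(A, C.T) @ B.reshape(-1)).reshape(2, 5)
print(np.allclose(out, A @ B @ C))  # True
```

The point of the lemma is that a two-sided multiplication can be traded for one multiplication by a Kronecker product, which is how the tensor structure below gets exploited.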

## Theorem 5:

Let $\sigma \in Irr(H)$ and, for any $\rho \in T_i$, let $e_\rho$ be the associated quantities obtained from Clifford’s theorem. For any $\alpha$ (where $\hat{\alpha}(\sigma)$ is a square matrix of order $\dim(\sigma)$), there exist, with respect to an adapted basis, matrices of size which satisfy:

(note that this is row notation, i.e. the listed entries are rows)

where the matrix has size

Moreover, these can be chosen in such that there exist unique labellings for each of the entries of the set of matrices for all such that if

for all (here, represents the entry of the matrix where )

the number of labels (unique by definition) is . Additionally, these can be obtained with DFTs with respect to .

### Proof of Theorem 5:

Phew! That’s quite a long statement. The last paragraph of the statement is a kind-of convoluted matrix-combinatorics game where we are trying to convert the block-diagonal matrix into something of a simpler form involving a block-diagonal matrix with blocks of a prescribed size (I’ll explain this soon). This will enable us to proceed with lifting the intermediate DFT to a full DFT. Now, to get back to the proof:

Since we have an adapted basis, the matrix is of the required block form. We’ll solve for it by a sequence of steps:

First, we have

Note that the above span the entire vector space. I leave it to the reader to verify that the equation

**(*)**

can be solved for all (note that the above hint only gives a partial solution; one just combines the solutions simultaneously) by an inverse DFT with respect to , where the input is the quantity given to us.

Now, to deal with the last paragraph of the statement. The important thing is the number of labels. Assume for now that such a labelling exists (we’ll show what it is afterwards). I’d like to pass off a few comments as to what is going on before proceeding.

Trivially, we could take a unique label for every entry and obviously ensure that everything is completely distinct. This would give a trivial count from Lemma 2. The point is that we have to minimize this, as we have to perform that many inverse DFTs (computationally almost as easy as DFTs). Getting back into the proof:

It is actually simple: if two labellings are equal (by the condition on the labellings, this means that the corresponding entries of the two matrices are equal), then the associated quantities from the two equations of the form **(*)** above are distinct! Replace them by the label, simply append one system to the other, and recover the result by an inverse DFT.

Hence, we need only DFTs with respect to to get the matrices for all and if we have such a labelling.

### What’s the labelling:

Consider any block-diagonal matrix of the form:

- block matrices of order
- block matrices of order
- block matrices of order
- block matrices of order

You can also let the non-zero entries of the matrix be distinct if you want to equivalently think of a mapping to actual values instead of positions in a matrix.

Let’s say that every column in this matrix is ‘unoccupied’.

The domain of the labelling is represented as a block-diagonal matrix. For every column in the domain of height (number of non-zero entries) , we find the matching block order and associate to this column the first unoccupied column of that height, top-down; call that column occupied. Of course, as one can easily see, there might be some elements in the target matrix not associated to anything in the domain. How do we know we have enough columns?

There can be at most columns of height in the domain, since there are entries in total (see Lemma 2).

Now, it is easy to check that there are at least columns of height in the target matrix.

This entire procedure is just a kind of rearrangement of the domain for efficient lifting to a DFT with respect to . This construction allows us to define the parent matrix (or rearranged matrix) of with respect to the labelling as the matrix having the same size as above, but with entries replaced by the respective entries of if there is an associated label for that entry, and zero otherwise.

# The Finale

Everything is essentially set up now. Don’t worry, we’ve already played the game; the rules have just changed a bit. After some combinatorial hacks, we got what is needed. Though here, we’ll be replacing that by a subgroup.

Here, is a subgroup, and are subgroups such that . Let be the set of coset representatives of in , and the set of coset representatives of in . Again, through some structural group-theoretic results, we’ll later see that is negligible in a certain sense.

Through some simple arguments (I strongly recommend seeing Lemma 3.7 in the paper), we see that if is supported on , we may consider the calculation

where is the parent matrix associated to , which satisfies

with respect to an adapted basis for .

Then, the above calculation can be computed with DFTs with respect to , plus inverse DFTs with respect to , plus DFTs with respect to .

Now, it remains to see how one can lift this to a DFT with respect to . Let be the change-of-basis matrices corresponding to , i.e.

for all where is the set of irreducible representations of that appear in the restrictions of the irreducible representations of , counted with multiplicity.

Similarly, for . We also require that the basis change is one which is adapted with respect to . Also, for , we have

for all .

**(1)**

Consider supported on ; then we can split up the DFT on as follows:

which can be rewritten using as:

By the discussion above and using equation (1) above, we can further rewrite this as:

Consider the quantity appearing in the sum above. Using Lemma 4, calculating it is equivalent to calculating:

There are repetitions in the tensor-product structure (on both sides) of the left-hand side, and this can be appropriately handled by shifting the entries of to a block structure (for more info, see Section 3.3 of the paper). So, finally, calculating is equivalent to calculating,

where is a block-diagonal matrix whose entries are sums of entries in

Let be the parent matrix of . So, in the expression above, we can replace by and see that we have to calculate:

where has the ‘same data’ (see the paper for more info; honestly, I am not sure how to express it easily; the less one says of these matrix combinatorics, maybe the better) as . We’ve rearranged all of the above into the parent matrix, and so we have to appropriately move around the elements of . Using the earlier facts of representation theory, namely that and that the parent matrix has blocks of orders such that the sum of squares of the block sizes (not necessarily the total number of non-zero elements) is , we see that the square blocks of have dimensions which satisfy