A Note on Quantum Approximate Counting and Random Musings

Ok, today we’ll tackle something that I’ve never talked about in my blog before:Quantum computing.There might be a few more quantum computing posts in the future if I am in the mood for it.

The problem at hand is simple: There is an unordered list of N items of which K are marked;Approximate the value of K. Throughout, we’ll represent the N items by [N] and the K items by [K].

This is actually a rather old problem which has been tackled before, and is deeply connected to amplitude estimation(see [1]) which uses uses the Quantum Fourier transform. Recently though, a simpler proof which uses only Grover-like techniques was published by Scott Aaronson, a professor at UT Austin(Hook em’ Horns!). Note that the algorithm is probabilistic with no bounds.

The heart of the technique is Grover’s algorithm so perhaps I should spend a few words on that. Consider an unordered list of size N where you want to find a particular element which can be checked to be a solution in constant time, then this can be found in O(\sqrt{N}).

In the classical case, one is forced to do this(both the above problem and QAC) in O(n) time. The full strength of Grover’s algorithm is that one can actually find the pre-image f^{-1}(b) for f:[N] \to \{0,1\} where f maps x to 1% if and only if latex x$ is a solution to the given problem. This is all encoded as a quantum black-box unitary operator U defined by U\lvert x \rangle =\lvert x \rangle if f(x)=0 and U\lvert x \rangle=-\lvert x \rangle if f(x)=1.

The idea of the algorithm is that we spit out the the superposition of all states:

\lvert s \rangle =\frac{1}{\sqrt{N}} \sum\limits_{i=1}^{N} \lvert x \rangle

followed by applying the operator U above which selects the marked item by flipping it.

and then apply the diffusion operator :

D=2 \lvert s \rangle \langle s \rvert-I. This, in effect, flips the marked qubit around the mean, selectively increasing its relative amplitude.

Explicitly, the mapping of the diffusion gate is:

\sum\limits_{x \in \{0,1\}^{n}}\alpha_{x}\lvert x \rangle  \mapsto \sum\limits_{x \in \{0,1\}^{n}}(2\mu-\alpha_{x}) where \mu is the mean of the \lvert x \rangle.

One repeats this a certain number of times(specifically O(\sqrt{N}) and then measures, obtaining the corresponding eigenvalue with high probability.

A Random Aside?

In the case of a non-local hidden-variable quantum computer, one can actually do this in atmost O(\sqrt[3]{N}) time. I’m not really going to talk about this. However, this spawned a simple question:What if you could perform it in O(log(N)) time? What would that actually imply?

Consider the SAT problem: There are variables x_{1},\cdots,x_{n} and expressions in these variables $l_{1},\cdots,l_{k}$. Find an assignment of either true/false to each x_{i} such that all the expressions l_{p} evaluate to true.,i.e l_{1} \land l_{2} \land \cdots \land l_{k} is true.

Anyways, we can store all O(2^{n}) possibilities and search through them in O(log(2^{n}))=O(n) time in our hypothetical case, which is linear. This would mean that confirmed NP decision problems like SAT, Hamiltonian path problem can be solved in polynomial time which implies P=NP. So, it is unlikely that this is the case.

I’ll get back to the these things in the end of this post.

Quantum Approximate Counting

Consider an unordered collection of size N where K of them is marked. Estimate the value of K. This is the problem at hand.

Firstly, let’s clarify the exact problem statement. We wish to find an approxiamte solution, so that means we select multiplicative error \epsilon, i.e find $\tilde{K}$ such that

1-\epsilon \leq \frac{\tilde{K}}{K} \leq 1+\epsilon

Moreover, the algorithm is supposed to be probabilistic so we also select a small probability of failure $\delta$.

Let’s first review some obvious solutions before getting to the actual algorithm. One must note that in the case of a quantum solution, we only care about how many query calls we make,i.e the query call complexity. The real difference in these cases is that in a quantum computer, we can measure observables and get a probability distribution of measurements, so we want to make clever choices for the observables that we measure to extract useful information, the same philosophy behind Grover’s algorithm. As such, it seems reasonable that in a good solution, the query call complexity depends somehow on the value K itself.

Classical solution

To exactly find K, the only choice is a linear search which is O(n). Actually, the approximate case(deterministic) could be done in O(\frac{1}{\epsilon^{2}}\frac{N}{K}).

‘Easy’ quantum solution

Use Grover’s algorithm where f maps the input to 1 if and only if it is marked, this would be O(\sqrt{N}).

Query call complexity of the ideal solution

Perhaps, it is best if I reveal the complexity before moving on. The ideal ideal query call complexity to obtain \tilde{K} which \epsilon-approximates K with high probability of success 1-\delta is O(\frac{1}{\epsilon} \sqrt{\frac{N}{K}} log(\frac{1}{\delta})).

Initial steps

Instead of approximating K, we equivalently approxiamte \theta = arcsin(\frac{N}{K}).

There are really two steps:

First, we find a good constant-factor approximation to K and then we cleverly shave off the bounds to get an \epsilon -approximation.

So, first we obtain some interval [\theta_{min}^{t+1},\theta_{max}^{t-1}] containing \theta.

This is explained in part 1 of Aaronson’s overview of the algorithm:

Here ,\phi is the uniform superposition of all states and G is the diffusion operator

Note that :

G^{\frac{r-1}{2}}\lvert \psi \rangle=\frac{sin(r\theta)}{\sqrt{K}} \sum\limits_{x \in [K] } \lvert x \rangle +\frac{cos(r\theta)}{\sqrt{K}} \sum\limits_{x \not\in [K]} \lvert x \rangle. In the computational basis, the probability of measuring a marked item is sin^{2}(r\theta). Step 1 essentially returns a constant factor approxiamtion to \theta withhigh probability. This can be proven using Chernoff bounds which I’ll omit from the discussion.

It is also proven in page 5 that with high probability, one will not find enough marked items to finish step 1 before t<t_{0}. Also, with high probability, we’ll see enough marked items in t=t_{0},t_{0}+1.The interesting part is how to get an \epsilon-approximation.

Final solution

Now, that we have \theta_{min},\theta_{max}, we use the following clever result on page 9:

Let 0<\theta_{min}\leq \theta\leq \theta_{max} \leq \frac{\pi}{1000} and latex \theta_{max}=\theta_{min}(1+\gamma)$ where \gamma \leq \frac{1}{5}. There exists an integer r such that the following is true:

Consider tossing a biased coin which is heads with probability sin^{2}(r\theta) at least 1000.ln(\frac{1}{\epsilon}) times.

If more heads are observed, set \theta_{min}:=\frac{\theta_{max}}{1+0.9 \gamma} else set


This process fails to maintain \theta_{min} \leq \theta \leq \theta_{max} with low probability \gamma. The value r also has some non-trivial bounds.

The probability of getting heads above is exactly the probability of getting a marked item in G^{\frac{r-1}{2}}\lvert \psi \rangle above.

The upshot is that we can distinguish \theta_{min},\theta_{max} by measurement by considering instead r\theta_{min},r\theta_{max} which are nearly orthogonal. Based on our results of measurement, we shave-off either \theta_{min} or \theta_{max} by a factor of 0.9 getting closer and closer(with high probability) to our required \epsilon-estimate of \theta.

Philosophical musings on Energy and Algorithms

I beg the reader to ignore everything here and close this tab immediately as I have no idea what I’m saying(not that I ever do).

Recently, I came across a few interesting thins=gs:

One was an experimental method to tackle classic difficult computer science problems. One was using chemical reactions with DNA molecules to solve the Hamiltonian path problem which is NP-hard. The problem was that it requires an amount factorial in the number of vertices to work. Another solution uses an optical contraption of cables and beam splitters to get a solution. Again, the drawback is that amount of required energy is exponential in the number of vertices. There is another example of some kind of a bizarre mold which rearranges itself in a manner to give an approximate solution to the Travelling Salesman Problem.

In fact, there is an entire field of DNA computing. The aim is to use parallel computing to optimize our solutions. The drawback is that the processing speed is slow.

Again, even for Grover’s algorithm, note that we must first ‘store’/obtain the entire quantum state itself unlike a classic linear-search where the auxilliry space needed is 1. Within a certain degree(spanning all these systems of computing), it seems that we can, to SOME degree, substitute time for space/energy.An obvious thing to note is that you can’t decide anything about some new data if you haven’t even stored it. More often than not, we are willing to make this sacrifice because we care more for efficiency.

Parallel computing speedups require increasing the number of components. Of course, there is Amdahl’s law which puts some kind of a restriction to this.

Aaronson talked about this in one of his informal talks(I think an interview or TedTalk).

The study of complexity theory really may reveal things about physics, nature and energy it seems. In classical computing, there is a limit to how much we can trade space for time so we consider other unconventional types of computing where the threshold for this tradeoff is higher. This is truly an interesting direction to think in.


Bousfield Localization of Spectra

This post is organized as a part of the UT Austin Spring 2020 DRP program.I’d like to thank my mentor Richard Wong for useful directions throughout the semester and introducing me to much of the foundational material in homotopy theory.

This post will be organized as follows:

First, I’ll present the basic definitions of the topic(spectra, homotopy groups,CW-spectra,triangulated categories etc).

Second, I’;ll introduce finite spectra, Z_{p}-localization of spectra aka. p-local spectra via Bousfield localization, thick subcategories.

Third, I’ll discuss the the Thick Subcategory Theorem but avoid technical details.

So, let’s begin.

A spectrum X is a sequence of pointed spaces \{X_{n} \} and structure maps \sigma_{n}: \sum X_{n} \to X_{n+1}. For example, one can take a pointed space X and consider the suspensions \sum^{n} X=S^{n} \wedge X. This forms a spectrum with the structure maps the identity. This is the suspension spectrum of X.

One can consider this general spectrum \{X_{n} \} and suspend it by the integer $i$ to get a spectrum \sum^{i} X whose components are given by (\sum^{i} X)_{n}=X_{n+i}.

The homotopy groups \pi_{k}(X) of a spectrum X are defined as the colimit:

\pi_{k}(X)=colim_{n} \pi_{n+k}(X_{n}).


Since \sum is a functor on Top, one is motivated to define maps between spectra f:X \to Y as a a collection of maps f_{n}:X_{n} \to Y_{n} such that the obvious diagrams commute.This may not actually be the best definition of maps between spectra but it doesn’t matter too much for now.

For example, one can consider S^{0} and suspend this to get \mathbb{S}, the sphere spectrum, an example of a suspension spectrum.

Given the suspension and loop space functors \sum, \Omega, one has the loop-space adjunction for pointed topological spaces, namely,

[\sum X,Y] \simeq [X,\Omega Y]

Note that \Omega Y=[S^{1},Y](with compact-open toplogy). This allows us to define \Omega-spectrum as spectrum X where the adjoint map \tilde{\sigma_{n}}:X_{n} \to \Omega X_{n+1} is a weak homotopy equivalence.

Let’s look at an example.

Eilenberg-MacLane Spectrum

Recall that an Eilenberg-Maclane space corresponding to (G,n) where G is an abelian group is a CW-complex X such that \pi_{n}(X)=G and \pi_{k}(X)=0 for k \neq n.Call this CW-complex K(G,n)

We know the following correspondence from algebraic topology : For any CW-complex X,H^{k}(X;G) \simeq [X,K(G,n)].

Collect these spaces on the index n to get \{K(G,n) \}. We claim this forms a spectra. What are the structure maps though?

In particular, note that we we have \pi_{k}(\Omega K(G,n+1))=\pi_{k+1}(K(G,n+1)). By uniqueness of the Eilemberg-McLane space upto homotopy equivalence, we have a homotopy equivalence \tilde{\sigma}_{n}:K(G,n) \to \Omega K(G,n+1). Take the adjoint of this under the loopspace-adjunction to get the structure maps \sigma_{n}:\sum K(G,n) \to K(G,n+1).

Hence, this is also an \Omega-spectrum.


Well, I suppose some analogy with the classical case is required here. Consider the Quillen-Serre model structure on pointed topological spaces(aka the usual one). Here, the cofibrations are retracts of generalized relative CW inclusions(see [1] for definitions). In particular, the CW complexes are cofibrant objects in this case. We’d like our so called CW complexed to be cofibrant objects in the stable model category on topological sequential spectra(these are the same things as the spectra defined above where \sum X =S^{1} \wedge X).

A CW spectra is a sequence \{E_{n} \} of CW-complexes where the structure maps \sigma_{n}:\sum E_{n} \to E_{n+1} map isomorphically into a subcomplex of E_{n+1}. TO prove the above fact, see Prop 0.49 of this.

Continue reading “Bousfield Localization of Spectra”


Ambedkar, The Unsung Hero of India

Edit: This was supposed to be published on 14/04/2020 but due to some issues, it was published on 16/04/2020.

If you’re wondering what in the world a political post is doing in a mathematics blog, then don’t. Ambedkar’s experiences and pervasive effect on society merits far greater attention than any fancy mathematical abstraction or theorem I could slap unto the front page of my blog. Today is Ambekar Jayanthi, a day in remembrance of B.R Ambedkar and I could hardly resist myself in making some kind of a comment on how inspiring his views have been in shaping my worldview and political inclinations. Perhaps, the first question to ask is: Who is Ambedkar? I’ll give a very brief introduction to this legendary character but I mostly assume that anyone who takes interest in this post has some idea about him. I wanted to discuss in some depth his interactions with other leading figures of the time like Nehru, Gandhi and Jinnah(the ideological founder of Pakistan), his thoughts on Indian Independence, Dalit representation and the ‘Islamic Problem’, and the contemporary image of his legacy.

Some may find it strange that I should call him an ‘unsung hero’ as every soul in the country knows his name yet I stick by it, for his true importance is woefully understated, both today and in the past.

I am neither a political expert nor a historian so to some extent, I absolve myself from any frivolous errors I might make in the post. Needless to say, this is not a biography; I will state my opinions clearly on Ambedkar’s ideas and interactions. I am not being promoted or paid, in any measure, in publishing this post. He is, like most important characters in the Indian Independence episode, a controversial figure, especially his scathing rebuke of Gandhi and Islam.


Bhimrao Ramji Ambedkar(भीमराव रामजी आंबेडकर)(1891-1956) was born in Mhow, currently in Madhya Pradesh, into a low-middle class family(economic) from the Mara caste, one of the lowest caste denominations in Maharashtra. Quite early in his life, he moved back to Maharashtra. He is mainly remembered as one of the foremost Dalit(untouchables in the caste hierarchy) activists of his era and a key founder of the Indian Constitution. He was also a highly reputed economist, obtaining a doctorate degree at Columbia University, and practicing law at the famous Gray’s Inn while studying at LSE(London School of Economics).

The story of the discrimination he faced in his schooling days are well-known to many Indians. As a child in school, he was often made to sit separately from the other students, if not outside the class, on account of his low caste. He was not even allowed to drink water, unless it was poured from a designated student of higher caste into his mouth.

Continue reading “Ambedkar, The Unsung Hero of India”


Fast generalized DFT for finite groups

Almost 6 months ago, I came across an interesting update through the UT CS theory seminar group on the classic problem on the computation of the Discrete Fourier Transform(DFT) of finite groups. Chris Umans, from Caltech, made some pretty nice improvements to the complexity of the general DFT through some slick combinatorics , which I’ll shortly explain. The paper is here.


Well, before I even start, perhaps, I should make some kind of a comment to motivate the problem at hand. A sequence of n complex numbers (x_{i}) can be represented by a function f:\mathbb{Z}_{n} \to \mathbb{C}.

A Discrete Fourier Transform of (x_{i}) sends it to the sequence (F_{i}) where F_{i}=\sum\limits_{j=0}^{n-1}x_{j}e^{\frac{-i2\pi k j}{n}}.

This motivates the definition of the Fourier transform \hat{f} of a function from a finite group f : G \to \mathbb{C} which takes as input a representation \rho:G \to GL(d_{\rho},\mathbb{C}), i.e

\hat{f}(\rho)=\sum\limits_{g \in G} f(g)\rho(g)

where one must note that here, more generally, \rho(g) can be matrix-valued.From Maschke’s theorem, the DFT of a finite group reduces to a map which takes as input an element \alpha \in latex \mathbb{C}[G](group ring consisting of all functions from G to \mathbb{C} under convolution)and sends it to \sum\limits_{g} \alpha(g) \bigoplus_{\rho \in Irr(G)}\rho(g). Throughout, one assumes that we have already have the \rho and chosen a basis for each \rho.

Quite simply, one trivially gets that this requires O(|G|^{2}) operations. Exponent-one algorithms exist for abelian groups(as one would expect using the decomposition into cyclic groups), supersolvable and certain classes of symmetric and alternating groups.

Throughout the post, let \omega be the exponent of matrix multiplication(currently it is around 2.38 I guess).

The main result is that Chris Umans achieves an operation complexity of O(|G|^{\frac{\omega}{2}}. Some basic facts from representation theory will be used throughout though I guess most readers would already be familiar with that. For example, it is easy to prove that through the Schur orthogonality relations is that \sum\limits_{i} |V_{i}|^{2}=|G| where the V_{i} is the set of all irreducible representations of G. This gives us the useful lemma.

Lemma 1:

For every real number \alpha \geq 2, \sum\limits_{\rho \in Irr(G)} dim(\rho)^{\alpha} \leq |G|^{\frac{\alpha}{2}}(Proof omitted)

Some facts about the induced representation and Clifford theory are used but it shouldn’t be too much of a hassle to discuss it on-the-fly.

The Support Trick

Let G be a group and S, a subset. Let’s assume that we can caluclate the generalized DFT with respect to G for all inputs \alpha \in \mathbb{C}[G] supported on S(i.e components \alpha_{g}=0 for g \not\in S) in m operations. Surely, with some extra computational cost(involving the index of S), one can extend this to all inputs because if we can compute DFT supported on S then we can compute DFT supported on Sg' by multiplying  \sum\limits_{g} \alpha(g) \bigoplus_{\rho \in Irr(G)}\rho(g) by \bigoplus_{\rho \in Irr(G)} \rho(g') which has an additional operation cost of \sum\limits_{\rho \in Irr(G)} O(dim(\rho)^{\omega+\epsilon}. We can apply Lemma 1,take appropriate number of ‘shifts/translates’ by elements of G to get the following result(see Section 2.3):

Continue reading “Fast generalized DFT for finite groups”


Monads I: The Eilenberg-Moore category, Adjunction and Kleisli categories

This series(probably three posts) will be on the monad. The first part will deal with only an introduction to monads, algebras over moands and the adjunction problem . The second part will deal with coequalizers and Beck’s theorem in connection with Kleisli categories. In keeping with this blog’s slight CS tilt, the third part will deal with examples of monads from functional programming which I think are crucial to wholly understand the monad. I’ve noticed that I am running too many ‘series posts’ simultaneously. I am still working on the third part of the Grothendieck Spectral Sequence series and will continue the Hopf fibration post which I haven’t updated for a year!


I think that before I continue to introduce the main players of today’s post, I should review the definitions of an adjoint functor as it’ll be quite useful as a tool to see the power of monads.

Consider two categories C,D and functors F:D \to C and G:C \to D. F,G are said to be adjoint i.e F \dashv G if there exists a natural isomorphism of Hom-sets Hom-{C}(Fd,c) \simeq Hom_{D}(d,Gc) for all d \in D,c \in C. Equivalently, by the unit=counit definition, if there exists the pair of natural transformations \eta:1_{D} \Rightarrow GF(unit) and \epsilon:FG \Rightarrow 1_{C}(counit) which satisfy the following commutative diagrams:



Throughout the post, I’ll represent natural transformations by the symbol \Rightarrow. I recommend the confused reader to refer to Aluffi’s Algebra Chapter 0 book in the section on category theory and also my post on the importance of adjoint functors.

If \eta:F \Rightarrow G is a natural transformation of functors F,G: A \to B whose components are given by \eta_{X}:F(X) \to G(X) for an object X in A. If T:B \to C is another functor, then I’ll represent the components of the induced natural transformation HF \Rightarrow HG by (H\eta)_{X}=H(\eta_{X}).

If instead, there is a functor T:Z \to A, then I’ll represent the components of the natural transformation \eta Z:ZF \to ZG by (\eta Z)_{X}=\eta_{Z(X)} where X is an object in Z.


Let’s say that T:C \to C is an endofunctor equppied with two natural transformations \eta:1_{C} \Rightarrow T(unit) and \mu:TT \Rightarrow T such that the following diagrams commute:



The commutative diagram is a kind of generalization of associativity. I direct the reader to the nlab page to learn more about the basics of it. Much of the importance of monads is derived from its connection to adjoint functors and its connection to programming paradigms(see continuation monad for more information).

Let’s see how monads are connected to adjoint functors.

Continue reading “Monads I: The Eilenberg-Moore category, Adjunction and Kleisli categories”


Bar resolutions, Group Cohomology and some cool applications

The format of this post will be a little unorganized. I start off with the definition of group cohomology without giving much motivation. Then, I properly define a bar resolution for groups in complete detail. With this setup, I properly motivate group cohomology, work out some extremely interesting examples to see why one should care about this particular free resolution. I’ve also decided to make another post later on a generalization of the bar resolution to the case of monads and a further generalization to E_{\infty}-algebras. I am currently taking a course on Homotopy Type Theory  and I’ve found it blissfully interesting so there might be a few posts in the future on that too. I’ve only recently noticed that I’ve never actually posted anything on either algebraic topology or homotopy theory, which is actually my main interest so I might update that soon enough.

A representation of a group G over a field K is just a K[G]-module. It is a common philosophy that one can study the structure of a group by studying its representations. In some sense, group cohomology relates to topology but I’ll get to this later on in the post.

Consider \mathbb{Z} as a K[G]-module where K[G] acts trivially and let M be a representation,i.e \mathbb{Z}[G]-module. Extend this to a K[G]-module. The group cohomology H^{n}(G,M) with coefficients in M is defined by

H^{n}(G,M) := Ext_{\mathbb{Z}[G]}^{n}(\mathbb{Z},M) . One can immediately see that H^{0}(G,M) \simeq Hom_{\mathbb{Z}[G]}(\mathbb{Z},M). A map f in Hom_{\mathbb{Z}[G]}(\mathbb{Z},M) satisfies f(g. \lambda)=g.f(\lambda). But since the action on \mathbb{Z} is trivial,f(g. \lambda)=f(\lambda)=\lambda f(1). This implies that \lambda f(1)=\lambda g.f(1). Sending f(1) to any element of M corresponds exactly to the fixed points M^{G}. So, H^{0}(G,M) corresponds to the fixed points M^{G}.

Bar resolution

So, let’s describe these things in terms of the bar resolution. The calculation of the Ext functor entails finding a projective resolution $latex  F_{\bullet} \mapsto X \to 0$ and applying the contravariant Hom(-,B) functor. Taking ‘homology’ yields Ext. The bar resolution actually gives such a resolution, in fact, a free resolution.

Let B_{n} be the free \mathbb{Z}[G]-module on G^{n}, that is, the set of symbols (g_{1} \otimes g_{2} \otimes \cdots \otimes g_{n}) for n \geq 1 and B_{0} := \mathbb{Z}[G] where all those expressions are simply formal symbols for the generated free module. We have the following free resolution, whose maps will be described soon.

\cdots \to B_{2} \to B_{1} \to \mathbb{Z}[G] \to \mathbb{Z} \to 0

[Insert Image]


the augmentation map \epsilon:\mathbb{Z}[G] \to \mathbb{Z} is given by \epsilon(1_{g})=1 on the basis elements.

d_{i}: B_{i} \to B_{i-1} is given by:

d_{n}=\sum\limits_{i=0}^{n} d_{i} where the maps d_{i} is defined on the basis of the free module which extends to the entire group.

d_{0}(g_{0} \otimes g_{1} \cdots \otimes g_{n})=g_{0}(g_{1} \otimes g_{2} \otimes \cdots \otimes g_{n}),

d_{i}(g_{0} \otimes g_{1} \cdots \otimes g_{n})=(g_{0} \otimes \cdots \otimes g_{i}g_{i+1} \otimes g_{i+2} \otimes \cdots \otimes g_{n} for 0 <i <n. Keep note of the dimension above. Finally,

g_{n}(g_{0} \otimes g_{1} \cdots \otimes g_{n})=((g_{0} \otimes g_{1} \cdots \otimes g_{n-1}).

There is topological motivation for this seemingly bizarre construction.Let us say that the aim is to somehow construct a simplicial object from G^{n}. For every ordered (n+1)-tuple, one can insert a n-simplex and G can act on these faces by diagonal action:


If any element of the tuple is 1, the simplex degenerates into lower dimension, for example from the differential maps we saw above, (g_{0} \otimes \cdots g_{i}g_{i+1} \otimes \cdots \otimes g_{n})=(g_{0} \otimes g_{i-1} \otimes g_{i+2} \otimes  \cdots \otimes g_{n}) if g_{i}g_{i+1}=1. The differnetial maps match with the this interpretation of degeneracy and face maps of the simiplicial object.You can alternatively also define a normalized version of the bar resolution where the maps B_{n} are replaced by tuples where the element are non-identity. It is easy to check that the sequence is a chain complex though this involves some annoying calculations. Also, the sequence is exact as we’ll show in the following lemma.

Lemma 1:

The sequence above is a chain complex that is d_{j-1} \circ d_{j}=0 for j \geq 1 and \epsilon \circ d_{0}=0. It is also a \mathbb{Z}[G]-free resolution i.e it is exact.

Proof of Lemma 1:

Step 1:Proving that it is a chain complex

Pick a basis element (g) \in B_{1}.d_{1}((g))=1_{g}-1_{e} \Rightarrow \epsilon((g)-(1))=\epsilon((g))-\epsilon((1))=1-1=0.

I encourage the reader to work out the case for d_{2}. Now, for i>2, we can do a little trick to simplify the calculations. Define a new set of \mathbb{Z}[G]-modules C_{i} by

C_{i}=\mathbb{Z}[G^{i+1}] for i \geq 0 with the \mathbb{Z}[G] action given by a diagonal action on the tuples as follows

1_{g}.(g_{1} \otimes \cdots \otimes g_{i})=(gg_{1} \otimes \cdots \otimes gg_{i}.

I will omit the details but essentially, what happens is that 1_{g}.(1 \otimes g^{-1}g_{1} \otimes \cdots \otimes g^{-1}g_{i})=(g_{1} \otimes \cdots \otimes g_{i}) where we have a bijection of those tuples. So, we just hacked our module C_{i} to be a free \mathbb{Z}[G]-module on one lower dimension. Verify that it is indeed a bijection and you get an isomorphism \phi_{i}:B_{i} \simeq C_{i} for i \geq 0 given by

\phi_{n}(g_{1} \otimes \cdots \otimes g_{n})=(1 \otimes g_{1}g_{2} \otimes \cdots \otimes g_{1}g_{2} \cdots g_{n}).

Isomorphism follows trivially from the fact that left multiplication by g is a bijection on a group. There may be other ways to map the basis elements too.

Define \delta_{i}:C_{i} \to C_{i-1} by

\delta_{n}(g_{1} \otimes \cdots \otimes g_{n})=\sum\limits_{i=0}^{n} (g_{1} \otimes \cdots \hat{g_{i}} \otimes \cdots \otimes g_{n}).

We have that \delta \circ \delta=0 by the usual calculation one encounters in chain complexes. This will obviously be useful soon. We shall prove that j \geq 2,\delta_{j} \circ \phi_{j}=\phi_{j-1} \circ d_{j} from which it follows that \delta_{j-1} \circ \delta_{j} \circ \phi_{j}=0=\delta-{j-1} \circ \phi_{j-1} \circ d_{i}=\phi_{j-2} \circ d_{j-1} \circ d_{i}=0 iff d^{2}=0 since \phi_{i} is an isomorphism.

\delta_{j}\phi_{j}(g_{1} \otimes \cdots \otimes g_{j})=\delta_{j}((1 \otimes g_{1}g_{2} \otimes \cdots \otimes g_{1}g_{2} \cdots g_{n}))

Now apply \delta_{j} to this expression and check that it is equal to \phi_{j-1} \circ d_{j}. It is a slightly annoying calculation but much less painful than dealing with the original equation. Now, it remains to deal with the case for j=2(we’ve already done j=1), which I’ve already left as an exercise to the reader. Anyways, moving onto the next step

Step 2:Complex is chain-homotopic to the identity

We define the chain homotopy \sum_{i}:B_{i} \to B_{i+1} for i\geq -1(taking the -1 case as \mathbb{Z})on the basis elements by

\sum_{i}(1_{g}.(g_{1} \otimes g_{2} \otimes \cdots \otimes g_{n})=(g \otimes g_{1} \otimes \cdots \otimes g_{n}), simply adding the element g to the symbol. We simply use the isomorphism \phi:B_{i} \to C_{i} and transfer the entire problem to the C_{i}‘s to get

\sum_{i}(1_{g}. \phi(g_{1} \otimes \cdots \otimes g_{i})=\sum_{i}(1_{g}.(1 \otimes g_{1} \otimes \cdots \otimes g_{1}g_{2} \cdots g_{i})) \\  =\sum_{i}(g \otimes gg_{1} \otimes \cdots \otimes gg_{1} \otimes \cdots \otimes g_{i}=(1 \otimes g \otimes gg_{1} \otimes \cdots \otimes gg_{1} \otimes \cdots \otimes g_{i}) so we get that

\sum \circ \phi(g_{1} \otimes \cdots \otimes g_{n}=(1 \otimes g_{1} \otimes \cdots \otimes g_{n}). Now I leave it to the reader to verify through simple calculations that

h_{j-1}(\phi_{j}) + \delta_{j+1} \circ h_{j}=id_{C_{j}} for j \geq 1. Lower cases can be dealt with separately.

Group cohomology again and a few applciations

Ok, now that we’ve constructed this \mathbb{Z}[G]-module resolution. By definition, H^{n}(G;M):=Ext_{\mathbb{Z}[G]}^{n}(\mathbb{Z},M) \simeq H^{n}(Hom_{\mathbb{Z}[G]}(B_{n},A))

Now, it’s great that we’ve got a free resolution but obviously the hope is that one gets some meaningful results with the bar resolution. Let’s observe a few key ideas.

Hom_{\mathbb{Z}[G]}(B_{n},A) \simeq Map(G^{n},A)

on the level of sets. Basically, assigning the values at the basis elements gives all \mathbb{Z}[G]-maps from B_{n} to A by extending linearly.


Let’s look at some low-dimensional cases to see what exactly is going on.

I describe H^{0}(G,A) \simeq A^{G} in the first section of the post.

Just like as described in the first section, even Hom_{\mathbb{Z}[G]}(F_{n},A) \simeq C^{n}(G,A), the collection of maps G^{n} \to A since specifying the values on the basis elements determines the entire homomorphism. Under this representation, the boundary maps d_{n}:C^{n}(G,A) \to C^{n+1}(G,A) can be written more explicitly as:


We can call the elements of ker(d_{n}) Z^{n}(G,A) and the elements of Im(d_{n-1}) as B_{n}, the cycles and boundaries respectively or cocycles and coboundaries, if one wishes to be more precise. It is a chain complex and it is exact though I won’t bother much with the details. The upshot is that know we have the following description of the cohomology group

H^{n} \simeq Ext_{\mathbb{Z}[G]}^{n}(\mathbb{Z},A) \simeq \frac{Z_{n}}{B_{n}}.

Great! Now we have just one final simple thing to put in place before we look at some applications.

Theorem 1

Suppose 0 \to A \to B \to C \to 0 is a short exact sequence of \mathbb{Z}[G]-modules, then there is right-derived long exact sequence:

0 \to A^{G} \to B^{G} \to C^{G} \to H^{1}(G,A) \to H^{2}(G,B) \to H^{2}(G,C) \to \cdots

I covered the proof of this theorem for the general case of derived functors in my other post.

I guess, as a simple consequence of the proof of dimension shifting I completed in the other post(or you could prove easily on your own), we have the following theorem which is useful in the context of group cohomology.

Theorem 2(Dimension shifting):


If 0 \to A \to B \to C \to ) is a short exact sequence of \mathbb{Z}[G]-modules such that B^{G} \simeq Hom(G,B) \simeq 0 is trivial, then we have the following natural(wrt to chian complex morphisms) isomorphisms

H^{n+1}(G,A) \simeq H^{n}(G,C) for n \geq 1.

The result essentially allows us to compute homology by computing homology of with respect to some other representation in a lower dimension. Now, let us move onto some examples.

The Extension Problem


This is probably most the classical application of group cohomology which I am sure the reader is already quite familiar with so I shall discuss it only little detail.

A classical problem in group theory was to find all possible extensions of a group by a normal subgroup. If we have two groups H,N, it is natural to ask what groups G are extensions of H by N, that is, fit into the exact sequence

1 \to N \to G \to H \to 1

An obvious example is a the direct product H \times N. One could also add the extra condition that it be a split exact sequence which is equivalent to a semidirect-product N \ltimes H. Two extensions are said to be equivalent if the following diagrams commute:

[Insert Image]

Let’s think of a few extensions of groups. For the groups \mathbb{Z}_{2},\mathbb{Z}_{3}, we have the trivial extension \mathbb{Z}_{2} \times \mathbb{Z} _{3} and the well known extension \mathbb{Z}_{2} \ltimes \mathbb{Z}_{3} where \mathbb{Z}_{2} acts by inversion on \mathbb{Z}_{3}.This is the dihedral group D_{6} of order 6.(Sorry if I have messed up the order of placement of the normal subgroup, I never remember how it works. I think I may have even messed up the semidirect product symbol too. Who cares, I always hated that symbol anyways and abhor the idea of trying to fix it).

We’re going to prove that the elements of the cohomology group H^{2}(G,A) are in one-to-one correspondence to extensions up to equivalence. We begin by first defining a section on the level of sets as follows: If there is an extension of the form

[Insert Image]

We would like to define an action G on A from the exact sequence. Since the sequence is exact, there exists some e_{g} such that $latex  \pi(e_{g})=g$. A section is basically a choice of such e_{g}‘s. We’ll explain the representative notation soon. e_{g} acts on i(a) by conjugation e_{g}i(a)e_{g}^{-1}. Note that \pi(e_{g}i(a))=g.1=g and also that conjugation by this element e_{g}i(a)i(a')i_{a}^{-1}e_{g}^{-1}=e_{g}i(a')e_{g}^{-1} since A is abelian. So, the action is independent of the choice of g.

Let the section be \sigma: G \to E. Each \sigma(g) represents an equivalence class Ag. Both \sigma(g)\sigma(h) and \sigma(gh) are in the same equivalence class Agh for g,h \in G, so by exactness, there is a unique f_{g,h} \in A satisfying


Define a function f:G \times G \to A by f(g,h)=f_{g,h}, the value determined above. Note that every element in E can be written uniquely as i(a).\sigma(g) by exactness. Let’s see how multiplication works in E:

i(a_{1})\sigma(g)i(a_{2})sigma(h)=i(a_{1})\sigma(g)i(a_{2})\sigma(g)^{-1}\sigma(g)\sigma(h). Since \sigma(g)i(a_{2})\sigma(g) \in i(A),\Rightarrow =i(a_{1}) i(g.a_{2}) f(g,h)\sigma(gh). Since A is abelian

\Rightarrow =(i(a_{1}+ g.a_{2}+f(g,h))\sigma(gh) where g.a_{2} is the action of G defined above.

Associativity of multiplication in E(I leave it to the reader to check this) gives the equality:

g.f(h,k)-f(gh,k)+f(g,hk)-f(g,h)=0 which is exactly the condition for f to be in Z(G,A). Finally, let \sigma_{1} be another section of the extension and let f_{1} be its factor set. We’ll show next that f and f_{1} differ by a coboundary, an element of B(G,A0 which implies that every extension has a well-defined cohomology class in H^{2}(G,A).

Obviously, \sigma (g),\sigma_{1}(g) lie in the same coset Ag since they both map to g under \pi. This implies that there is a unique c_{g} such that \sigma(g)=i(c_{g})\sigma_{1}(g). Define c:G \to A by c(g)=c_{g}.

\Rightarrow \sigma(g)\sigma(h)=i(f(g,h)) \sigma(gh)=i(f(g,h)+c(gh))\sigma(gh)


\sigma(g)\sigma(h)=(c(g)\sigma_{1}(g))(c(h)\sigma_{1}(h))=(c_{g}+g.c_{g}+f(g,h))\sigma_{1}(gh) Equating the two expressions yields,

f_{1}(g,h)=f(g,h)+(gc(h)-c(gh)+c(g)) giving the needed result.

Now, we have to prove the opposite assertion , that, an element of H^{2}(G,A) yields a unique(upto equivalence) extension of G by A. I encourage the reader to find the remaining details in a textbook like Dummit and Foote or Aluffi(I am nor sure if he covers it though).

The element 0 in H^{2}(G,A) corresponds to a split extension because then the section \sigma is a genuine homomorphism and so f=0.

Schur-Zassenhaus Theorem

Consider the exact sequence of groups:

[Insert Image]

If gcd(|G|,|A|)=1 then the sequence is split, equivalently,

If the order of A, a normal subgroup of E, is coprime to the order of the index of A in E, then there is a semidirect product such that E \simeq A \ltimes G for some complement G.

We’ll first deal with the abelian case and that’ll turn out to be useful in the general proof. Towards, that we have the following Lemma.

Lemma 2:

Let G be a finite group of order m and let A be an abelian group with G-action, then H^{n}(G,A) are \mathbb{Z}_{m}-modules, that is, they are annihilated on multiplication by m.

Proof of Lemma 2:

Let B_{\bullet} be the bar resolution of G and let \eta be the endomorphism of B_{\bullet} given by multiplication by (m-N) on \mathbb{B_{0}} \simeq \mathbb{Z}[G] and multiplication by m for B_{i} for i \geq 1. We show that \eta is nullhomotopic. Consider the chain homotopy \sum_{n}:B_{n} \to B_{n+1} given by the formula on the basis elements

\sum_{n}(g_{1} \otimes \cdots \otimes g_{n})=\sum_{g \in G}(g_{1} \otimes \cdots \otimes g_{n} \otimes g)

I leave it as an exercise to the reader to show that it is nullhomotopic, a trivial calculation.

Corollary 2.1

Let G be a finite group of order m and let A be either a vector space over \mathbb{Q} or a \mathbb{Z}[\frac{1}{m}]-module then H^{n}(G,A)=0 for n \neq 0.

Proof of Corollary 2.1


Multiplication by m is an isomorphism on A in either case. ApplyingHom_{\mathbb{Z}[G]}(-,A) to the bar resolution B_{\bullet}, consider a class [\phi] \in Ext_{\mathbb{Z}[G]}(B_{\bullet},A).

\phi(mx)=m.\phi(x) which is 0 only if \phi(x) is 0. So, H^{n}(G,A)=0.

This resolves the Schur-Zassenhaus theorem for the abelian case below.

Corollary 2.2(Abelian Case)

If A is an abelian group, then H^{2}(G,A)=0 so every extension of G by A is split.

With this, we reduce the proof of the general case to this specific case through some techniques. Before that though, there’s just a random group theory fact we’re going to use.

Lemma 3:

Let G be a group and let P by a Sylow-p subgroup. Then, N(P) is a subgroup of N(Z(P)).

Proof of Lemma 3:


Proof of Schur Zassenhaus Theorem

We proceed by induction on n, the order of 'A', the first element in the sequence. Let p be a prime that divides $n=ord(A)$ and consider P, a Sylow-p subgroup of A. We know that Z(P) \neq {0} by standard group theory via. the class equation, for example. Consider N_{E}(Z(S)), the normalizer of Z(S) in E. Using Frattinis’s argument,

\lvert E \rvert=\frac{ \lvert N(P) \rvert . \lvert A \rvert}{\lvert N(P) \cap A \rvert}.

So, the index m divides \lvert N(P) \rvert and N(P) is a subgroup of N(Z(P)) by Lemma 3. So, m \vert N(Z(P)). So, we get an extension

0 \to (A \cap N(Z(P)) \to N(Z(P)) \to G \to 0 by the Second Isomorphism Theorem and using Frattini’s argument. If N(Z(P)) \neq E, using the induction hypothesis, this sequence splits so there is a subgroup of N which is isomorphic to G. On the other hand, if N(Z(P)) =E, we get that Z(P) \ltriangle N(Z(P)) and we get the extension

0 \to A/Z(P) \to E/Z(P) \to G \to 1 by the Third Isomorphism theorem. This splits by the induction hypothesis. Let G' be the subgroup of E/Z(P) isomorphic to G. Pulling back, let E_{1}:=\pi^{-1}(G') where \pi is the natural quotient map \pi:E \to E/Z(P). From this, we get the extension,

0 \to Z(P) \to E_{1} \to G' \to 0. Using Corollary 2.2(since Z(P) is abelian), there is  subgroup of E_{1} isomorphic to G' so there is a subgroup of E isomorphic to G' which, as established, is isomorphic to G. Q.E.D





A Road to the Grothendieck Spectral Sequence:Derived Functors II

In the previous post, I gave a little glimpse into derived functors and somewhat motivated their construction. In this post, we’ll get our hands dirty with homological algebra to continue setting up the required machinery and go through many proofs.

In the previous post, I promised to continue the proof of a lemma which establishes a long exact sequence. Before giving the proof, let me mention a few facts which will be useful.

If we’re given a short exact sequence 0 \to I \to A \to B \to 0 in an abelian category where I is an injective object. From the map I \to I and the monomorphism I \hookrightarrow A, we can extend to a map A \to I such that the composition is the identity. Using the Splitting Lemma, one obtains a non-canonical splitting A \simeq I \oplus B. Applying F,

0 \to F(A) \to F(I) \oplus F(B) \to F(B).

The identity map B \to B factors through the projection map \pi:A \to B, the same holds true after applying F, in particular, the last map is surjective!

0 \to F(A) \to F(I) \oplus F(B) \to F(B) \to 0.

Proof of Lemma 1:

Step 1:A morphism between objects with injective resolution induces a chain map between the resolutions

Let \phi:A \to B be a morphism between two objects with injective resolutions A_{\bullet},B_{\bullet}. In the figure below, the map \phi_{0} is constructed from the fact that d_{0}:A \to I_{0} is a monomorphism and there is a map A \to B \to I'_{0} from A to an injective object.


Now, there is a monomorphism I_{0}/ker(d_{1})=I_{0}/Im(d_{0})=Coker(A \to I_{0}) \hookrightarrow I_{1}. Next, note that by the commutativity of the square already defined, \phi_{0} takes Im(d_{0})=Ker(d_{1}) to Ker(d'_{1})=Im(d'_{0}) by the fact that d'{1}d'_{0}=0 by exactness of the lower sequence. This means that the map \phi_{0} induces a morphism h_{0}: Coker(A \to I_{0}) \to Coker(B \to I'_{0}) and by exactness, we can compose this with B/Ker(d'_{1}) to get a map Coker(A \to I_{0}) \to I'_{1}. Since I'_{1}, we get the required map \phi_{1}. Inductively continue this process to get the entire chain map. Note that all the maps defined from the injective object property are not unique.

Step 2:Proving that any two such extensions are chain-homotopic

Let f_{n},g_{n} be two chain maps from A \to I_{\bullet} to B \to I'_{\bullet}.


Continue reading “A Road to the Grothendieck Spectral Sequence:Derived Functors II”


A Road to the Grothendieck Spectral Sequence:Derived Functors I

This is a series of posts which will develop the required material to present the Grothendieck spectral sequence, a few applications of it in algebraic geometry(Lerray sequence, sheaf cohomology etc.) and some other categorical nonsense. This post is meant to be a light glimpse into derived functors and not a full introduction.


I’ve wanted to post this for a few months but unfortunately I overestimated how much free time I would have in my first semester of college(hurrah!). At one point, I totally forgot that I even maintain a blog! That’s why I haven’t posted for about 5 months. The title of my blog is now officially fraudulent. There were many new things(both math-related and otherwise) at UT Austin that I wanted to explore in my first semester so I could be forgiven for putting aside my blog. I think I’ve realized that it is really just a matter of consistency. If I do something regularly, I’ll stick to the routine. So maybe, the title of the blog is more aspirational than it is real.

Anyways, enough of the nonsense. Though the Grothendieck spectral sequeuce encodes a lot of data(like all other spectral sequences),it is actually surprisingly simple to motivate and feels almost ‘natural’ to construct. But I suppose that is really indicative of the Grothendieck’s spectacular reformulation of homological algebra that has seeped into modern textbooks that makes it feel so ‘natural’. It is truly a beautiful sight to witness how derived functors lead to this elegant construction.So first, let’s set up the machinery.

An object I in a category C is said to be an injective object if for every morphism f:X \to I and every monomorphism i:X \to Y, there exists a morphism h:Y \to I extending the map f such that the diagram commutes.

injective object.PNG

In the abelian category setting, the importance lies in the fact that I is an inejctive object if and only if the Hom functor is Hom_{C}(--,I) is exact. If an injective object is at the beginning of a short exact sequence in C, the sequence splits.

A category C has enough injectives if for every every object X \in C, there exists a monomorphism X \hookrightarrow I for some injective object I.

An injective reslution of an object $X \in C$, an abelian category is a resolution

0 \to X \to I_{0} \to I_{1} \to \cdots where I_{k} are injective objects. In particular, this is a quasi isomorphism of chain complexes in C given by X \to I_{\bullet} where X is the complex 0 \to X \to 0 \cdots .

Derived Functors

Let’s say A is an abelian category. Consider a short exact sequence in A:

0 \to L \to M \to N \to 0

An exact functor is a functor between abelian categories which preserves such sequences. Taking the direct sum, for example, preserves preserves a short exact sequence. Accordingly, we say that functors are left and right exact if they preserve the left and right parts of the short exact sequence. It is a well known that in the case of the category of modules over a ring R,the covariant Hom functor is left exact. 0 \to L \to M \to N \to 0, then

0 \to 0 \to Hom(A,L) \to Hom(A,M) \to Hom(A,N) where F_{A}(X)=Hom(A,X):A \to Ab is the Hom functor.

The tensor product functor G(X)=X \otimes_{R} M where M is an R-module is known to be right exact. Many of these facts and their proofs can be found in many standard texts on commutative algebra or homological algebra. Some of the arguments in these proofs tend to be quite arduous to work through. An easier way to prove it is to notice that the functors are adjoint and show the equivalent statement that Hom preserves limits which is not too difficult. Taking the dual, yields the statement for tensoring and infact, it yields the completely general version which states that left adjoint functors preserve finite colimits using the Yoneda Lemma. See my other post on adjoint functors if you wish to learn more.

The point of derived functors(which I’ll shortly introduce) is to take these ‘incomplete’ exact sequences where we’ve ‘lost data’ to try and construct a long exact sequence. Remember that chain complex

\cdots \xrightarrow{} C{n+1} \xrightarrow{\partial} C_{n} \xrightarrow{\partial}

equipped with ‘boundary maps(which I’ve not labelled) allows us to compute homology H_{n}=\frac{ker\partial}{Im\partial} which measures how far the sequence is from being exact. ALL the data is in the chain complex itself and the entire process of computing homology/cohomology is just a formalization which turns out to be quite handy. In the same manner, one should treat derived functors as a comfortable formalization using the data we possess. For an object X \in A, we know just one thing:that there is an injective resolution:

0 \to X \to I_{0} \to I_{1} \to \cdots .

Now, we take our left exact functor F(contravariant Hom for example) and apply it to the injective resolution to get F(X \to I_{\bullet})

0 \to F(X) \to F(I_{0}) \to F(I_{1}) \to \cdots

Now, just ‘take homology/cohomology’ and call it the right derived functor

R^{i}F(X)=H^{i}(F(X \to I_{\bullet})). But wait, did you notice something?On the left hand side, I don’t refer to the injective resolution. That is the essence of the construction, it is independent of the injective resolution of X upto canonical isomorphism. A proof of this can be found in any standard textbook on algebra in the homological algebra section(Aluffi, Dummit and Foote;I think Hatcher also proves it for the dual case in the section on cohomology). Let’s take a closer look at this sequence. Since F is left exact, we get the following exact sequence:

0 \to F(A) \to F(I_{0}) \to F(I_{1})

W get R^{0}(F(X))=F(X). If we F is exact, then the all R^{i}(F(X))=0 would be trivial for i>0! I guess you could think of this as a way to encode approximation just like in homology/cohomology.

The converse isn’t necessarily true. An object X \in A is said to be $\bf{F-acyclic}$ for a left exact functor if R^{i}(F(X))=0 for i > 0.

The final step of this formalization is ensuring that we have the long exact sequence

Lemma 1:

If 0 \to L \to M \to N \to 0 is a short exact sequence in an abelian category A with enough injectives and F is a left exact functor, there is an LES

0 \to L \to M \to N \to \\  \to R^{1}(F(L)) \to R^{1}(F(M)) \to R^{1}(F(N)) \to \cdots

The proof will be deferred to the next post with discussions on its dual, other theorems and special cases such as Ext, Tor.





What exactly are adjoint functors?

Below, I’ll repost an answer I gave to a question on Quora about the importance of adjoint functors. It is a simple exposition which focuses on the intuition behind their construction and gives a few examples with explanations that can aid anyone who is beginning to learn about adjoint functors. I’ll probably add more detail on the BC and throw in a few more diagrams later.

As another answer has already pointed out, adjoint functors are ubiquitous in many constructions across mathematics and that is enough reason to consider them important. But to help appreciate it more, I think a slightly different interpretation of a natural transformation will be helpful.

Let C be a small category and consider the nerve NC of the category which is a simplicial set whose n-simplices are the strings:

X_{0} \mapsto X_{1} \mapsto \cdots \mapsto X_{n} where X_{i} are obviously the objects and the arrows are the morphisms in the category. The kth-face maps can be obtained by simply deleting X_{k} . The classifying space BC is then realized as the geometrization of NC (turn the entire construction into a formal CW complex). For example, in the simple case of a poset where morphisms are determined by order, BC has the points of J as its vertices and the size-k totally ordered subsets as it’s k-simplices.

So, a functor F:C \to D induces a CW-complex map F*:BC \to BD from the category of small categories to the category of CW complexes. I encourage the reader to work through this construction for the sake of clarity.


I’ll leave out a few details here but I’ll say that it is in this sense that a natural transformation C<->D is essentially a homotopy BC \times I<-> BD . You can then think of adjoint functors as homotopy equivalences on categories. Hence, they do correspond to something weaker than a homeomorphism which is often why many call adjoint functors ‘conceptual inverses’. The concept of an adjoint functor is a special case of an adjunction in 2-categories where these ideas make more sense. The Wikipedia page motivates it by considering a functor F:C \to D and finding the problem for which F is the most efficient solution. This is why these adjoint functors always come in pairs. Let’s review the definition and look at a few interesting examples for all this to make more sense.

Two functors F:D \to C , G:C \to D are said to be adjoint i.e F \dashv G if there exists a natural isomorphism of the hom-sets Hom_{C}(FY,X) \sim Hom_{D}(Y,FX) . The equivalent counit-unit definition which is often easier to work with is that are natural isomorphisms \epsilon:FG \Rightarrow 1 and 1 \Rightarrow GF such that the following triangles commute:

The other answer has presented one of the simplest and most important examples, the free-forgetful functor on groups. I’ll present an example of a ‘free-forgetful’-type adjunction which may be a little harder to notice.

  1. Let C be a category, say the category of groups or something else. Take two objects a,b \in C . We will study the product a \times b . The product a \times b is defined by the maps to a and b so that if c^{'} maps to both a and b , then there is a unique morphism f:c^{'} \to a \times b which makes the diagram commute. Let’s construct a functor F: C \times C \to C defined by (a,b) \to a \times b . Damn, did you notice that? We just lost information as there is no map from C \to C \times C that can you can compose to get the identity. a \times b can probably be constructed in different ways as a product. Well, it hardly matters, let’s try to do something inverse-like. Take an object c \in C and do the only thing you can think of, that is, send it to (c,c) . Let this be the diagonal map \Delta :C \to C \times C. Notice that a pair of morphisms c \to a, c \to b in C is the same thing as a morphism (c,c) \to (a,b) in C \times C . So we have the hom-set natural isomorphism Hom_{C \times C}(\Delta(c),(a,b)) \simeq Hom_{C}(c,a \times b) . The counit is the pair of projection maps(a single morphism in C \times C ) for a \times b and the unit takes c to c \times c (verify these facts and the components of the natural transformations). A nice mnemonic I once heard was that left adjoints(here \Delta ) are ‘liberal’ and generate free things.
  2. Let’s look at a simple example in topology. Let HausC be the category of compact Hausdorff spaces. Define F:HausC \to Top which essentially returns the topological space and forgets the separation axiom and compactness. The ‘most efficient solution’ to this optimization is the functor \beta:Top \to HausC , the Stone-Cech compactification functor. \beta X satisfies the universal property that any map X \to Y where Y is compact and Hausdorff factors through \beta X .
  3. Another extremely important adjunction is the Hom-Tensor adjunction. Let M,N be rings and consider the categories MMod,NMod of modules over these rings. Let X be a (M,N) bi-module and define F(D)=D \otimes_{M} X and G(C)=Hom_{N}(X,C) . We have a natural isomorphism Hom_{N}(D \otimes_{M} X,C) \simeq Hom_{R}(D,Hom_{M}(N,X)) which can be verified by constructing the units,counits,a simple exercise. It is not free-forgetful in any obvious sense like the previous examples.
  4. Again, a basic example arises from considering partially ordered sets. If C and D are two posets then they are naturally categories. A pair of adjoint functors F \dashv G in this case is a Galois connection which means that F(c) \leq d if and only if c \leq F(d) , in other words, a natural isomorphism Hom_{D}(F(c),d) \simeq Hom_{C}(c,F(d)) . Again, this doesn’t look like a free-forgetful pair.

The best way to understand adjoint functors is to encounter more exmaples of adjunction. As a guiding principle, you can expect to find adjunctions in ‘symmetric and inverse-like’ constructions. The left and right adjoints have many interesting properties, like preserving colimits and limits respectively. This can be proven quite easily once you establish that the Hom-functor preserves arbitrary limits. In fact, there is a class of theorems that allow one to establish left/right adjoints for a functor based on how it acts on limits and colimits.


Proof of the Sensitivity Conjecture without Cauchy’s Interlacing Theorem

If you aren’t aware, one month ago, the Sensitivity Conjecture, a 30-year old problem, was proven by Hao Hung, an assistant professor at Emory University in just a little more than 2 pages using particularly simple methods albeit cleverly implemented. The proof is very easy to understand and requires nothing more than basic linear algebra. You can find it here.

A few days ago, Donald Knuth simplified the already simple proof of H. Hung and could fit all his arguments in half a page. What really shocked me when I heard the news is that Knuth is still alive. I could have sworn that it was only a few days ago when I was skimming through a textbook on Discrete Mathematics and saw a bio with a black-and-white pic of him in one of the chapters related to computational complexity in the same manner that I’d see pictures and mini-biographies of other 20th century giants such as Grothendieck and Nash relegated to the margins of math textbooks and online articles romantically detailing the course of their fabled lives.

Now that I’ve realized that he’s well and alive, it shocks me equally to learn that at the age of 81, he is still able to contribute to research.

I’m not exactly going to regurgitate Knuth’s proof but what I’ll present certainly uses the same ideas. I must note here than Knuth’s proof itself is inspired by a comment on Scott Aronson’s blog which provided a method to prove the conjecture without using Cauchy’s Interlacing Theorem.

If you don’t know what the Sensitivity Conjecture, I’ll state it below.

If f:\{0,1\}^{n} \mapsto \{0,1 \} is a boolean function where the domain is the graph of the n-dimensional cube denoted by Q^{n}=\{0,1 \}^{n}. So, a particular input x is just a string of 0s and 1s. The local sensitivity of f at an input x is the number of indices in the ‘string’ of x that you can flip and not change the output. The local block sensitivity of f at input x is the number of disjoint subsets of the set of indices \{0,1,2 \cdots, n \} such that the output doesn’t change when every index corresponding to a block flips in the input string of x. The sensitivity and block sensitivity are defined to be the maximum of these corresponding measures over all the inputs.

Continue reading “Proof of the Sensitivity Conjecture without Cauchy’s Interlacing Theorem”


Topological Constraints on the Universal Approximation of Neural Networks

The title may seem like a contradiction given that there is such a thing as the Universal Approximation Theorem which simply states that a neural network with a single hidden layer of finite width(i.e finite number of neurons) can approximate any function on a compact set of \mathbb{R}^{n} given that the activation function is non-constant,bounded and continuous.

Needless to say, I haven’t found any kind of flaws in the existing proofs(see Kolmogrov or Cybenko). However, I thought of something a little simple and scoured the internet for an answer.

What if we allow an arbitrary number of hidden layers and bound the dimension of the hidden layers making them ‘Deep, Skinny Neural Networks’? Would that be a universal approximator?

Continue reading “Topological Constraints on the Universal Approximation of Neural Networks”


Slick proof of Hilbert’s Nullstellensatz

It is a well known fact that there are multiple proofs for the Nullstellensatz which do not use Noether’s normalization lemma. Serge Lang’s proof(in his book) and Zariski’s proof both fall under this category. In fact, Daniel Allcock of UT Austin posted a proof which essentially builds from the edifice of Zariski’s proof(see that here). He claims that it is the simplest proof for the Nullstellensatz and frankly this is quite true considering the proof uses nothing more than basic field theory, is only one page long and just requires some familiarity with transcendence bases, and the transcendence degree. Yet in the true spirit of simplicity, T. Tao has presented a proof which uses nothing more than the extended Euclidean algorithm and “high school algebra” albeit with a lengthy case-by-case analysis(see the proof here).

Most of these proofs(except Tao’s) go about proving the ‘Weak’ Nullstellensatz and obtain the ‘Strong’ Nullstelensatz through the famous Rabinowitsch trick.

But a few days, I found something truly magnificent, a proof by Enrique Arrondo in the American Mathematical Monthly which proves the Nullstellensatz using a weaker version of Noether normalization and techniques similar to that of Tao, Ritrabata Munshi. The proof is essentially a simplification of a proof by R. Munshi.

Here, I present a special case of the normalization lemma.



If F is an infinite field and f \in F[x_{1},x_{2},\cdots ,x_{n}] is a non-constant polynomial and n \geq 2 whose total degree is d, then there exists a_{1},a_{2},\cdots ,a_{n-1} \in F such that the coefficient of x_{n}^{d} in

f(x_{1}+a_{1}x_{n},x_{2}+a_{2}x_{n},\cdots ,x_{n-1}+a_{n-1}x_{n},x_{n})

is non-zero.


Let f_{d} represent the homogenous component of f of degree d.So, the coefficient of x_{d}^{n} in f(x_{1}+a_{1}x_{n},x_{2}+a_{2}x_{n},\cdots,x_{n-1}+a_{n-1}x_{n},x_{n}) is f_{d}(a_{1},\cdots,a_{n-1},1). As a polynomial in F[x_{1},\cdots,x_{n-1}], there is some point (a_{1},\cdots,a_{n-1} \in F^{n-1} where f_{d}(a_{1},\cdots,a_{n-1},1) doesn’t vanish. Choose such a point to establish the a_{i} and this guarantees a non-zero coefficient of x_{n}^{d}.

Continue reading “Slick proof of Hilbert’s Nullstellensatz”


Coxeter Groups(intro)

Reflection groups

In order to understand the intuition underlying the theory of Coxeter groups(and Weyl groups in particular groups in particular), you can go through the theory of reflection groups which I’ll only superficially treat preceding my exposition of Coxetr systems and the associated representation theory.

Consider some Euclidean space V and a vector \alpha \in V. A reflection associated with \alpha is a linear operator which sends \alpha to -\alpha and point-wise fixes the orthogonal hyperplane/subspace H_{\alpha}.

If the reflection is s_{\alpha}, then it can represented as:

s_{\alpha}(x)=x-\frac{2(x,\alpha)}{(\alpha,\alpha)}\alpha=x-2 proj_{\alpha}(x)

Clearly s_{\alpha} is an orthogonal transformation and the set of all reflections in V can be recognized as the subgroup of O(V)(orthogonal group of V) consisting of elements of order 2.

Continue reading “Coxeter Groups(intro)”


Hopf Fibrations-Construction and Quaternion Interpretations (I)

Before I begin discussing the Hopf fibration of the 3-spbhere, one of the simplest yet deeply profound example of a non-trivial fiber bundle, I’d like to recall the definition of a fiber bundle.

Let E,B,F represent the entire space, base space and the fiber respectively where E,B are connected. If f:E \mapsto B is a continuous surjection onto the base space, then the structure (E,B,F,f) is said to be a fiber bundle if for every x\in E, there exists a neighborhood U \subset B of f(x) such that there exists a homeomorphism \psi:f^{-1}(U) \mapsto U \times F such that \psi \circ \pi_{U}=f.

What this basically means is that locally, the fiber bundle looks like the product B \times F but globally, it may have different topological properties.

A trivial fiber bundle is a fiber bundle which in which the total space is E=B \times F. In fact, any fiber bundle over a contractible CW Complex is trivial.

Continue reading “Hopf Fibrations-Construction and Quaternion Interpretations (I)”


CW Complex Structure of Real Projective Space and Kusner’s Parametrization of Boy’s surface

The real projective space represented by \mathbb{RP}^{n} is the space obtained from \mathbb{R}^{n+1} under the equivalence relation x \sim kx \forall x \in \mathbb{R}^{n+1}. Basically, \mathbb{RP}^{n} is the set of lines which passed through the origin in \mathbb{R}^{n+1}. It can also be understood by identifying antipodal points(points which lie on opposite ends of the diameter) of a unit n-sphere,S^{n}.


One very basic yet deeply interesting example of these spaces is \mathbb{RP}^{2}, known as the real projective plane. While \mathbb{RP}^{0} is a points and \mathbb{RP}^{1} is homeomorphic to a circle with infinity, the real projective plane turns out to be far more interesting indeed. It can’t be embedded in \mathbb{R}^{3} and its immersion/s such as Boy’s surface, Roman surface and Cross Cap have far stranger structures than a mere circle as in the case of \mathbb{RP}^{2}. In fact, I will delineate some of the calculations involved in the Kusner-Bryant 3-dimensional parametrization of Boy’s surface. It’s a little fascinating how much complexity can be added to mathematical structures upon generalization especially in the case of the projective space which I believe have a remarkably simple and ‘non-threatening’ definition.

Continue reading “CW Complex Structure of Real Projective Space and Kusner’s Parametrization of Boy’s surface”


Moore-Smith Sequences/Nets in General Topology

Nets/Moore-Smith Sequences

Sequences are common objects in the field of topology. Often, sequences can help identify continuous functions, limit points and compact spaces in metric spaces.

Moore-Smith sequences(or nets) are essentially a generalization of the sequence for an arbitrary topological space and we can see that many foundational theorems of general topology can be stated in terms of nets.

So first, let’s recall what a partial order and direct set is:

A partial order is an order realtion \leq satisfying:-

  • \alpha \leq \alpha
  • If \alpha \leq \beta and \beta \leq \alpha, then \alpha=\beta.
  • If \alpha \leq \beta,\beta \leq \gamma, then \alpha \leq \gamma.


A directed set I is a set with a partial order relation \leq such that for any \alpha,\beta \in I,there exists \gamma such that \alpha \leq \gamma,\beta \leq \gamma.

It’s best to think of a directed set as a sort of analogue to an indexing set.

Continue reading “Moore-Smith Sequences/Nets in General Topology”


Connection between a symmetric linear transformation and the unit sphere


There is an interesting correspondence among the quadratic form of a symmetric linear transformation T:V \mapsto Von a real Euclidean space,the extreme values of the sphere and the eigenvectors of T

Let Q(x)=(T(x),x) be the quadratic form associated with a symmetric linar transformation which maps V into itself,then the set of elements u in V satisfying \langle u,u  \rangle=1 is called the unit sphere of V

Continue reading “Connection between a symmetric linear transformation and the unit sphere”

Innocent Problem?

I thought of a random topological problem a week ago in my Analysis class and it has been bugging me for quite a while. I tried searching around but couldn’t find anything.

Consider a non-intersecting curve \gamma: I=[0,1] \to \mathbb{R}^{3}. It can even be a closed loop but it shouldn’t be ‘too straight’ else the problem is trivial. Considering a ‘thickening of the curve’. If X(0) is the curve at time t=0 and X(t) be obtained by adjoining closed disks of radius t at every point of the curve \gamma(I won’t even bother to formalize it). Now, consider the relative complement homology(aka local at subspace) \tilde{H}_{k}(\mathbb{R}^{3},\mathbb{R}^{3} \backslash X(t)). The vague problem is to find nice examples of curves and how their homology groups vary with the parameter t. For example, if one considers an 8-loop except it doesn’t exactly intersect(instead, there is a small gap), then at t=0, we obviously have the homology of a circle. If we thicken it a little bit, we’ll get a 4-torus which gains information in H_{2}(\mathbb{Z}^{4}) and thicken it even more, the entire thing collapses to 0. The same thing works with a circle(you get a torus on thickening). For a trivial example, if you just take a line segment then any thickening deformation retracts to the segment which contracts to a point.


I asked my Analysis professor about it, unfortunately, he didn’t have much of an idea and directed me to faculty who may know something about it(and who I’ve yet to contact). Then, I approached my linear algebra professor after class to ask if he knew anything about it(probably a bad idea since he works in representation theory), he just laughed and walked out. I have some stuff jotted down on the problem making little(obvious) progress but I wanted to gather more examples of classes of curves and some inkling of a theorem before I post anything.

I am currently working on the third part of the Derived Functors sequence, some interesting things in stable homotopy theory.