A Road to the Grothendieck Spectral Sequence: Derived Functors I

This is a series of posts which will develop the required material to present the Grothendieck spectral sequence, a few of its applications in algebraic geometry (the Leray spectral sequence, sheaf cohomology, etc.) and some other categorical nonsense. This post is meant to be a light glimpse into derived functors and not a full introduction.


 

I’ve wanted to post this for a few months but unfortunately I overestimated how much free time I would have in my first semester of college (hurrah!). At one point, I totally forgot that I even maintain a blog! That’s why I haven’t posted for about 5 months. The title of my blog is now officially fraudulent. There were many new things (both math-related and otherwise) at UT Austin that I wanted to explore in my first semester, so I could be forgiven for putting aside my blog. I think I’ve realized that it is really just a matter of consistency. If I do something regularly, I’ll stick to the routine. So maybe the title of the blog is more aspirational than it is real.


Anyways, enough of the nonsense. Though the Grothendieck spectral sequence encodes a lot of data (like all other spectral sequences), it is actually surprisingly simple to motivate and feels almost ‘natural’ to construct. But I suppose that is really indicative of Grothendieck’s spectacular reformulation of homological algebra, which has seeped into modern textbooks and makes it feel so ‘natural’. It is truly a beautiful sight to witness how derived functors lead to this elegant construction. So first, let’s set up the machinery.

An object I in a category C is said to be an injective object if for every morphism f:X \to I and every monomorphism i:X \to Y, there exists a morphism h:Y \to I extending the map f, i.e. h \circ i = f, so that the diagram below commutes.

[Diagram: the extension h:Y \to I of f along the monomorphism i, with h \circ i = f.]

In the abelian category setting, the importance lies in the fact that I is an injective object if and only if the contravariant Hom functor Hom_{C}(-,I) is exact. If an injective object sits at the beginning of a short exact sequence in C, the sequence splits.

A category C has enough injectives if for every object X \in C, there exists a monomorphism X \hookrightarrow I for some injective object I.

An injective resolution of an object X \in C, where C is an abelian category, is a resolution

0 \to X \to I_{0} \to I_{1} \to \cdots where the I_{k} are injective objects. In particular, this gives a quasi-isomorphism of chain complexes in C, X \to I_{\bullet}, where X denotes the complex 0 \to X \to 0 \to \cdots .
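For example, in the category of abelian groups, where the injective objects are exactly the divisible groups, the sequence

0 \to \mathbb{Z} \to \mathbb{Q} \to \mathbb{Q}/\mathbb{Z} \to 0

is an injective resolution of \mathbb{Z}, since \mathbb{Q} and \mathbb{Q}/\mathbb{Z} are divisible.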

Derived Functors

Let’s say A is an abelian category. Consider a short exact sequence in A:

0 \to L \to M \to N \to 0

An exact functor is a functor between abelian categories which preserves such sequences. Taking the direct sum with a fixed object, for example, preserves a short exact sequence. Accordingly, we say that a functor is left or right exact if it preserves the left or right part of the short exact sequence. It is well known that in the case of the category of modules over a ring R, the covariant Hom functor is left exact: if 0 \to L \to M \to N \to 0 is exact, then

0 \to Hom(A,L) \to Hom(A,M) \to Hom(A,N) is exact, where F_{A}(X)=Hom(A,X) is the covariant Hom functor to Ab.

The tensor product functor G(X)=X \otimes_{R} M, where M is an R-module, is known to be right exact. Many of these facts and their proofs can be found in standard texts on commutative algebra or homological algebra. Some of the arguments in these proofs tend to be quite arduous to work through. An easier way to prove this is to notice that the two functors are adjoint and to show the equivalent statement that Hom preserves limits, which is not too difficult. Taking the dual yields the statement for tensoring and, in fact, using the Yoneda Lemma one obtains the completely general version, which states that left adjoint functors preserve colimits. See my other post on adjoint functors if you wish to learn more.
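To see that right exactness is genuinely weaker than exactness, here is a quick check: tensor the short exact sequence 0 \to \mathbb{Z} \xrightarrow{\times 2} \mathbb{Z} \to \mathbb{Z}/2 \to 0 with \mathbb{Z}/2 over \mathbb{Z}. The result is

\mathbb{Z}/2 \xrightarrow{0} \mathbb{Z}/2 \to \mathbb{Z}/2 \to 0,

which is exact on the right but not on the left, since multiplication by 2 becomes the zero map on \mathbb{Z}/2.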

The point of derived functors (which I’ll shortly introduce) is to take these ‘incomplete’ exact sequences, where we’ve ‘lost data’, and try to construct a long exact sequence. Recall that a chain complex

\cdots \xrightarrow{\partial_{n+2}} C_{n+1} \xrightarrow{\partial_{n+1}} C_{n} \xrightarrow{\partial_{n}} C_{n-1} \xrightarrow{\partial_{n-1}} \cdots

equipped with boundary maps \partial satisfying \partial^{2}=0 allows us to compute homology H_{n}=\frac{\ker\partial_{n}}{\operatorname{im}\partial_{n+1}}, which measures how far the sequence is from being exact. ALL the data is in the chain complex itself and the entire process of computing homology/cohomology is just a formalization which turns out to be quite handy. In the same manner, one should treat derived functors as a comfortable formalization using the data we possess. For an object X \in A (where A has enough injectives), we know just one thing: that there is an injective resolution:

0 \to X \to I_{0} \to I_{1} \to \cdots .

Now, we take our left exact functor F (the covariant Hom functor, for example) and apply it to the resolution with the X term dropped, obtaining the complex F(I_{\bullet})

0 \to F(I_{0}) \to F(I_{1}) \to \cdots

Now, just ‘take cohomology’ and call the result the right derived functor

R^{i}F(X)=H^{i}(F(I_{\bullet})). But wait, did you notice something? On the left hand side, I don’t refer to the injective resolution. That is the essence of the construction: it is independent of the choice of injective resolution of X up to canonical isomorphism. A proof of this can be found in the homological algebra section of any standard textbook on algebra (Aluffi, or Dummit and Foote; I think Hatcher also proves it for the dual case in the section on cohomology). Let’s take a closer look at this sequence. Since F is left exact and 0 \to X \to I_{0} \to I_{1} is exact, we get the following exact sequence:

0 \to F(X) \to F(I_{0}) \to F(I_{1})

We get R^{0}F(X)=F(X). If F were exact, then R^{i}F(X)=0 would hold trivially for all i>0! I guess you could think of this as a way to encode approximation just like in homology/cohomology.
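As a small worked example in the category of abelian groups: take F=Hom(\mathbb{Z}/2,-) and X=\mathbb{Z}, with the injective resolution 0 \to \mathbb{Z} \to \mathbb{Q} \to \mathbb{Q}/\mathbb{Z} \to 0 from earlier. Applying F to the deleted resolution gives the complex

0 \to Hom(\mathbb{Z}/2,\mathbb{Q}) \to Hom(\mathbb{Z}/2,\mathbb{Q}/\mathbb{Z}) \to 0, that is, 0 \to 0 \to \mathbb{Z}/2 \to 0,

so R^{0}F(\mathbb{Z})=Hom(\mathbb{Z}/2,\mathbb{Z})=0 and R^{1}F(\mathbb{Z})=\mathbb{Z}/2. These are precisely the groups Ext^{0}(\mathbb{Z}/2,\mathbb{Z}) and Ext^{1}(\mathbb{Z}/2,\mathbb{Z}).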

The converse isn’t necessarily true: the higher derived functors can vanish on a particular object without F being exact. An object X \in A is said to be F-acyclic for a left exact functor F if R^{i}F(X)=0 for all i > 0.

The final step of this formalization is ensuring that we have the long exact sequence.

Lemma 1:

If 0 \to L \to M \to N \to 0 is a short exact sequence in an abelian category A with enough injectives and F is a left exact functor, then there is a long exact sequence

0 \to F(L) \to F(M) \to F(N) \to R^{1}F(L) \to R^{1}F(M) \to R^{1}F(N) \to R^{2}F(L) \to \cdots
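For instance, continuing the example above with F=Hom(\mathbb{Z}/2,-) applied to 0 \to \mathbb{Z} \xrightarrow{\times 2} \mathbb{Z} \to \mathbb{Z}/2 \to 0, the long exact sequence reads

0 \to 0 \to 0 \to \mathbb{Z}/2 \to \mathbb{Z}/2 \to \mathbb{Z}/2 \to \mathbb{Z}/2 \to 0,

where the map R^{1}F(\mathbb{Z}) \to R^{1}F(\mathbb{Z}) is induced by multiplication by 2 and is therefore zero, the two maps surrounding it are isomorphisms, and the sequence terminates because \mathbb{Z} has an injective resolution of length one.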

The proof will be deferred to the next post along with discussions of its dual, other theorems, and special cases such as Ext and Tor.

 

 

 

What exactly are adjoint functors?

Below, I’ll repost an answer I gave to a question on Quora about the importance of adjoint functors. It is a simple exposition which focuses on the intuition behind their construction and gives a few examples with explanations that can aid anyone who is beginning to learn about adjoint functors. I’ll probably add more detail on the classifying space BC and throw in a few more diagrams later.


As another answer has already pointed out, adjoint functors are ubiquitous in constructions across mathematics and that is enough reason to consider them important. But to help appreciate them more, I think a slightly different interpretation of a natural transformation will be helpful.

Let C be a small category and consider the nerve NC of the category, which is a simplicial set whose n-simplices are the strings:

X_{0} \to X_{1} \to \cdots \to X_{n} where the X_{i} are objects and the arrows are morphisms in the category. The k-th face map is obtained by simply deleting X_{k} (and composing the two adjacent morphisms where necessary). The classifying space BC is then realized as the geometric realization of NC (turn the entire construction into a formal CW complex). For example, in the simple case of a poset J, where morphisms are determined by the order, BC has the elements of J as its vertices and the (k+1)-element totally ordered subsets as its k-simplices.

So, a functor F:C \to D induces a map F_{*}:BC \to BD of CW complexes; in other words, B is a functor from the category of small categories to the category of CW complexes. I encourage the reader to work through this construction for the sake of clarity.
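A quick example to keep in mind: a group G, regarded as a category with one object whose morphisms are the group elements, has BG equal to the usual classifying space of the group; already for G=\mathbb{Z} this space is homotopy equivalent to the circle S^{1}.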

 

I’ll leave out a few details here, but it is in this sense that a natural transformation between functors F,G:C \to D is essentially a homotopy BC \times I \to BD between F_{*} and G_{*}. You can then think of adjoint functors as inducing homotopy equivalences between BC and BD. Hence, they correspond to something weaker than a homeomorphism, which is often why many call adjoint functors ‘conceptual inverses’. The concept of an adjoint functor is a special case of an adjunction in 2-categories, where these ideas make more sense. The Wikipedia page motivates it by considering a functor F:C \to D and finding the problem for which F is the most efficient solution. This is why adjoint functors always come in pairs. Let’s review the definition and look at a few interesting examples for all this to make more sense.


Two functors F:D \to C, G:C \to D are said to be adjoint, i.e. F \dashv G, if there exists a natural isomorphism of the hom-sets Hom_{C}(FY,X) \simeq Hom_{D}(Y,GX). The equivalent counit-unit definition, which is often easier to work with, is that there are natural transformations \epsilon:FG \Rightarrow 1_{C} and \eta:1_{D} \Rightarrow GF such that the triangle identities (\epsilon F) \circ (F\eta)=1_{F} and (G\epsilon) \circ (\eta G)=1_{G} hold, i.e. the corresponding triangles commute.

The other answer has presented one of the simplest and most important examples, the free-forgetful adjunction on groups. I’ll present an example of a ‘free-forgetful’-type adjunction which may be a little harder to notice.

  1. Let C be a category with binary products, say the category of groups or something else. Take two objects a,b \in C. We will study the product a \times b. The product a \times b is defined by its projections to a and b, so that if c' maps to both a and b, then there is a unique morphism f:c' \to a \times b which makes the diagram commute. Let’s construct a functor F: C \times C \to C defined by (a,b) \mapsto a \times b. Damn, did you notice that? We just lost information: there is no functor C \to C \times C that you can compose with F to get the identity. Of course, a \times b is only defined up to isomorphism as a product, but it hardly matters; let’s try to do something inverse-like. Take an object c \in C and do the only thing you can think of, that is, send it to (c,c). Let this be the diagonal functor \Delta:C \to C \times C. Notice that a pair of morphisms c \to a, c \to b in C is the same thing as a morphism (c,c) \to (a,b) in C \times C. So we have the hom-set natural isomorphism Hom_{C \times C}(\Delta(c),(a,b)) \simeq Hom_{C}(c,a \times b). The counit is the pair of projection maps (a single morphism in C \times C) for a \times b and the unit is the diagonal morphism c \to c \times c (verify these facts and the components of the natural transformations). A nice mnemonic I once heard was that left adjoints (here \Delta) are ‘liberal’ and generate free things.
  2. Let’s look at a simple example in topology. Let HausC be the category of compact Hausdorff spaces. Define F:HausC \to Top to be the functor which simply returns the underlying topological space, forgetting compactness and the separation axiom. The ‘most efficient solution’ to this optimization is its left adjoint \beta:Top \to HausC, the Stone-Čech compactification functor. \beta X satisfies the universal property that any continuous map X \to Y, where Y is compact and Hausdorff, factors uniquely through the canonical map X \to \beta X.
  3. Another extremely important adjunction is the Hom-Tensor adjunction. Let M,N be rings and consider the categories MMod, NMod of modules over these rings. Let X be an (M,N)-bimodule and define F(D)=D \otimes_{M} X and G(C)=Hom_{N}(X,C). We have a natural isomorphism Hom_{N}(D \otimes_{M} X,C) \simeq Hom_{M}(D,Hom_{N}(X,C)), which can be verified by constructing the unit and counit, a simple exercise. It is not free-forgetful in any obvious sense like the previous examples.
  4. Again, a basic example arises from considering partially ordered sets. If C and D are two posets then they are naturally categories. A pair of adjoint functors F \dashv G in this case is a Galois connection, which means that F(c) \leq d if and only if c \leq G(d); in other words, a natural isomorphism Hom_{D}(F(c),d) \simeq Hom_{C}(c,G(d)). Again, this doesn’t look like a free-forgetful pair. A concrete instance is worked out just after this list.
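A standard concrete Galois connection, as promised: let f:X \to Y be any function between sets and order the power sets P(X) and P(Y) by inclusion. Taking the direct image F(A)=f(A) and the preimage G(B)=f^{-1}(B), we have

f(A) \subseteq B \iff A \subseteq f^{-1}(B),

so the direct image is left adjoint to the preimage.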

The best way to understand adjoint functors is to encounter more examples of adjunctions. As a guiding principle, you can expect to find adjunctions in ‘symmetric and inverse-like’ constructions. Left and right adjoints have many interesting properties, like preserving colimits and limits respectively. This can be proven quite easily once you establish that the Hom functor preserves arbitrary limits. In fact, there is a class of theorems (the adjoint functor theorems) that allow one to establish left/right adjoints for a functor based on how it acts on limits and colimits.

Proof of the Sensitivity Conjecture without Cauchy’s Interlacing Theorem

If you aren’t aware, one month ago, the Sensitivity Conjecture, a 30-year-old problem, was proven by Hao Huang, an assistant professor at Emory University, in just a little more than 2 pages using particularly simple methods, albeit cleverly implemented. The proof is very easy to understand and requires nothing more than basic linear algebra. You can find it here.

A few days ago, Donald Knuth simplified the already simple proof of H. Huang and could fit all his arguments in half a page. What really shocked me when I heard the news is that Knuth is still alive. I could have sworn that it was only a few days ago when I was skimming through a textbook on Discrete Mathematics and saw a bio with a black-and-white pic of him in one of the chapters related to computational complexity, in the same manner that I’d see pictures and mini-biographies of other 20th century giants such as Grothendieck and Nash relegated to the margins of math textbooks and online articles romantically detailing the course of their fabled lives.

Now that I’ve realized that he’s well and alive, it shocks me equally to learn that at the age of 81, he is still able to contribute to research.

I’m not exactly going to regurgitate Knuth’s proof, but what I’ll present certainly uses the same ideas. I must note here that Knuth’s proof itself is inspired by a comment on Scott Aaronson’s blog which provided a method to prove the conjecture without using Cauchy’s Interlacing Theorem.

If you don’t know what the Sensitivity Conjecture is, I’ll state it below.

Let f:\{0,1\}^{n} \to \{0,1\} be a boolean function whose domain is the vertex set of the n-dimensional cube Q^{n}=\{0,1\}^{n}. So, a particular input x is just a string of 0s and 1s. The local sensitivity of f at an input x is the number of indices in the ‘string’ of x that you can flip and change the output. The local block sensitivity of f at input x is the maximum number of disjoint subsets (blocks) of the index set \{1,2,\cdots,n\} such that flipping every index of any one of these blocks in the input string of x changes the output. The sensitivity and block sensitivity of f are defined to be the maxima of these corresponding local measures over all inputs.
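As a quick check of the definitions, consider the n-bit OR function. At the all-zeros input, flipping any single bit changes the output from 0 to 1, so the local sensitivity there is n; at an input with exactly one 1, only flipping that bit changes the output, so the local sensitivity is 1. Taking maxima, the sensitivity of OR is n, and since the n singleton blocks already give n disjoint output-changing blocks at the all-zeros input, its block sensitivity is n as well.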

Continue reading “Proof of the Sensitivity Conjecture without Cauchy’s Interlacing Theorem”

Topological Constraints on the Universal Approximation of Neural Networks

The title may seem like a contradiction given that there is such a thing as the Universal Approximation Theorem, which simply states that a neural network with a single hidden layer of finite width (i.e. a finite number of neurons) can approximate any continuous function on a compact subset of \mathbb{R}^{n}, given that the activation function is non-constant, bounded and continuous.

Needless to say, I haven’t found any kind of flaw in the existing proofs (see Kolmogorov or Cybenko). However, I thought of something quite simple and scoured the internet for an answer.

What if we allow an arbitrary number of hidden layers but bound the dimension of the hidden layers, making them ‘Deep, Skinny Neural Networks’? Would that be a universal approximator?

Continue reading “Topological Constraints on the Universal Approximation of Neural Networks”

Schur’s Lemma

This is a standard result that any beginner in the study of representation theory would be aware of. I merely restate the theorem a little (essentially I don’t) and prove it.

Theorem:

Let V,W be representations of a group G. If f:V \to W is a G-linear map, then the following are true:

1)If both V,W are irreducible, then f is either an isomorphism or the zero map.

2)If only V is irreducible, then f is either injective or the zero map.

3)If only W is irreducible, then f is either surjective or the zero map.


If V and W are isomorphic irreducible representations, then f is a scalar multiple of a fixed isomorphism; in particular, any G-linear map f:V \to V is multiplication by a scalar.

Proof:

Let \rho_{V},\rho_{W} be the respective representations. Here, V,W are finite-dimensional vector spaces over an algebraically closed field.

We tackle the first part of the theorem.

Consider the kernel of f, say Ker(f), which is the set of all v \in V such that f(v)=0. Since f is a G-linear map, f(g.v)=g.f(v)=0 shows that if v \in Ker(f), then g.v \in Ker(f). Hence, Ker(f) is stable under the action of G, which makes it a sub-representation. Since V is irreducible, this means that Ker(f) must be either zero or all of V, in which case f is the zero map. So if f is not the zero map, it is injective.
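To indicate where algebraic closure enters (a sketch of the standard argument for the scalar statement): a G-linear map f:V \to V has an eigenvalue \lambda since the field is algebraically closed and V is finite-dimensional, so f-\lambda \cdot id is a G-linear map with non-zero kernel; by the argument above its kernel must then be all of V, i.e. f=\lambda \cdot id.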

Continue reading “Schur’s Lemma”

Irrationality of basic trigonometric and hyperbolic functions at rational values

A few days ago, while rummaging through my shelves, I stumbled upon a book by the number theorist Ivan Niven entitled “Irrational Numbers”-a relatively unknown book in my opinion. Obscured by stationery and other paraphernalia, the edge of the book peeked out of my drawer-its cover shrouded in dust and its title almost indiscernible if not for the striking glow of morning sunlight from my large window revealing its dark red tint. I wiped off the dust and peeked into its contents.

This was one of the first ‘real’ math textbooks that I ever read in high school along with Apostol’s books on Calculus and Lang’s Linear Algebra. I distinctly remember reading it for the first time when I was 13. It was undoubtedly quite difficult at the time, considering that I had little exposure to proof-based mathematics. Though the book is rather drab in its exposition, it undoubtedly had some interesting and simple mathematical facts about irrationality which you wouldn’t naturally find anywhere else except for some old number theory papers.

One of those little mathematical gems is the irrationality of trigonometric and hyperbolic functions at rational arguments. I shall soon prove that cos(x) is irrational if x \in \mathbb{Q} \backslash \{ 0 \} .


 

Lemma 1

If h \in \mathbb{Z}[x] and f(x)=\frac{x^{n}h(x)}{n!} is a polynomial in \mathbb{Q}[x], then f^{(k)}(0) \in \mathbb{Z} for k \in \mathbb{Z}^{+} and f^{(k)}(0) is divisible by n+1 if k \neq n. However, it is divisible by n+1 at k=n if h(0)=0.

 

I’ll leave the proof to the reader since it’s quite simple and involves nothing more than playing around with the polynomials.
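To see where the integrality comes from, write h(x)=\sum_{j} h_{j}x^{j} with h_{j} \in \mathbb{Z}, so that f(x)=\sum_{j} \frac{h_{j}}{n!}x^{n+j}. Then

f^{(k)}(0)=\frac{k!}{n!}h_{k-n} for k \geq n, and f^{(k)}(0)=0 for k < n.

For k > n the factor \frac{k!}{n!}=(n+1)(n+2)\cdots k is divisible by n+1, while at k=n the value is exactly h(0), which explains the exceptional case in the lemma.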


 

Lemma 2

Let f \in \mathbb{R}[x] and define F(x) as such:
F(x)=f((r-x)^{2}) where r \in \mathbb{R}.
Then, F^{(k)}(r)=0 if k is an odd integer.

Again, I’ll leave the trivialities of calculations and polynomial manipulations to the reader.
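In one line: writing u=x-r, we have F(r+u)=f(u^{2}), an even polynomial in u, so every odd-order derivative with respect to u vanishes at u=0, which is exactly the statement F^{(k)}(r)=0 for odd k.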


 

Now, to the main theorem.


 

Theorem

sin(x), cos(x), tan(x) are all irrational at non-zero rational values of x.

Proof:

Let r=\frac{a}{b} where a,b \in \mathbb{Z} and b \neq 0. Define a function f(x) as follows:

f(x)=\frac{x^{p-1}(a-bx)^{2p}(2a-bx)^{p-1}}{(p-1)!}

where p is an odd prime.
Substituting a=rb,

f(x)=\frac{(r-x)^{2p}(r^{2}-(r-x)^{2})^{p-1}b^{3p-1}}{(p-1)!} \qquad (1)

Notice that (1) exhibits f as a polynomial in (r-x)^{2}, which is the form appearing in Lemma 2, while the original expression for f is of the form required by Lemma 1 with n=p-1. Also, observe that f(x) \geq 0 for all x \in \mathbb{R} (since p-1 is even), with equality only at x=0, r, 2r.

The next step is to obtain an upper bound for f(x) when x \in (0,r).

Continue reading “Irrationality of basic trigonometric and hyperbolic functions at rational values”

Slick proof of Hilbert’s Nullstellensatz

It is a well known fact that there are multiple proofs of the Nullstellensatz which do not use Noether’s normalization lemma. Serge Lang’s proof (in his book) and Zariski’s proof both fall under this category. In fact, Daniel Allcock of UT Austin posted a proof which essentially builds from the edifice of Zariski’s proof (see that here). He claims that it is the simplest proof of the Nullstellensatz, and frankly this is quite true considering that the proof uses nothing more than basic field theory, is only one page long, and just requires some familiarity with transcendence bases and transcendence degree. Yet in the true spirit of simplicity, T. Tao has presented a proof which uses nothing more than the extended Euclidean algorithm and “high school algebra”, albeit with a lengthy case-by-case analysis (see the proof here).

Most of these proofs (except Tao’s) go about proving the ‘Weak’ Nullstellensatz and obtain the ‘Strong’ Nullstellensatz through the famous Rabinowitsch trick.

But a few days ago, I found something truly magnificent: a proof by Enrique Arrondo in the American Mathematical Monthly which establishes the Nullstellensatz using a weaker version of Noether normalization and techniques similar to those of Tao and Ritabrata Munshi. The proof is essentially a simplification of a proof by R. Munshi.

Here, I present a special case of the normalization lemma.


 

Lemma

If F is an infinite field, n \geq 2, and f \in F[x_{1},x_{2},\cdots ,x_{n}] is a non-constant polynomial of total degree d, then there exist a_{1},a_{2},\cdots ,a_{n-1} \in F such that the coefficient of x_{n}^{d} in

f(x_{1}+a_{1}x_{n},x_{2}+a_{2}x_{n},\cdots ,x_{n-1}+a_{n-1}x_{n},x_{n})

is non-zero.

Proof:

Let f_{d} denote the homogeneous component of f of degree d. Then the coefficient of x_{n}^{d} in f(x_{1}+a_{1}x_{n},x_{2}+a_{2}x_{n},\cdots,x_{n-1}+a_{n-1}x_{n},x_{n}) is f_{d}(a_{1},\cdots,a_{n-1},1). Now f_{d}(x_{1},\cdots,x_{n-1},1) is a non-zero polynomial in F[x_{1},\cdots,x_{n-1}], and since F is infinite it cannot vanish at every point of F^{n-1}. So there is some point (a_{1},\cdots,a_{n-1}) \in F^{n-1} where f_{d}(a_{1},\cdots,a_{n-1},1) doesn’t vanish. Choose such a point to fix the a_{i}; this guarantees a non-zero coefficient of x_{n}^{d}.
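As a toy illustration of the lemma (not part of Arrondo’s argument): take f(x,y)=xy in F[x,y], so d=2 and f_{d}=xy. The coefficient of y^{2} in f(x+ay,y)=(x+ay)y is f_{d}(a,1)=a, which is non-zero for any non-zero a \in F; after the substitution, f has become monic in y up to a unit, which is exactly what the normalization lemma is designed to arrange.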

Continue reading “Slick proof of Hilbert’s Nullstellensatz”