Proof of the Sensitivity Conjecture without Cauchy’s Interlacing Theorem

If you aren’t aware, one month ago, the Sensitivity Conjecture, a 30-year old problem, was proven by Hao Hung, an assistant professor at Emory University in just a little more than 2 pages using particularly simple methods albeit cleverly implemented. The proof is very easy to understand and requires nothing more than basic linear algebra. You can find it here.

A few days ago, Donald Knuth simplified the already simple proof of H. Hung and could fit all his arguments in half a page. What really shocked me when I heard the news is that Knuth is still alive. I could have sworn that it was only a few days ago when I was skimming through a textbook on Discrete Mathematics and saw a bio with a black-and-white pic of him in one of the chapters related to computational complexity in the same manner that I’d see pictures and mini-biographies of other 20th century giants such as Grothendieck and Nash relegated to the margins of math textbooks and online articles romantically detailing the course of their fabled lives.

Now that I’ve realized that he’s well and alive, it shocks me equally to learn that at the age of 81, he is still able to contribute to research.

I’m not exactly going to regurgitate Knuth’s proof but what I’ll present certainly uses the same ideas. I must note here than Knuth’s proof itself is inspired by a comment on Scott Aronson’s blog which provided a method to prove the conjecture without using Cauchy’s Interlacing Theorem.

If you don’t know what the Sensitivity Conjecture, I’ll state it below.

If f:\{0,1\}^{n} \mapsto \{0,1 \} is a boolean function where the domain is the graph of the n-dimensional cube denoted by Q^{n}=\{0,1 \}^{n}. So, a particular input x is just a string of 0s and 1s. The local sensitivity of f at an input x is the number of indices in the ‘string’ of x that you can flip and not change the output. The local block sensitivity of f at input x is the number of disjoint subsets of the set of indices \{0,1,2 \cdots, n \} such that the output doesn’t change when every index corresponding to a block flips in the input string of x. The sensitivity and block sensitivity are defined to be the maximum of these corresponding measures over all the inputs.

Continue reading “Proof of the Sensitivity Conjecture without Cauchy’s Interlacing Theorem”

Topological Constraints on the Universal Approximation of Neural Networks

The title may seem like a contradiction given that there is such a thing as the Universal Approximation Theorem which simply states that a neural network with a single hidden layer of finite width(i.e finite number of neurons) can approximate any function on a compact set of \mathbb{R}^{n} given that the activation function is non-constant,bounded and continuous.

Needless to say, I haven’t found any kind of flaws in the existing proofs(see Kolmogrov or Cybenko). However, I thought of something a little simple and scoured the internet for an answer.

What if we allow an arbitrary number of hidden layers and bound the dimension of the hidden layers making them ‘Deep, Skinny Neural Networks’? Would that be a universal approximator?

Continue reading “Topological Constraints on the Universal Approximation of Neural Networks”

Schur’s Lemma

This is a standard result that any beginner in the study of the representation theory would be aware of. I merely present restate the theorem a little(essentially I don’t) and prove it.

Theorem:

Let V,W be representations of a group G. If f:V \mapsto W is a G-linear map, then the following are true assuming that V and W are not isomorphic:

1)If both V,W are irreducible, then f is either an isomorphism or the zero map.

2)If only V is irreducible, then f is either injective or the zero map.

3)If only W is irreducible, then f is either surjective or the zero map


If V,W are isomorphic as representations, then f is a scalar multiplication map.

Proof:

Let \rho_{V},\rho_{W} be the respective representations. Here, V,W are vector spaces over an algebraically closed field.

We tackle the first part of the theorem.

Consider the kernel of f, say Ker(f), which is set of all v \in V such that f(v)=0. Since f is a G-linear map, f(g.v)=g.f(v)=0 implies that if v \in Ker(f), then g.v \in Ker(f). Hence, Ker(f) is stable under the action of G which makes it a sub-representation. Since V is irreducible, this means that either Ker(f) must be zero or f must be the zero map. This proves that f is injective.

Continue reading “Schur’s Lemma”

Irrationality of basic trigonometric and hyperbolic functions at rational values

A few days ago, while rummaging through my shelves, I stumbled upon a book by the number theorist Ivan Niven entitled “Irrational Numbers”-a relatively unknown book in my opinion. Obscured by stationery and other paraphernalia, the edge of the book peeked out of my drawer-its cover shrouded in dust and its title almost indiscernible if not for the striking glow of morning sunlight from my large window revealing its dark red tint. I wiped off the dust and peeked into its contents.

This was one of the first ‘real’ math textbooks that I ever read in high school along with Apostol’s books on Calculus and Lang’s Linear Algebra. I distinctly remember reading it for the first time when I was 13. It was undoubtedly quite difficult at the time, considering that I had little exposure to proof-based mathematics. Though the book is rather drab in its exposition, it undoubtedly had some interesting and simple mathematical facts about irrationality which you wouldn’t naturally find anywhere else except for some old number theory papers.

One of those little mathematical gems is the irrationality of trigonometric and hyperbolic functions at rational arguments. I shall soon prove that cos(x) is irrational if x \in \mathbb{Q} \backslash \{ 0 \} .


 

Lemma 1

If h \in \mathbb{Z}[x] and f(x)=\frac{x^{n}h(x)}{n!} is a polynomial in \mathbb{Q}[x], then f^{(k)}(0) \in \mathbb{Z} fork \in \mathbb{Z^{+}} and f^{(k)}(0) is divisible by n+1 if k \neq n. However, it is divisible by n+1 at k=n if h(0)=0.

 

I’ll leave the proof to the reader since its quite simple and involves nothing more than playing around with the polynomials.


 

Lemma 2

Let f \in \mathbb{R}[x] and define F(x) as such:
F(x)=f((r-x)^{2}) where r \in \mathbb{R}
Then, f^{(k)}(r)=0 if k is an odd integer.

Again, I’ll leave the trivialities of calculations and polynomial manipulations to the reader.


 

Now, to the main theorem.


 

Theorem

sin(x),cos(x),tan(x) are all irrational for non-trivial rational values of x.

Proof:

Let q=\frac{a}{b} where a,b \in \mathbb{Q}. Define a function f(x) as follows:

f(x)=\frac{x^{p-1}(a-bx)^{2p}(2a-bx)^{p-1}}{(p-1)!}

where p is an odd prime.
Substituting the expression for q,

f(x)=\frac{(r-x)^{2p}(r^{2}-(r-x)^{2})^{p-1}b^{3p-1}}{(p-1)!}                              (1)

Notice how we obtained a polynomial in (r-x)^{2} and of the form presented in Lemma 1. Also, observe that f(x)>0 for all values of x \in \mathbb{R}(p-1 is even) .

The next is to obtain an upper bound for f(x) when x \in (0,r).

Continue reading “Irrationality of basic trigonometric and hyperbolic functions at rational values”

Slick proof of Hilbert’s Nullstellensatz

It is a well known fact that there are multiple proofs for the Nullstellensatz which do not use Noether’s normalization lemma. Serge Lang’s proof(in his book) and Zariski’s proof both fall under this category. In fact, Daniel Allcock of UT Austin posted a proof which essentially builds from the edifice of Zariski’s proof(see that here). He claims that it is the simplest proof for the Nullstellensatz and frankly this is quite true considering the proof uses nothing more than basic field theory, is only one page long and just requires some familiarity with transcendence bases, and the transcendence degree. Yet in the true spirit of simplicity, T. Tao has presented a proof which uses nothing more than the extended Euclidean algorithm and “high school algebra” albeit with a lengthy case-by-case analysis(see the proof here).

Most of these proofs(except Tao’s) go about proving the ‘Weak’ Nullstellensatz and obtain the ‘Strong’ Nullstelensatz through the famous Rabinowitsch trick.

But a few days, I found something truly magnificent, a proof by Enrique Arrondo in the American Mathematical Monthly which proves the Nullstellensatz using a weaker version of Noether normalization and techniques similar to that of Tao, Ritrabata Munshi. The proof is essentially a simplification of a proof by R. Munshi.

Here, I present a special case of the normalization lemma.


 

Lemma

If F is an infinite field and f \in F[x_{1},x_{2},\cdots ,x_{n}] is a non-constant polynomial and n \geq 2 whose total degree is d, then there exists a_{1},a_{2},\cdots ,a_{n-1} \in F such that the coefficient of x_{n}^{d} in

f(x_{1}+a_{1}x_{n},x_{2}+a_{2}x_{n},\cdots ,x_{n-1}+a_{n-1}x_{n},x_{n})

is non-zero.

Proof:

Let f_{d} represent the homogenous component of f of degree d.So, the coefficient of x_{d}^{n} in f(x_{1}+a_{1}x_{n},x_{2}+a_{2}x_{n},\cdots,x_{n-1}+a_{n-1}x_{n},x_{n}) is f_{d}(a_{1},\cdots,a_{n-1},1). As a polynomial in F[x_{1},\cdots,x_{n-1}], there is some point (a_{1},\cdots,a_{n-1} \in F^{n-1} where f_{d}(a_{1},\cdots,a_{n-1},1) doesn’t vanish. Choose such a point to establish the a_{i} and this guarantees a non-zero coefficient of x_{n}^{d}.

Continue reading “Slick proof of Hilbert’s Nullstellensatz”

Coxeter Groups(intro)

Reflection groups

In order to understand the intuition underlying the theory of Coxeter groups(and Weyl groups in particular groups in particular), you can go through the theory of reflection groups which I’ll only superficially treat preceding my exposition of Coxetr systems and the associated representation theory.

Consider some Euclidean space V and a vector \alpha \in V. A reflection associated with \alpha is a linear operator which sends \alpha to -\alpha and point-wise fixes the orthogonal hyperplane/subspace H_{\alpha}.

If the reflection is s_{\alpha}, then it can represented as:

s_{\alpha}(x)=x-\frac{2(x,\alpha)}{(\alpha,\alpha)}\alpha=x-2 proj_{\alpha}(x)

Clearly s_{\alpha} is an orthogonal transformation and the set of all reflections in V can be recognized as the subgroup of O(V)(orthogonal group of V) consisting of elements of order 2.

Continue reading “Coxeter Groups(intro)”

Hopf Fibrations-Construction and Quaternion Interpretations (I)

Before I begin discussing the Hopf fibration of the 3-spbhere, one of the simplest yet deeply profound example of a non-trivial fiber bundle, I’d like to recall the definition of a fiber bundle.

Let E,B,F represent the entire space, base space and the fiber respectively where E,B are connected. If f:E \mapsto B is a continuous surjection onto the base space, then the structure (E,B,F,f) is said to be a fiber bundle if for every x\in E, there exists a neighborhood U \subset B of f(x) such that there exists a homeomorphism \psi:f^{-1}(U) \mapsto U \times F such that \psi \circ \pi_{U}=f.

What this basically means is that locally, the fiber bundle looks like the product B \times F but globally, it may have different topological properties.

A trivial fiber bundle is a fiber bundle which in which the total space is E=B \times F. In fact, any fiber bundle over a contractible CW Complex is trivial.

Continue reading “Hopf Fibrations-Construction and Quaternion Interpretations (I)”