This series (probably three posts) will be on monads. The first part will deal only with an introduction to monads, algebras over monads and the adjunction problem. The second part will deal with coequalizers and Beck’s theorem in connection with Kleisli categories. In keeping with this blog’s slight CS tilt, the third part will deal with examples of monads from functional programming, which I think are crucial to fully understanding the monad. I’ve noticed that I am running too many ‘series posts’ simultaneously. I am still working on the third part of the Grothendieck Spectral Sequence series and will continue the Hopf fibration post, which I haven’t updated for a year!

Before I introduce the main players of today’s post, I should review the definition of adjoint functors, as it’ll be quite useful as a tool to see the power of monads.

Consider two categories $C,D$ and functors $F:D \to C$ and $G:C \to D$. $F,G$ are said to be adjoint, i.e. $F \dashv G$, if there exists a natural isomorphism of Hom-sets $Hom_{C}(Fd,c) \simeq Hom_{D}(d,Gc)$ for all $d \in D,c \in C$. Equivalently, by the unit-counit definition, if there exists a pair of natural transformations $\eta:1_{D} \Rightarrow GF$ (unit) and $\epsilon:FG \Rightarrow 1_{C}$ (counit) which make the following diagrams commute:
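Written out as equations, these are the usual triangle identities. This is only a componentwise restatement for the functors $F:D \to C$ and $G:C \to D$ fixed above, not a replacement for the diagrams:

$\epsilon_{Fd} \circ F(\eta_{d}) = 1_{Fd}$ for every object $d$ in $D$,

$G(\epsilon_{c}) \circ \eta_{Gc} = 1_{Gc}$ for every object $c$ in $C$.

The first composite runs $Fd \to FGFd \to Fd$ and the second runs $Gc \to GFGc \to Gc$.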

Throughout the post, I’ll represent natural transformations by the symbol $\Rightarrow$. I recommend that the confused reader refer to the category theory section of Aluffi’s Algebra: Chapter 0, and also to my post on the importance of adjoint functors.

Let $\eta:F \Rightarrow G$ be a natural transformation of functors $F,G: A \to B$ whose components are given by $\eta_{X}:F(X) \to G(X)$ for an object $X$ in $A$. If $H:B \to C$ is another functor, then I’ll represent the components of the induced natural transformation $H\eta:HF \Rightarrow HG$ by $(H\eta)_{X}=H(\eta_{X})$.

If instead there is a functor $K:Z \to A$, then I’ll represent the components of the natural transformation $\eta K:FK \Rightarrow GK$ by $(\eta K)_{X}=\eta_{K(X)}$ where $X$ is an object in $Z$.

Let’s say that $T:C \to C$ is an endofunctor equipped with two natural transformations $\eta:1_{C} \Rightarrow T$ (unit) and $\mu:TT \Rightarrow T$ (multiplication) such that the following diagrams commute:
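In equation form, the two diagrams are the associativity law $\mu \circ T\mu = \mu \circ \mu T$ and the unit laws $\mu \circ T\eta = 1_{T} = \mu \circ \eta T$. As a concrete sanity check, here is a minimal Python sketch using the list functor as $T$. The function names `unit`, `join` and `fmap` are my own illustrative choices (they stand in for $\eta$, $\mu$ and the action of $T$ on morphisms), and checking a few inputs is of course not a proof.

```python
def unit(x):
    """eta_X : X -> T X, wrap a value in a singleton list."""
    return [x]


def join(xss):
    """mu_X : T T X -> T X, flatten exactly one level of nesting."""
    return [x for xs in xss for x in xs]


def fmap(f, xs):
    """T f : T X -> T Y, apply f to every element of the list."""
    return [f(x) for x in xs]


def associativity_holds(xsss):
    # mu . T(mu) versus mu . (mu T), evaluated on a list of lists of lists.
    return join(fmap(join, xsss)) == join(join(xsss))


def unit_laws_hold(xs):
    # mu . T(eta) and mu . (eta T) should both give back the original list.
    return join(fmap(unit, xs)) == xs == join(unit(xs))


if __name__ == "__main__":
    print(associativity_holds([[[1, 2], [3]], [[4]], []]))
    print(unit_laws_hold([1, 2, 3]))
```

Note how `fmap(join, ...)` plays the role of $T\mu$ while the inner `join` in `join(join(...))` plays the role of $\mu T$, which is exactly the whiskering notation introduced above.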

## Proof of the Sensitivity Conjecture without Cauchy’s Interlacing Theorem

If you aren’t aware, one month ago, the Sensitivity Conjecture, a 30-year-old problem, was proven by Hao Huang, an assistant professor at Emory University, in just a little more than 2 pages using particularly simple methods, albeit cleverly implemented. The proof is very easy to understand and requires nothing more than basic linear algebra. You can find it here.

A few days ago, Donald Knuth simplified the already simple proof of H. Huang and could fit all his arguments in half a page. What really shocked me when I heard the news is that Knuth is still alive. I could have sworn that it was only a few days ago that I was skimming through a textbook on Discrete Mathematics and saw a bio with a black-and-white pic of him in one of the chapters related to computational complexity, in the same manner that I’d see pictures and mini-biographies of other 20th century giants such as Grothendieck and Nash relegated to the margins of math textbooks and online articles romantically detailing the course of their fabled lives.

Now that I’ve realized that he’s alive and well, it shocks me equally to learn that at the age of 81, he is still able to contribute to research.

I’m not exactly going to regurgitate Knuth’s proof, but what I’ll present certainly uses the same ideas. I must note here that Knuth’s proof itself is inspired by a comment on Scott Aaronson’s blog which provided a method to prove the conjecture without using Cauchy’s Interlacing Theorem.

If you don’t know what the Sensitivity Conjecture is, I’ll state it below.

Let $f:\{0,1\}^{n} \to \{0,1 \}$ be a boolean function whose domain is viewed as the vertex set of the graph of the $n$-dimensional cube, denoted by $Q^{n}=\{0,1 \}^{n}$. So, a particular input $x$ is just a string of 0s and 1s. The local sensitivity of $f$ at an input $x$ is the number of indices in the ‘string’ of $x$ whose individual flip changes the output. The local block sensitivity of $f$ at input $x$ is the maximum number of pairwise disjoint subsets (blocks) of the set of indices $\{1,2, \cdots, n \}$ such that the output changes when every index in a given block is flipped in the input string of $x$. The sensitivity and block sensitivity are defined to be the maximum of these corresponding measures over all the inputs.
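To make the two definitions concrete, here is a small brute-force Python sketch. It assumes the standard convention that an index (or block) counts as sensitive when flipping it changes the output; the helper names are hypothetical, and the search is exponential in $n$, so it is only meant for tiny examples.

```python
from itertools import product


def flip(x, block):
    """Return a copy of the bit tuple x with every index in `block` flipped."""
    return tuple(b ^ 1 if i in block else b for i, b in enumerate(x))


def local_sensitivity(f, x):
    """Number of single indices whose flip changes f at x."""
    return sum(1 for i in range(len(x)) if f(flip(x, {i})) != f(x))


def local_block_sensitivity(f, x):
    """Maximum number of pairwise disjoint blocks whose flip changes f at x."""
    n = len(x)
    # Encode each block as a bitmask and keep only the sensitive ones.
    sensitive = [
        m for m in range(1, 1 << n)
        if f(flip(x, {i for i in range(n) if (m >> i) & 1})) != f(x)
    ]

    def best(used, start):
        # Largest family of disjoint sensitive blocks taken from
        # sensitive[start:] that avoids the indices already in `used`.
        result = 0
        for j in range(start, len(sensitive)):
            m = sensitive[j]
            if m & used == 0:
                result = max(result, 1 + best(used | m, j + 1))
        return result

    return best(0, 0)


def sensitivity(f, n):
    return max(local_sensitivity(f, x) for x in product((0, 1), repeat=n))


def block_sensitivity(f, n):
    return max(local_block_sensitivity(f, x) for x in product((0, 1), repeat=n))


if __name__ == "__main__":
    or3 = lambda x: int(any(x))  # OR of three bits
    print(sensitivity(or3, 3), block_sensitivity(or3, 3))
```

Since every single index is itself a block, the block sensitivity is always at least the sensitivity; the conjecture is about how far apart the two can be.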

## Topological Constraints on the Universal Approximation of Neural Networks

The title may seem like a contradiction given that there is such a thing as the Universal Approximation Theorem, which simply states that a neural network with a single hidden layer of finite width (i.e. a finite number of neurons) can approximate any continuous function on a compact subset of $\mathbb{R}^{n}$, given that the activation function is non-constant, bounded and continuous.

Needless to say, I haven’t found any kind of flaw in the existing proofs (see Kolmogorov or Cybenko). However, I thought of a simple question and scoured the internet for an answer.

What if we allow an arbitrary number of hidden layers and bound the dimension of the hidden layers making them ‘Deep, Skinny Neural Networks’? Would that be a universal approximator?

## The logarithmic spiral

The logarithmic spiral has some very interesting properties, and Bernoulli was especially fascinated by it. I’ll prove its most important property (the angle between the curve and the radius is the same at every point) and proceed with an example.

In polar co-ordinates, the equation of the spiral is given by:

$r(\theta)=ae^{k\theta}$ where $a,k$ are constants and $a>0$

Now, to prove that any line from the origin which intersects the curve does so by making a constant angle (say $\phi$) with the curve (i.e. with the direction of the tangent line), we consider the derivatives of the parametric equations which correspond to $r(\theta)$.
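As a numerical sanity check of the claim (not a substitute for the derivation), the following Python sketch uses the parametric equations $x=r\cos\theta$, $y=r\sin\theta$ and measures the angle between the position vector and the tangent vector at several values of $\theta$. The function name and the sample values of $a$ and $k$ are my own assumptions; the angle should come out as $\operatorname{arccot}(k)$, which is computed here as `atan2(1, k)`.

```python
import math


def radius_tangent_angle(theta, a=1.0, k=0.2):
    """Angle between the radius vector and the tangent of r = a*exp(k*theta)."""
    r = a * math.exp(k * theta)
    dr = k * r  # dr/dtheta
    # Position vector and its derivative with respect to theta.
    x, y = r * math.cos(theta), r * math.sin(theta)
    dx = dr * math.cos(theta) - r * math.sin(theta)
    dy = dr * math.sin(theta) + r * math.cos(theta)
    # atan2(cross, dot) gives the angle from (x, y) to (dx, dy).
    return math.atan2(x * dy - y * dx, x * dx + y * dy)


if __name__ == "__main__":
    k = 0.2
    for theta in (0.0, 1.0, 2.5, 6.0):
        print(theta, radius_tangent_angle(theta, k=k), math.atan2(1.0, k))
```

Algebraically, the cross product simplifies to $r^{2}$ and the dot product to $kr^{2}$, so the angle depends only on $k$ and not on $\theta$ or $a$.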