The title may seem like a contradiction given that there is such a thing as the **Universal Approximation Theorem **which simply states that a neural network with a single hidden layer of finite width(i.e finite number of neurons) can approximate any function on a compact set of given that the activation function is non-constant,bounded and continuous.

Needless to say, I haven’t found any kind of flaws in the existing proofs(see Kolmogrov or Cybenko). However, I thought of something a little simple and scoured the internet for an answer.

What if we allow an **arbitrary number** of hidden layers and **bound the dimension** of the hidden layers making them ‘Deep, Skinny Neural Networks’? Would that be a universal approximator?

Continue reading “Topological Constraints on the Universal Approximation of Neural Networks”