The title may seem like a contradiction, given that there is such a thing as the Universal Approximation Theorem, which states that a neural network with a single hidden layer of finite width (i.e., a finite number of neurons) can approximate any continuous function on a compact set, provided the activation function is non-constant, bounded, and continuous.
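For concreteness, here is a rough statement of that single-hidden-layer form (the notation below is my own shorthand, not tied to any particular proof): for any continuous $f : K \to \mathbb{R}$ on a compact set $K \subset \mathbb{R}^n$ and any $\varepsilon > 0$, there exist a finite $N$, weights $w_i \in \mathbb{R}^n$, biases $b_i \in \mathbb{R}$, and coefficients $\alpha_i \in \mathbb{R}$ such that

$$\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} \alpha_i \,\sigma(w_i^\top x + b_i) \right| < \varepsilon,$$

as long as the activation $\sigma$ is non-constant, bounded, and continuous.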
Needless to say, I haven’t found any flaws in the existing proofs (see Kolmogorov or Cybenko). However, a simpler question occurred to me, and I scoured the internet for an answer.
What if we allow an arbitrary number of hidden layers but bound their width, making them ‘Deep, Skinny Neural Networks’? Would such a network still be a universal approximator?
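To make the setup concrete, here is a minimal sketch of the kind of network in question (assuming PyTorch; the function name and parameters are mine, purely illustrative):

```python
import torch.nn as nn

def deep_skinny_net(in_dim: int, width: int, depth: int) -> nn.Sequential:
    """Build an MLP whose hidden layers all have a fixed, bounded width."""
    layers = [nn.Linear(in_dim, width), nn.Tanh()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.Tanh()]  # width never grows with depth
    layers.append(nn.Linear(width, 1))  # scalar output
    return nn.Sequential(*layers)

# Example: many hidden layers, each only 3 units wide.
net = deep_skinny_net(in_dim=2, width=3, depth=50)
```

The depth is unbounded here, but every hidden layer is capped at `width` neurons; that cap is the whole point of the question above.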