The Vanishing Gradient Problem

by Avi Kedare

Introduction

When I first started building deep neural networks, I had this assumption: more layers means better performance. More depth, more abstraction, better features, right? Seemed logical. So I stacked 10, 12, 15 layers and hit train. The loss just… sat there. Barely moved. Like the network had completely given up learning.

I checked my code a dozen times. Loss function, correct. Data pipeline, fine. Learning rate, reasonable. But something was broken in a way that wasn't obvious from the outside.

It took me a while to figure out what was actually happening inside those early layers. The gradients, those little correction signals that tell each layer how to update its weights, were becoming so small by the time they reached the front of the network that those layers were essentially frozen. Not frozen because I told them to be. Frozen because the math was killing the signal before it could get there.

That's the vanishing gradient problem. And once you ...



Vanishing Gradient Problem in Deep Neural Networks