What is the vanishing gradient problem in deep learning, and why does it occur during training? How does it affect the ability of deep neural networks to learn? What techniques are commonly used to prevent or reduce vanishing gradients? How do activation functions such as ReLU help mitigate the issue? What role do architectures such as LSTM and ResNet play in addressing it?
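
To make the question concrete, here is a minimal NumPy sketch of the effect being asked about. It is an illustration under stated assumptions, not a reference implementation: the function name `input_gradient_norm`, the layer width of 32, the He-style weight scale `sqrt(2 / width)`, and the fixed seed are all hypothetical choices made for this example. It builds a randomly initialised fully connected network, backpropagates a gradient of ones from the output to the input by hand, and returns the norm of that gradient so sigmoid and ReLU can be compared at different depths.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def input_gradient_norm(depth, activation, width=32, seed=0):
    """Norm of d(sum of outputs)/d(input) for a random MLP (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(width)
    weights, pre_acts = [], []

    # Forward pass: store weights and pre-activations for the backward pass.
    for _ in range(depth):
        # Assumption: He-style scaling, used here for both activations.
        W = rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
        z = W @ h
        weights.append(W)
        pre_acts.append(z)
        h = sigmoid(z) if activation == "sigmoid" else np.maximum(z, 0.0)

    # Backward pass: chain rule, one layer at a time, from output to input.
    grad = np.ones(width)  # gradient of sum(h) with respect to the final h
    for W, z in zip(reversed(weights), reversed(pre_acts)):
        if activation == "sigmoid":
            s = sigmoid(z)
            local = s * (1.0 - s)  # sigmoid derivative, never above 0.25
        else:
            local = (z > 0).astype(float)  # ReLU derivative, exactly 0 or 1
        grad = W.T @ (local * grad)

    return np.linalg.norm(grad)


if __name__ == "__main__":
    for depth in (2, 10, 30):
        print(
            depth,
            input_gradient_norm(depth, "sigmoid"),
            input_gradient_norm(depth, "relu"),
        )
```

In this setup each sigmoid layer multiplies the backpropagated signal by a derivative of at most 0.25, so the gradient reaching the input shrinks roughly geometrically with depth. The ReLU derivative is 1 for every active unit, so the signal is not squashed in the same way. Exact numbers depend on the seed, width, and initialisation chosen above.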