What is gradient clipping and why is it used in deep learning? How does gradient clipping help prevent the exploding gradient problem during neural network training? What are the common methods used for gradient clipping? How does gradient clipping improve training stability and model performance? In what types of deep learning models is gradient clipping most commonly applied?