What is the learning rate in deep learning and why is it important for training models? How does the learning rate influence the speed and stability of model convergence? What happens when the learning rate is set too high or too low? What techniques, such as learning rate scheduling, are used to optimize it during training? What are the challenges and best practices for selecting an appropriate learning rate?