What is an optimizer in deep learning, and why is it important for training neural networks?
How do optimizers help minimize the loss function during model training?
What are the commonly used optimization algorithms in deep learning?
How do optimizers like SGD, Adam, and RMSprop differ from each other?
What factors should be considered when choosing an optimizer for a deep learning model?