What is gradient boosting and how does it work in machine learning?
How does the algorithm build models sequentially to correct previous errors?
What role do weak learners play in gradient boosting?
How does gradient descent help optimize the model during training?
What are the advantages and limitations of using gradient boosting in real-world applications?
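For context, here is a minimal sketch of the sequential, error-correcting loop these questions describe. It assumes scikit-learn is available and uses shallow DecisionTreeRegressor models as the weak learners with squared-error loss (where the negative gradient is just the residual); real libraries such as XGBoost or LightGBM add many refinements on top of this idea.

```python
# Minimal gradient-boosting sketch for squared-error regression.
# Assumptions: scikit-learn installed; depth-2 trees act as weak learners.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_stages=100, learning_rate=0.1, max_depth=2):
    """Fit an ensemble stage by stage; each new tree corrects the current residuals."""
    init = y.mean()                          # start from a constant model
    pred = np.full_like(y, init, dtype=float)
    trees = []
    for _ in range(n_stages):
        # For squared error, the negative gradient of the loss is the residual y - pred.
        residuals = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # Take a small step in the direction of the new weak learner.
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return init, trees

def predict(init, trees, X, learning_rate=0.1):
    """Sum the initial constant and the scaled contribution of every tree."""
    pred = np.full(X.shape[0], init, dtype=float)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

# Tiny usage example on synthetic data (names and data here are illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)
init, trees = gradient_boost(X, y)
print("train MSE:", np.mean((predict(init, trees, X) - y) ** 2))
```

The key design choice the sketch illustrates: each stage fits a weak learner to the negative gradient of the loss at the current predictions, and the learning rate shrinks each step so the ensemble improves gradually rather than overfitting in a few stages.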