What is categorical cross-entropy and how is it used as a loss function in machine learning? How does categorical cross-entropy measure the difference between predicted probabilities and actual class labels? Why is it commonly used in multi-class classification problems? How is categorical cross-entropy calculated and interpreted during model training? What are the advantages and limitations of using categorical cross-entropy in real-world applications?
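To make the "how is it calculated" part of the question concrete, here is a minimal sketch of the usual definition, assuming one-hot labels and predicted probabilities that already sum to 1 per sample: the loss is the negative log of the probability assigned to the true class, averaged over the batch. The function name `categorical_cross_entropy`, the `eps` clipping value, and the sample data are illustrative assumptions, not taken from any particular library.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy over a batch.

    y_true: list of one-hot label rows (assumed format).
    y_pred: list of predicted probability rows, each assumed to sum to 1.
    eps:    hypothetical clipping floor that keeps log() finite.
    """
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        # Only the true class (label 1) contributes: -log(p_true).
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(true_row, pred_row))
    return total / len(y_true)

# Two samples, three classes (made-up values for illustration).
y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = categorical_cross_entropy(y_true, y_pred)
print(loss)
```

In this sketch the loss is 0 only when the model assigns probability 1 to every true class, and it grows without bound as the probability on a true class approaches 0, which is one way to read the value during training.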