What is sparse categorical cross-entropy and how is it used in classification problems? How does it differ from categorical cross-entropy in terms of label representation? In what scenarios is sparse categorical cross-entropy preferred over other loss functions? How does it help in optimizing multi-class classification models? What are the advantages and limitations of using sparse categorical cross-entropy in deep learning?
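To make the label-representation part of the question concrete, here is a minimal sketch in plain Python. The helper names (`softmax`, `categorical_cross_entropy`, `sparse_categorical_cross_entropy`) and the example logits are assumptions chosen for illustration, not any library's API. It suggests that the two losses compute the same quantity for a single example and differ only in how the true class is supplied: as an integer index or as a one-hot vector.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, probs):
    # Label is a one-hot vector; only the entry for the true class contributes.
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)

def sparse_categorical_cross_entropy(class_index, probs):
    # Label is a single integer; index directly into the predicted distribution.
    return -math.log(probs[class_index])

# Hypothetical 3-class example: raw model outputs for one sample.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

sparse_label = 0                 # integer class index
one_hot_label = [1.0, 0.0, 0.0]  # same class, one-hot encoded

loss_sparse = sparse_categorical_cross_entropy(sparse_label, probs)
loss_dense = categorical_cross_entropy(one_hot_label, probs)

print(loss_sparse, loss_dense)
```

In this sketch the integer label avoids building a vector whose length equals the number of classes, which is the usual argument for the sparse form when there are many classes. It also shows one limitation: an integer index can only express a single hard class per example, so soft or smoothed targets would need the one-hot (dense) form.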