What is hinge loss, and how is it used in deep learning models? How does the hinge loss function help train classifiers such as Support Vector Machines (SVMs)? What is the mathematical formulation of hinge loss, and what does it measure? How does hinge loss differ from other loss functions, such as cross-entropy or mean squared error? In which scenarios is hinge loss particularly effective for deep learning applications?