I want to understand what layer normalization is in deep learning. How does it improve training stability and performance in neural networks? Can someone also explain how it differs from batch normalization?