What is a Hidden Markov Model (HMM), and how is it used in natural language processing (NLP) tasks? How do hidden states and observable outputs work together in an HMM? What are the key assumptions behind HMMs in sequence prediction? In which NLP applications are HMMs commonly used? What are the advantages and limitations of using HMMs in modern NLP systems?