What is tokenization, and why is it a crucial step in working with large language models (LLMs)?
How are text inputs converted into tokens that LLMs can process?
What are the main tokenization methods, such as word-level, character-level, and subword approaches like byte-pair encoding (BPE)?
How does tokenization affect the efficiency and performance of LLMs?
What challenges and limitations does tokenization introduce in natural language processing?
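To make the byte-pair encoding question concrete, here is a minimal character-level sketch of the BPE merge loop: repeatedly count adjacent symbol pairs and merge the most frequent pair into a new symbol. The function names are illustrative, and real tokenizers (e.g., those used by GPT models) operate on bytes, apply pre-tokenization, and learn merges from a large corpus rather than a single string.

```python
from collections import Counter

def get_pair_counts(tokens):
    # Count every adjacent pair of symbols in the sequence.
    return Counter(zip(tokens, tokens[1:]))

def merge_pair(tokens, pair, new_symbol):
    # Replace each occurrence of `pair` with the merged symbol.
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_symbol)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def byte_pair_encode(text, num_merges=10):
    # Start from individual characters (production BPE starts from bytes).
    tokens = list(text)
    for _ in range(num_merges):
        counts = get_pair_counts(tokens)
        if not counts:
            break
        pair = max(counts, key=counts.get)
        if counts[pair] < 2:
            break  # no pair repeats, so no useful merge remains
        tokens = merge_pair(tokens, pair, pair[0] + pair[1])
    return tokens

print(byte_pair_encode("low lower lowest"))
```

Because merges only ever concatenate adjacent symbols, concatenating the resulting tokens always reconstructs the original text, while repeated substrings like "low" collapse into single tokens — the core idea behind subword vocabularies in LLMs.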