What is data preprocessing, and why is it important in machine learning? What are the common steps involved in preparing raw data for machine learning models? How do techniques like normalization, encoding, and handling missing values improve model performance? What challenges arise when preprocessing large datasets? How does proper preprocessing affect the accuracy and reliability of machine learning models?
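
To make the three techniques named above concrete, here is a minimal sketch in plain Python. The toy records, the column names (`age`, `city`), and the helper functions (`impute_mean`, `min_max_normalize`, `one_hot_encode`) are all hypothetical and chosen only for illustration. Mean imputation, min-max scaling, and one-hot encoding are one possible choice for each step, not the only one.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 20.0, "city": "Paris"},
    {"age": None, "city": "Tokyo"},
    {"age": 40.0, "city": "Paris"},
]


def impute_mean(values):
    """Handle missing values: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Normalization: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(labels):
    """Encoding: turn each label into a 0/1 vector, one slot per distinct label."""
    categories = sorted(set(labels))
    vectors = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, vectors


# Apply the steps in order: impute, then scale, then encode.
ages = impute_mean([r["age"] for r in rows])
ages_scaled = min_max_normalize(ages)
categories, city_vectors = one_hot_encode([r["city"] for r in rows])

# Combine into one numeric feature row per record.
features = [[a] + vec for a, vec in zip(ages_scaled, city_vectors)]
print(categories)
print(features)
```

In a real pipeline, the fill value, the minimum and maximum, and the category list would normally be computed on the training split only and then reused on validation and test data, so that no information from held-out data leaks into the model.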