What is feature scaling and why is it important in machine learning?
How does feature scaling improve the performance of algorithms?
What are the common techniques used for feature scaling?
How does feature scaling affect distance-based algorithms?
When should feature scaling be applied during the data preprocessing stage?