What is the kernel trick and how is it used in machine learning algorithms? How does the kernel trick let a linear algorithm handle data that is not linearly separable, as if the data had been mapped into a higher-dimensional feature space, without computing that mapping explicitly? What role do kernel functions play in Support Vector Machines (SVMs)? What are the common types of kernels used in machine learning? What are the advantages and limitations of using the kernel trick in real-world applications?
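To make these questions concrete, here is a minimal Python sketch (standard library only, 2-D inputs assumed). It first checks that the homogeneous degree-2 polynomial kernel (x . z)^2 equals an ordinary dot product after the explicit feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2). This is the kernel trick: the algorithm gets the higher-dimensional inner product without ever building phi. It then defines three common kernels (linear, polynomial, RBF) and plugs them into a kernel perceptron on XOR-style data. The kernel perceptron is a swapped-in, simpler kernelized algorithm, not an SVM, because an SVM additionally needs a quadratic-programming solver. Both algorithms touch the data only through kernel evaluations, though. All function names here (dot, phi_degree2, train_kernel_perceptron, and so on) are hypothetical illustrations, not a library API.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# --- Three common kernels (illustrative implementations) -------------------
def linear_kernel(x, z):
    return dot(x, z)


def polynomial_kernel(x, z, degree=2, c=0.0):
    # (x . z + c)^degree; c = 0 gives the homogeneous polynomial kernel
    return (dot(x, z) + c) ** degree


def rbf_kernel(x, z, gamma=1.0):
    # Gaussian / RBF kernel: exp(-gamma * ||x - z||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def phi_degree2(x):
    # Explicit feature map for the homogeneous degree-2 polynomial kernel in 2-D:
    # dot(phi(x), phi(z)) == (x . z)^2, so a kernel-based learner never builds phi.
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


# --- A kernelized learner (kernel perceptron, used here instead of an SVM) --
def decision(X, y, alpha, kernel, x):
    # The score depends on the data only through kernel(training_point, x).
    return sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))


def train_kernel_perceptron(X, y, kernel, epochs=10):
    # alpha[i] counts how often training point i was misclassified.
    alpha = [0] * len(X)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(zip(X, y)):
            if yi * decision(X, y, alpha, kernel, xi) <= 0:
                alpha[i] += 1
    return alpha


def predict(X, y, alpha, kernel, x):
    return 1 if decision(X, y, alpha, kernel, x) > 0 else -1


def accuracy(X, y, alpha, kernel):
    hits = sum(predict(X, y, alpha, kernel, xi) == yi for xi, yi in zip(X, y))
    return hits / len(X)


if __name__ == "__main__":
    # XOR-style data: not linearly separable in the original 2-D space.
    X = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
    y = [1, 1, -1, -1]

    # Kernel trick check: both values agree up to floating-point rounding.
    x, z = (1.0, 2.0), (3.0, -1.0)
    print(polynomial_kernel(x, z), dot(phi_degree2(x), phi_degree2(z)))

    # Expected: linear 0.5, poly-2 1.0
    for name, k in [("linear", linear_kernel), ("poly-2", polynomial_kernel)]:
        alpha = train_kernel_perceptron(X, y, k)
        print(name, accuracy(X, y, alpha, k))
```

Changing the `kernel` argument is the only difference between the run that fails on XOR (linear) and the one that separates it (degree-2 polynomial). That is the practical appeal of kernels. The sketch also hints at a limitation: the decision function sums a kernel evaluation over every training point, so training and prediction cost grow with the size of the training set.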