What are BERT and GPT models, and how are they used in natural language processing?

How does the architecture of BERT differ from that of GPT in terms of bidirectional versus unidirectional processing?

What are the key use cases where BERT performs better than GPT, and vice versa?

How do their training approaches and objectives differ?

What are the advantages and limitations of each model in real-world applications?