https://arxiv.org/abs/2206.10381
Summary
Problem: Traditional approaches to tabular data often ignore available context, such as column headers, and require extensive pre-processing.
Solution: The TabText framework addresses this by:
Converting tabular data into a language-like format: Each row becomes text, so language models can be applied to it.
Using pre-trained LLMs (large language models): These turn the text into embeddings that carry contextual information from the data.
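A minimal sketch of the first step, assuming a simple "column is value" template. The function name row_to_text, the template wording, and the skip_missing option are illustrative assumptions, not the paper's exact scheme:

```python
def row_to_text(row, skip_missing=True):
    """Serialize one table row (column name -> value) into a sentence-like string.

    The "<column> is <value>" template is an assumption made for illustration.
    """
    parts = []
    for column, value in row.items():
        if value is None or value == "":
            # Missing cell: either drop it or state that it is missing.
            if skip_missing:
                continue
            parts.append(f"{column} is missing")
        else:
            parts.append(f"{column} is {value}")
    if not parts:
        return ""
    return ", ".join(parts) + "."


# Hypothetical example row; the column headers become part of the text.
example_row = {"age": 63, "gender": "female", "heart rate": None}
print(row_to_text(example_row))
```

Because the column headers end up in the text, the language model sees them as context, which is the information the Problem line says traditional methods tend to drop.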
Results:
TabText makes it easier to build effective machine learning models with less pre-processing.
TabText can improve the performance of machine learning models when it's used alongside traditional tabular data representations.
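One plausible reading of "used alongside" is feature concatenation: each row's numeric features are extended with that row's text embedding before training a standard model. The sketch below assumes that reading; augment_features is a hypothetical helper, and the embedding values are made up for illustration.

```python
def augment_features(tabular_rows, text_embeddings):
    """Concatenate each row's tabular features with its text embedding.

    Both arguments are lists of numeric lists, aligned by row index.
    """
    if len(tabular_rows) != len(text_embeddings):
        raise ValueError("need exactly one embedding per table row")
    return [list(features) + list(embedding)
            for features, embedding in zip(tabular_rows, text_embeddings)]


# Hypothetical values: 2 tabular features plus a 3-dimensional embedding per row.
tabular = [[63.0, 1.0], [41.0, 0.0]]
embeddings = [[0.12, -0.40, 0.88], [0.05, 0.31, -0.27]]
augmented = augment_features(tabular, embeddings)
print(len(augmented[0]))  # 5 features per row after concatenation
```

The augmented rows can then be passed to any tabular learner in place of the original feature matrix.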
A survey of LLMs for tabular data: https://arxiv.org/pdf/2402.05121.pdf
Example code for generating embeddings with a BERT-like model: https://github.com/kimvc7/TabText/blob/main/src/utils/biobert_utils.py
The graph above in ppt format