TF-IDF
TF-IDF, or term frequency-inverse document frequency, is a statistical measure of how important a word is to a document relative to a collection of documents, known as a corpus. It combines two components: term frequency (TF), which measures how often a word appears in a given document, and inverse document frequency (IDF), which measures how rare the word is across the corpus, so that words occurring in many documents receive a lower weight.
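The text above does not commit to a particular weighting scheme, and several variants are in use. As an illustration only, one common textbook formulation can be written as follows. The symbols are assumptions introduced here: f_{t,d} is the raw count of term t in document d, N is the number of documents in the corpus D, and n_t is the number of documents that contain t.

```latex
\begin{align}
\mathrm{tf}(t, d) &= \frac{f_{t,d}}{\sum_{t'} f_{t',d}} \\
\mathrm{idf}(t, D) &= \log \frac{N}{n_t} \\
\mathrm{tfidf}(t, d, D) &= \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)
\end{align}
```

Under this variant, a term that occurs in every document has n_t = N, so its IDF is log 1 = 0 and its TF-IDF weight is zero however often it appears. Other variants smooth the IDF or take the logarithm of the term count, which changes the numbers but not the overall behavior.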
The main idea behind TF-IDF is that a word that appears frequently in a specific document but rarely in the rest of the corpus is more informative about that document than a word that appears everywhere. This makes TF-IDF useful in information retrieval and text mining, where a system can rank documents against a user query according to how heavily each document weights the query terms.
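The idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes whitespace tokenization, term frequency normalized by document length, and an unsmoothed natural-log IDF. The function names tf_idf and score, and the three-sentence corpus, are hypothetical and chosen only for the example.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document; corpus is a list of token lists."""
    n_docs = len(corpus)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in corpus:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # TF (relative frequency) times IDF (log of N over document frequency).
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def score(query, doc_weights):
    """Sum the TF-IDF weights of the query terms found in one document."""
    return sum(doc_weights.get(term, 0.0) for term in query.split())


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell sharply",
]
corpus = [d.split() for d in docs]
weights = tf_idf(corpus)

# Rank document indices by their score for the query, best first.
query = "stock market"
ranked = sorted(range(len(docs)), key=lambda i: score(query, weights[i]), reverse=True)
print(docs[ranked[0]])  # prints "the stock market fell sharply"
```

In this toy corpus the word "the" occurs in every document, so its weight is zero, while "stock" and "market" occur in only one document and dominate its score. Production systems typically use a library implementation with smoothing and normalization options rather than a hand-rolled version like this one.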