Inverse Document Frequency

Inverse Document Frequency (IDF) is a statistical measure used in information retrieval and text mining to estimate how informative a term is within a collection of documents, known as a corpus. The basic idea is that terms appearing in many documents (such as "the" or "and") carry less information than terms that appear in only a few. IDF is computed by dividing the total number of documents in the corpus, N, by the number of documents that contain the term, df(t), and then taking the logarithm of the result: idf(t) = log(N / df(t)). A term that occurs in every document therefore gets an IDF of log(1) = 0, while rarer terms get progressively higher values. This weighting highlights distinctive terms that better characterize the content of individual documents.
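The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes each document has already been tokenized into a list of terms, uses the natural logarithm, and applies no smoothing. The function name compute_idf is hypothetical.

```python
import math

def compute_idf(corpus):
    """Return a dict mapping each term to idf(t) = ln(N / df(t)).

    Assumptions: each document is a pre-tokenized list of terms,
    the natural log is used, and no smoothing is applied.
    """
    n_docs = len(corpus)
    doc_freq = {}
    for doc in corpus:
        # set() ensures a term is counted at most once per document.
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
idf = compute_idf(corpus)
print(idf["the"])            # 0.0, since "the" appears in all 3 documents
print(round(idf["cat"], 4))  # ln(3/2) = 0.4055
print(round(idf["dog"], 4))  # ln(3/1) = 1.0986
```

Because only terms seen in the corpus get an entry, the sketch never divides by zero, but it also has no value for unseen terms. Practical libraries often use smoothed variants for this reason; for example, scikit-learn's TfidfTransformer with smooth_idf=True computes ln((1 + N) / (1 + df(t))) + 1.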