Text mining (also written as text-mining) is
“a process of discovering and extracting text-related content from unstructured, miscellaneous data. Text-mining is often mentioned in the context of several information-age trends such as big data, bioinformatics, data curation, e-Science and the semantic web. Currently, there are a number of social media monitoring tools that perform various types of text-mining activities.”
Typically, text-mining comprises three major activities: 1) information retrieval (IR), which gathers relevant unstructured text from heterogeneous databases, documents and websites; 2) information extraction (IE), which identifies and extracts entities, facts and the relationships among those entities; and 3) data-mining, which finds associations among the pieces of information extracted from the retrieved texts.
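The three activities can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy corpus `DOCUMENTS`, the entity list `KNOWN_ENTITIES`, and the helper names `retrieve`, `extract_entities` and `mine_associations` are hypothetical, and each stage uses a deliberately naive technique (keyword matching for IR, dictionary lookup for IE, pair co-occurrence counting for data-mining) rather than any particular tool's method.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus standing in for heterogeneous text sources.
DOCUMENTS = [
    "Aspirin reduces fever in patients with influenza.",
    "The weather was mild this spring.",
    "Influenza often causes fever and fatigue.",
    "Aspirin may relieve headache and fever.",
]

# Hypothetical entity dictionary; real IE systems use far richer models.
KNOWN_ENTITIES = {"aspirin", "fever", "influenza", "fatigue", "headache"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def retrieve(documents, query_terms):
    """IR: keep only documents that mention at least one query term."""
    terms = {term.lower() for term in query_terms}
    return [doc for doc in documents if terms & set(tokenize(doc))]


def extract_entities(document):
    """IE: return the set of known entities mentioned in one document."""
    return set(tokenize(document)) & KNOWN_ENTITIES


def mine_associations(entity_sets):
    """Data-mining: count how often each pair of entities co-occurs."""
    counts = Counter()
    for entities in entity_sets:
        # Sort so that each pair is always counted under the same key.
        counts.update(combinations(sorted(entities), 2))
    return counts


relevant = retrieve(DOCUMENTS, ["fever"])
entity_sets = [extract_entities(doc) for doc in relevant]
associations = mine_associations(entity_sets)

for pair, count in associations.most_common(3):
    print(pair, count)
```

In this sketch, pairs that co-occur in more documents receive higher counts, which is one simple way to surface candidate associations among the extracted entities.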
The goal of text-mining is to discover and extract knowledge hidden in text by identifying concepts, extracting facts and relationships, discovering implicit links and generating hypotheses. One of the main reasons text-mining is important is that it helps deal with the information overload created by blogs, wikis, clinical data, surveys, heterogeneous databases and the web. Text-mining is especially useful in areas that hold large collections of data and information in documents.