TextWise Technology

Semantic Gist®

TextWise recently developed Semantic Gist® to provide intuitive semantic modeling over large document collections, particularly text documents from vertical domains, which often have no classification scheme associated with them. These semantic models adapt automatically to rapidly changing content, helping to maintain accuracy over time.

Semantic Gist® represents a significant advance in applying techniques from machine learning, image and speech characterization, and neural networks to unsupervised semantic modeling. Our patent-pending approach generates a compact representation of any text by using advanced statistical language models to identify the significant features of a document.

An auto-encoder neural network encodes the features into a low-dimensional semantic representation, and then reconstructs an approximation of the original feature vector from that representation. The software identifies keywords that may be underrepresented by the semantic representation and encodes these separately as a complementary feature vector.

Finally, the complementary feature vector is combined with the semantic representation to produce a Semantic Gist® that can be easily used for document indexing, matching and other applications.
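The encode, reconstruct and combine steps described above can be illustrated with a minimal sketch. This is not TextWise's implementation: it assumes a tiny linear auto-encoder with fixed, hand-picked tied weights instead of a trained network, and it assumes that "underrepresented" keywords are the ones with the largest reconstruction error. The function names, the vocabulary and the `top_m` parameter are all hypothetical.

```python
# Illustrative sketch only; weights are hand-picked, not learned.

def encode(weights, features):
    """Project a feature vector into the low-dimensional semantic space."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def decode(weights, code):
    """Reconstruct an approximate feature vector (tied weights: W transposed)."""
    n_features = len(weights[0])
    return [
        sum(weights[k][j] * code[k] for k in range(len(code)))
        for j in range(n_features)
    ]


def semantic_gist(weights, features, vocab, top_m=1):
    """Return (semantic representation, complementary feature vector)."""
    code = encode(weights, features)
    approx = decode(weights, code)
    # Assumption: a keyword is underrepresented when the reconstruction
    # falls short of its original feature weight.
    residual = {t: x - a for t, x, a in zip(vocab, features, approx)}
    ranked = sorted(residual, key=residual.get, reverse=True)[:top_m]
    complementary = {t: residual[t] for t in ranked if residual[t] > 0}
    return code, complementary


# Hypothetical 3-term vocabulary and a 2-dimensional semantic space
# that has no capacity for the rare term "zebra".
vocab = ["loan", "bank", "zebra"]
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
features = [0.5, 0.4, 0.9]

code, complementary = semantic_gist(weights, features, vocab)
print(code)           # low-dimensional semantic representation
print(complementary)  # keywords the representation fails to capture
```

In this toy setup the rare term is lost in the reconstruction, so it is carried alongside the dense code as a sparse complementary vector; the pair together plays the role of the gist used for indexing and matching.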

Trainable Semantic Vectors

In the late 1990s, TextWise developed a unique semantic technology called Trainable Semantic Vectors (TSV), which is based on supervised statistical learning. Deployed as the basis of the TextWise API in 2008, TSV technology has undergone continuous improvement.

TSV uses a set of predefined categories to describe a target content domain. Meaning is then assigned to a text by identifying the categories that are most strongly associated with it. This scalable technology allows any document to be mapped into a semantic space, typically described by a few thousand dimensions, or categories. TSV works especially well for applications that handle wide classes of content.

Because supervised learning systems require training documents to build sound statistical models, they depend on the availability of a classification scheme and associated training documents. The complementary Semantic Gist® technology was developed for document collections that do not have classification schemes and associated training documents.
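The general idea of mapping text onto predefined categories can be sketched as follows. This is a generic illustration of supervised term-to-category association, not the actual TSV algorithm: the scoring rule (the share of a term's training occurrences that fall in each category), the function names and the two-document training set are all assumptions made for the example.

```python
from collections import Counter

# Illustrative sketch only; not the TSV algorithm.


def train_associations(labeled_docs):
    """Estimate how strongly each term is associated with each category.

    labeled_docs is a list of (tokens, category) pairs. The association is
    the fraction of a term's occurrences seen in that category (assumption).
    """
    counts = {}
    for tokens, category in labeled_docs:
        for term in tokens:
            counts.setdefault(term, Counter())[category] += 1
    return {
        term: {c: n / sum(per_cat.values()) for c, n in per_cat.items()}
        for term, per_cat in counts.items()
    }


def semantic_vector(assoc, tokens):
    """Map a document to category space by summing its term associations."""
    vec = Counter()
    for term in tokens:
        for category, weight in assoc.get(term, {}).items():
            vec[category] += weight
    return dict(vec)


def top_categories(vec, k=1):
    """Return the k categories most strongly associated with the document."""
    return sorted(vec, key=vec.get, reverse=True)[:k]


# Hypothetical training documents with predefined categories.
labeled = [
    (["loan", "bank", "rate"], "finance"),
    (["river", "bank", "fish"], "nature"),
]
assoc = train_associations(labeled)

vec = semantic_vector(assoc, ["bank", "loan"])
print(vec)
print(top_categories(vec))
```

Each category acts as one dimension of the semantic space, so a production system with a few thousand categories would yield vectors of that size. The sketch also shows why labeled training documents are a prerequisite: without them there is nothing from which to estimate the associations.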