Modeling and Prediction

Develop predictive models using topic models and word embeddings

To find clusters and extract features from high-dimensional text datasets, you can use machine learning techniques and models such as LSA, LDA, and word embeddings. You can combine features created with Text Analytics Toolbox™ with features from other data sources. With these features, you can build machine learning models that take advantage of textual, numeric, and other types of data.
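As a rough illustration of combining text features with features from another data source, the following plain-Python sketch builds word counts and appends one numeric value per document. It is a conceptual sketch only and does not use the toolbox API: the helper names `bag_of_words` and `combine_features`, the sample reports, and the sensor readings are all hypothetical.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count matrix) for whitespace-tokenized documents."""
    tokens = [doc.lower().split() for doc in docs]
    vocab = sorted({word for doc in tokens for word in doc})
    index = {word: i for i, word in enumerate(vocab)}
    counts = []
    for doc in tokens:
        row = [0] * len(vocab)
        for word, count in Counter(doc).items():
            row[index[word]] = count
        counts.append(row)
    return vocab, counts


def combine_features(text_counts, numeric_rows):
    """Append each document's numeric features to its word-count row."""
    if len(text_counts) != len(numeric_rows):
        raise ValueError("one numeric row is required per document")
    return [t + list(n) for t, n in zip(text_counts, numeric_rows)]


# Hypothetical data: two short reports, each with one numeric sensor reading.
docs = ["pump leaking oil", "pump motor overheating"]
readings = [[0.7], [1.3]]

vocab, counts = bag_of_words(docs)
features = combine_features(counts, readings)
```

Each row of `features` holds the word counts for one document followed by its numeric reading, so a single classifier or regression model can train on both kinds of data.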

Functions

bagOfWords - Bag-of-words model
bagOfNgrams - Bag-of-n-grams model
addDocument - Add documents to bag-of-words or bag-of-n-grams model
removeDocument - Remove documents from bag-of-words or bag-of-n-grams model
removeInfrequentWords - Remove words with low counts from bag-of-words model
removeInfrequentNgrams - Remove infrequently seen n-grams from bag-of-n-grams model
removeWords - Remove selected words from documents or bag-of-words model
removeNgrams - Remove n-grams from bag-of-n-grams model
removeEmptyDocuments - Remove empty documents from tokenized document array, bag-of-words model, or bag-of-n-grams model
topkwords - Most important words in bag-of-words model or LDA topic
topkngrams - Most frequent n-grams
encode - Encode documents as matrix of word or n-gram counts
tfidf - Term Frequency–Inverse Document Frequency (tf-idf) matrix
join - Combine multiple bag-of-words or bag-of-n-grams models
vaderSentimentScores - Sentiment scores with VADER algorithm
ratioSentimentScores - Sentiment scores with ratio rule
fastTextWordEmbedding - Pretrained fastText word embedding
wordEncoding - Word encoding model to map words to indices and back
doc2sequence - Convert documents to sequences for deep learning
wordEmbeddingLayer - Word embedding layer for deep learning networks
word2vec - Map word to embedding vector
word2ind - Map word to encoding index
vec2word - Map embedding vector to word
ind2word - Map encoding index to word
isVocabularyWord - Test if word is member of word embedding or encoding
readWordEmbedding - Read word embedding from file
trainWordEmbedding - Train word embedding
writeWordEmbedding - Write word embedding file
wordEmbedding - Word embedding model to map words to vectors and back
extractSummary - Extract summary from documents
rakeKeywords - Extract keywords using RAKE
textrankKeywords - Extract keywords using TextRank
bleuEvaluationScore - Evaluate translation or summarization with BLEU similarity score
rougeEvaluationScore - Evaluate translation or summarization with ROUGE similarity score
bm25Similarity - Document similarities with BM25 algorithm
cosineSimilarity - Document similarities with cosine similarity
textrankScores - Document scoring with TextRank algorithm
lexrankScores - Document scoring with LexRank algorithm
mmrScores - Document scoring with Maximal Marginal Relevance (MMR) algorithm
fitlda - Fit latent Dirichlet allocation (LDA) model
fitlsa - Fit LSA model
resume - Resume fitting LDA model
logp - Document log-probabilities and goodness of fit of LDA model
predict - Predict top LDA topics of documents
transform - Transform documents into lower-dimensional space
ldaModel - Latent Dirichlet allocation (LDA) model
lsaModel - Latent semantic analysis (LSA) model
wordcloud - Create word cloud chart from text, bag-of-words model, bag-of-n-grams model, or LDA model
textscatter - 2-D scatter plot of text
textscatter3 - 3-D scatter plot of text
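To make the tf-idf and cosine similarity entries in the list above more concrete, here is a minimal plain-Python sketch of the underlying arithmetic. It assumes one common weighting (raw term count multiplied by log(N / n_t), where N is the number of documents and n_t is the number of documents containing the term). The toolbox functions may apply different weighting options, so treat this as an illustration of the idea rather than a description of their implementation. The count matrix is hypothetical.

```python
import math


def tfidf(counts):
    """Weight raw term counts by idf = log(N / n_t)."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Number of documents in which each term appears at least once.
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df) if df else 0.0 for df in doc_freq]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in counts]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical word-count matrix: 3 documents x 4 vocabulary terms.
counts = [
    [2, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 3, 1],
]

weights = tfidf(counts)
sim_01 = cosine_similarity(weights[0], weights[1])
sim_02 = cosine_similarity(weights[0], weights[2])
```

The last term occurs in every document, so its idf is log(1) = 0 and it contributes nothing to any similarity. Documents 0 and 2 share only that term, so `sim_02` is zero, while documents 0 and 1 also share the first term and therefore score higher.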

Topics

Classification and Modeling

Create Simple Preprocessing Function

This example shows how to create a function that cleans and preprocesses text data for analysis.

Create Simple Text Model for Classification

This example shows how to train a simple text classifier on word frequency counts using a bag-of-words model.

Analyze Text Data Using Multiword Phrases

This example shows how to analyze text using n-gram frequency counts.

Analyze Text Data Using Topic Models

This example shows how to use the latent Dirichlet allocation (LDA) topic model to analyze text data.

Choose Number of Topics for LDA Model

This example shows how to decide on a suitable number of topics for a latent Dirichlet allocation (LDA) model.

Compare LDA Solvers

This example shows how to compare latent Dirichlet allocation (LDA) solvers by comparing the goodness of fit and the time taken to fit the model.

Sentiment Analysis and Keyword Extraction

Analyze Sentiment in Text

This example shows how to use the Valence Aware Dictionary and sEntiment Reasoner (VADER) algorithm for sentiment analysis.

Generate Domain Specific Sentiment Lexicon

This example shows how to generate a lexicon for sentiment analysis using 10-K and 10-Q financial reports.

Train a Sentiment Classifier

This example shows how to train a classifier for sentiment analysis using an annotated list of positive and negative sentiment words and a pretrained word embedding.

Extract Keywords from Text Data Using RAKE

This example shows how to extract keywords from text data using Rapid Automatic Keyword Extraction (RAKE).

Extract Keywords from Text Data Using TextRank

This example shows how to extract keywords from text data using TextRank.

Deep Learning

Classify Text Data Using Deep Learning

This example shows how to classify text data using a deep learning long short-term memory (LSTM) network.

Classify Text Data Using Convolutional Neural Network

This example shows how to classify text data using a convolutional neural network.

Classify Out-of-Memory Text Data Using Deep Learning

This example shows how to classify out-of-memory text data with a deep learning network using a transformed datastore.

Sequence-to-Sequence Translation Using Attention

This example shows how to convert decimal strings to Roman numerals using a recurrent sequence-to-sequence encoder-decoder model with attention.

Generate Text Using Deep Learning (Deep Learning Toolbox)

This example shows how to train a deep learning long short-term memory (LSTM) network to generate text.

Pride and Prejudice and MATLAB

This example shows how to train a deep learning LSTM network to generate text using character embeddings.

Word-By-Word Text Generation Using Deep Learning

This example shows how to train a deep learning LSTM network to generate text word-by-word.

Classify Text Data Using Custom Training Loop

This example shows how to classify text data using a deep learning bidirectional long short-term memory (BiLSTM) network with a custom training loop.

Generate Text Using Autoencoders

This example shows how to generate text data using autoencoders.

Define Text Encoder Model Function

This example shows how to define a text encoder model function.

Define Text Decoder Model Function

This example shows how to define a text decoder model function.

Language Support

Language Considerations

Information on using Text Analytics Toolbox features for other languages.

Japanese Language Support

Information on Japanese support in Text Analytics Toolbox.

Analyze Japanese Text Data

This example shows how to import, prepare, and analyze Japanese text data using a topic model.

German Language Support

Information on German support in Text Analytics Toolbox.

Analyze German Text Data

This example shows how to import, prepare, and analyze German text data using a topic model.
