newBag = removeNgrams(bag,idx)
removes the n-grams at the numeric or logical indices idx of bag.Ngrams.
This syntax is equivalent to newBag = removeNgrams(bag,bag.Ngrams(idx,:)).
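The equivalence between the two calls can be checked with a minimal sketch. The two toy sentences and the variable names newBagByIndex and newBagByNgram are illustrative assumptions, not part of the original example; the model uses the default n-gram length of 2 (bigrams).

```matlab
% Toy documents (illustrative only, not the sonnets data).
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"]);

% Build a bag-of-n-grams model; the default n-gram length is 2.
bag = bagOfNgrams(documents);

% Remove the first two n-grams by numeric index.
idx = [1 2];
newBagByIndex = removeNgrams(bag,idx);

% Equivalent call that passes the n-grams themselves.
newBagByNgram = removeNgrams(bag,bag.Ngrams(idx,:));

% Both models should hold the same remaining n-grams.
tf = isequal(newBagByIndex.Ngrams,newBagByNgram.Ngrams);
```

Indexing is convenient when the n-grams have already been located in bag.Ngrams, for example by a search or a frequency threshold.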
Load the example data. The file sonnetsPreprocessed.txt contains preprocessed versions of Shakespeare's sonnets. The file contains one sonnet per line, with words separated by a space. Extract the text from sonnetsPreprocessed.txt, split the text into documents at newline characters, and then tokenize the documents.
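The loading steps described above might look like the following sketch. It assumes that sonnetsPreprocessed.txt is on the MATLAB path, as it normally is when Text Analytics Toolbox is installed; the variable names are illustrative.

```matlab
% Assumes sonnetsPreprocessed.txt is on the MATLAB path.
filename = "sonnetsPreprocessed.txt";

% Read the whole file as a single string.
str = extractFileText(filename);

% Split at newline characters: one sonnet per element.
textData = split(str,newline);

% Tokenize each sonnet.
documents = tokenizedDocument(textData);
```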
bag — Input bag-of-n-grams model
bagOfNgrams object
Input bag-of-n-grams model, specified as a bagOfNgrams object.
ngrams — N-grams to remove
string array | character vector | cell array of character vectors
N-grams to remove, specified as a string array, character vector, or a
cell array of character vectors.
If ngrams is a string array or cell array, then it
has size NumNgrams-by-maxN, where
NumNgrams is the number of n-grams and
maxN is the length of the largest n-gram. If
ngrams is a character vector, then it represents a
single word (unigram).
The value of ngrams(i,j) is the jth
word of the ith n-gram. If the number of words in the
ith n-gram is less than maxN, then
the remaining entries of the ith row of
ngrams are empty.
Example: ["An" ""; "An" "example"; "example" ""]
Data Types: string | char | cell
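As a minimal sketch of the padded layout, the following removes one unigram and one bigram in a single call. The toy documents and the chosen n-grams are illustrative assumptions; the model counts both unigrams and bigrams so that maxN is 2.

```matlab
% Toy documents (illustrative only).
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"]);

% Count unigrams and bigrams, so maxN is 2.
bag = bagOfNgrams(documents,'NgramLengths',[1 2]);

% One n-gram per row; the unigram row is padded with "".
ngrams = [
    "example" ""
    "short"   "sentence"];
newBag = removeNgrams(bag,ngrams);

% A character vector removes a single unigram.
newBag2 = removeNgrams(bag,'sentence');
```

Padding with empty strings keeps the array rectangular, which is why shorter n-grams need trailing "" entries.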
idx — Indices of n-grams to remove
vector of numeric indices | vector of logical indices
Indices of n-grams to remove, specified as a vector of numeric indices or
a vector of logical indices. The indices in idx
correspond to the rows of bag.Ngrams.
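A logical index vector is often built by testing the rows of bag.Ngrams. The sketch below removes every bigram that contains the word "short"; the toy documents and the selection rule are illustrative assumptions.

```matlab
% Toy documents (illustrative only).
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"]);
bag = bagOfNgrams(documents);

% One logical entry per row of bag.Ngrams: true where any
% word of the n-gram equals "short".
idx = any(bag.Ngrams == "short",2);

% Remove the flagged n-grams.
newBag = removeNgrams(bag,idx);
```

The logical vector has one element per row of bag.Ngrams, so its length equals bag.NumNgrams.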