newBag = removeInfrequentNgrams(bag,count)
removes the n-grams that appear at most count times in total
from the bag-of-n-grams model bag. The function, by default, is
case sensitive.
newBag = removeInfrequentNgrams(bag,count,'NgramLengths',lengths)
removes only the infrequent n-grams whose lengths are specified by
lengths; n-grams of other lengths are kept regardless of their counts.
The function, by default, is case sensitive.
newBag = removeInfrequentNgrams(___,'IgnoreCase',true)
removes the n-grams that appear at most count times, ignoring
case. If n-grams differ only by case, then the corresponding counts are
merged.
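As a minimal sketch of the difference that 'IgnoreCase' makes, the toy documents below contain the bigrams "The cat" and "the cat", which differ only by case. The documents and the variable names bagSensitive and bagInsensitive are made up for illustration, and the code assumes that Text Analytics Toolbox is installed.

```matlab
% Toy documents, invented for illustration only.
documents = tokenizedDocument([
    "The cat sat"
    "the cat ran"
    "A dog ran"]);

% bagOfNgrams counts bigrams by default.
bag = bagOfNgrams(documents);

% Case-sensitive (default): "The cat" and "the cat" are counted
% separately, so each has a total count of 1 and falls at or below
% the threshold of 1.
bagSensitive = removeInfrequentNgrams(bag,1);

% Case-insensitive: the counts of "The cat" and "the cat" are merged
% before being compared against the threshold.
bagInsensitive = removeInfrequentNgrams(bag,1,'IgnoreCase',true);
```

Comparing bagSensitive.NumNgrams with bagInsensitive.NumNgrams shows how many n-grams remain under each setting.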
Load the example data. The file sonnetsPreprocessed.txt contains preprocessed versions of Shakespeare's sonnets. The file contains one sonnet per line, with words separated by a space. Extract the text from sonnetsPreprocessed.txt, split the text into documents at newline characters, and then tokenize the documents.
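The loading steps described above can be sketched as follows, assuming that Text Analytics Toolbox is installed and that sonnetsPreprocessed.txt is on the MATLAB path. The last two statements then build a bag-of-n-grams model and prune it. The threshold of 2 is chosen here only for illustration.

```matlab
% Read the whole file into a single string.
str = extractFileText("sonnetsPreprocessed.txt");

% One sonnet per line: split into documents at newline characters.
textData = split(str,newline);

% Tokenize each document.
documents = tokenizedDocument(textData);

% Build a bag-of-n-grams model (bigrams by default).
bag = bagOfNgrams(documents);

% Remove the n-grams that appear at most 2 times in total.
newBag = removeInfrequentNgrams(bag,2);
```

Displaying bag and newBag (for example, by omitting the trailing semicolons) shows the number of n-grams before and after the removal.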
N-gram lengths, specified as a positive integer or a vector of positive
integers.
If you specify lengths, the function removes
infrequent n-grams of the specified lengths only. If you do not specify
lengths, then the function removes infrequent
n-grams regardless of length.
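To illustrate the effect of lengths, the sketch below builds a model that holds both bigrams and trigrams and then removes infrequent trigrams only. The toy documents and the variable names are made up for illustration, and the code assumes that Text Analytics Toolbox is installed.

```matlab
% Toy documents, invented for illustration only.
documents = tokenizedDocument([
    "red green blue red green blue"
    "red green black"]);

% Count both bigrams and trigrams.
bag = bagOfNgrams(documents,'NgramLengths',[2 3]);

% Remove only the trigrams that appear at most once in total.
% Bigrams are left untouched, even those that appear only once.
bagTrigramsPruned = removeInfrequentNgrams(bag,1,'NgramLengths',3);

% Without 'NgramLengths', infrequent n-grams of every length are removed.
bagAllPruned = removeInfrequentNgrams(bag,1);
```

Because the second call applies the threshold to n-grams of every length, bagAllPruned can contain no more n-grams than bagTrigramsPruned.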