Load the example data. The file sonnetsPreprocessed.txt contains preprocessed versions of Shakespeare's sonnets. The file contains one sonnet per line, with words separated by a space. Extract the text from sonnetsPreprocessed.txt, split the text into documents at newline characters, and then tokenize the documents.
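
The steps above can be sketched in MATLAB as follows. This is a minimal sketch, assuming that Text Analytics Toolbox is installed and that sonnetsPreprocessed.txt is on the MATLAB path; the variable names are illustrative.

```matlab
% Read the whole file into a single string.
filename = "sonnetsPreprocessed.txt";
str = extractFileText(filename);

% One sonnet per line, so split at newline characters to get one document each.
textData = split(str,newline);

% Tokenize each document (words are already separated by spaces).
documents = tokenizedDocument(textData);
```
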
N-gram lengths, specified as a positive integer or a vector of positive integers.
If you specify lengths, then the function removes infrequent n-grams of the specified lengths only. If you do not specify lengths, then the function removes infrequent n-grams regardless of length.
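
As an illustration of this argument, the sketch below restricts removal to bigrams. It assumes that the function described here is removeInfrequentNgrams from Text Analytics Toolbox, called with the 'NgramLengths' name-value option; the small corpus and the count threshold are hypothetical and chosen only for the example.

```matlab
% Hypothetical three-document corpus, used only for illustration.
str = [
    "an example of a short sentence"
    "a second short sentence"
    "an example of another sentence"];
documents = tokenizedDocument(str);

% Build a bag-of-n-grams model containing both bigrams and trigrams.
bag = bagOfNgrams(documents,'NgramLengths',[2 3]);

% Remove n-grams that appear at most count times in total,
% but only among the bigrams. Trigrams are left untouched.
count = 1;
newBag = removeInfrequentNgrams(bag,count,'NgramLengths',2);

% Without the option, infrequent n-grams of every length are removed.
newBagAll = removeInfrequentNgrams(bag,count);
```

Comparing newBag.Ngrams with newBagAll.Ngrams shows which trigrams survive only because the lengths were restricted.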