Remove short words from documents or bag-of-words model
newDocuments = removeShortWords(documents,len)
newBag = removeShortWords(bag,len)
newDocuments = removeShortWords(documents,len) removes words of length len or less from documents.
newBag = removeShortWords(bag,len) removes words of length len or less from the bagOfWords object bag.
Remove the words with two or fewer characters from a document.
document = tokenizedDocument("An example of a short sentence");
newDocument = removeShortWords(document,2)

newDocument =
  tokenizedDocument:

   3 tokens: example short sentence
Remove the words with two or fewer characters from a bag-of-words model.
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);
bag = bagOfWords(documents);
newBag = removeShortWords(bag,2)

newBag =
  bagOfWords with properties:

          Counts: [2x4 double]
      Vocabulary: ["example"    "short"    "sentence"    "second"]
        NumWords: 4
    NumDocuments: 2
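To see which words were dropped from the model, one option is to compare the two vocabularies. This is a minimal sketch that assumes Text Analytics Toolbox is installed; the variable name removed is illustrative only and is not part of the removeShortWords interface.

```matlab
% Rebuild the bag-of-words model from the example above.
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);
bag = bagOfWords(documents);
newBag = removeShortWords(bag,2);

% Words present in the original vocabulary but not in the filtered one.
% "removed" is a hypothetical helper variable used for illustration.
removed = setdiff(bag.Vocabulary,newBag.Vocabulary)
```

Every entry of removed should have two or fewer characters, which can be checked with strlength(removed).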
documents: Input documents, specified as a tokenizedDocument array.
bag: Input bag-of-words model, specified as a bagOfWords object.
len: Maximum length of words to remove, specified as a positive integer. The function removes words with len or fewer characters.
newDocuments: Output documents, returned as a tokenizedDocument array.
newBag: Output bag-of-words model, returned as a bagOfWords object.
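The boundary behavior of len (words with exactly len characters are removed too) can be illustrated on a plain string array. This is only a sketch of the length rule using the base MATLAB function strlength, not the implementation of removeShortWords; the variable names words and kept are illustrative assumptions.

```matlab
% Plain string array standing in for the tokens of one document.
words = ["An" "example" "of" "a" "short" "sentence"];
len = 2;

% Keep only the words strictly longer than len characters.
% Words with exactly len characters ("An", "of") are dropped as well.
kept = words(strlength(words) > len)
```

Here kept contains "example", "short", and "sentence", matching the tokens that remain in the first example above.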
bagOfNgrams | bagOfWords | normalizeWords | removeLongWords | removeWords | stopWords | tokenizedDocument