Question

I wrote a MATLAB program that finds word bigrams and their frequencies in a text file. To do this, I used the textread function to create a cell array of strings:

unigrams = textread('file.txt', '%s');

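But I also want to omit certain words such as 'to', 'the', 'is', 'or', as well as special characters such as '#', '$', '&' and '%', from my cell array. Is there a way to exclude these words while reading them from the original file?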

Thanks.

Answer 1

You can use setdiff to remove the unwanted words after reading the text:

unigrams = {'I' 'like' 'this' 'or' 'that' 'Here' 'are' 'some' 'symbols' '#' '$' '&'}
setdiff(unigrams, {'the', 'is' 'or' '#' '$' '&'}, 'stable')

unigrams =
  Columns 1 through 8
    'I'    'like'    'this'    'or'    'that'    'Here'    'are'    'some'
  Columns 9 through 12
    'symbols'    '#'    '$'    '&'

ans =
    'I'    'like'    'this'    'that'    'Here'    'are'    'some'    'symbols'
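One caveat: setdiff returns each remaining word only once, so repeated words are collapsed, which would distort frequency counts. If repeats matter, as they would for bigram counts, a logical mask built with ismember is one alternative. Below is a minimal sketch, assuming a made-up word list in place of the file contents; the variable names stopwords, keep and filtered are illustrative and not from the original post.

```matlab
% Hypothetical word list standing in for the cell array read from the file.
unigrams = {'I' 'like' 'this' 'or' 'that' 'or' 'this' '#' 'is' '$' '&'};

% Words and symbols to drop (this list is an assumption; extend as needed).
stopwords = {'to' 'the' 'is' 'or' '#' '$' '&' '%'};

% ismember returns a logical mask with one entry per element of unigrams,
% so indexing with its negation keeps repeated words and the original order.
keep = ~ismember(unigrams, stopwords);
filtered = unigrams(keep);
```

Note that the comparison is case-sensitive, so 'The' would not be removed by a stop list that contains only 'the'; lowercasing both lists first with lower() is one way around that.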
