I have the following sample data:
docs_word = ["this is a test", "this is another test"]
docs_txt = ["this is a great test", "this is another test"]
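What I want to do is build a word dictionary for each of the two sample lists, compare them, and store the words that appear in docs_txt but not in docs_word in a separate dictionary. So I wrote the following: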
from collections import Counter

# Note: despite the names, count_txtDoc counts docs_word and count_wrdDoc counts docs_txt
count_txtDoc = Counter()
for file in docs_word:
    words = file.split(" ")
    count_txtDoc.update(words)
count_wrdDoc = Counter()
for file in docs_txt:
    words = file.split(" ")
    count_wrdDoc.update(words)
# Get the dictionary keys
words_worddoc = count_wrdDoc.keys()
words_txtdoc = count_txtDoc.keys()
# Look for values that are in words_worddoc but not in words_txtdoc
count_all = Counter()
for val in words_worddoc:
    if val not in words_txtdoc:
        count_all.update(val)
        print(val)
The print(val) call prints the correct value: great.
However, if I run:
print(count_all)
I get the following output:
Counter({'a': 1, 'r': 1, 'e': 1, 't': 1, 'g': 1})
whereas I expected:
Counter({'great': 1})
Any ideas on how to achieve this?
Answer:
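Update the counter with an iterable that contains the word, rather than with the word itself (a string is itself an iterable of characters, so each letter gets counted):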
count_all.update([val])
#                ^   ^
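A minimal sketch of the difference, using only the standard-library collections.Counter: update() iterates over whatever it receives, so a bare string contributes one count per character, while a one-element list contributes the whole word. Incrementing the key directly is an equivalent alternative when adding a single word.

```python
from collections import Counter

# Bare string: Counter iterates over it, so each letter of "great" is counted.
by_char = Counter()
by_char.update("great")
print(by_char)  # one entry per letter: g, r, e, a, t

# One-element list: the only item iterated over is the whole word.
by_word = Counter()
by_word.update(["great"])
print(by_word)  # Counter({'great': 1})

# Equivalent alternative: missing keys read as 0, so += 1 creates the entry.
by_key = Counter()
by_key["great"] += 1
print(by_key == by_word)  # True
```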
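However, if you only need the words themselves, you may not need to create a new counter at all. You can take the symmetric difference of the keys: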
words_worddoc = count_wrdDoc.keys()  # use .viewkeys() in Python 2
words_txtdoc = count_txtDoc.keys()   # use .viewkeys() in Python 2
print(words_txtdoc ^ words_worddoc)
# {'great'}
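The symmetric difference (^) returns words that occur in either list but not in both. The question asks only for words of docs_txt that are missing from docs_word; with this sample data the two results coincide, because every word of docs_word also appears in docs_txt. Below is a minimal sketch of the one-directional version, assuming Python 3, where key views support set operators. The names counts_word and counts_txt, and the added word "extra", are hypothetical and only serve to show where the two operations diverge.

```python
from collections import Counter

docs_word = ["this is a test", "this is another test"]
docs_txt = ["this is a great test", "this is another test"]

# Build each counter in one pass; split() with no argument splits on any whitespace.
counts_word = Counter(w for doc in docs_word for w in doc.split())
counts_txt = Counter(w for doc in docs_txt for w in doc.split())

only_in_txt = counts_txt.keys() - counts_word.keys()      # one direction only
either_not_both = counts_txt.keys() ^ counts_word.keys()  # both directions
print(only_in_txt)      # {'great'}
print(either_not_both)  # {'great'}

# Hypothetical extra word on the docs_word side, to show where the results differ.
counts_word_extra = counts_word + Counter(["extra"])
print(counts_txt.keys() - counts_word_extra.keys())          # {'great'}
print(sorted(counts_txt.keys() ^ counts_word_extra.keys()))  # ['extra', 'great']
```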
If you also want the counts, you can compute the symmetric difference of the two counters like this:
count_all = (count_wrdDoc - count_txtDoc) | (count_txtDoc - count_wrdDoc)
print(count_all)
# Counter({'great': 1})
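One caveat worth checking against your own data: Counter subtraction is multiset subtraction. It subtracts counts and keeps only keys whose result is positive, so a word that occurs in both lists with different frequencies also lands in the result. If you want only the words absent from the other list, each with its own count, filter on key membership instead. A minimal sketch; the two sentences and the names counts_a and counts_b are hypothetical.

```python
from collections import Counter

# Hypothetical data: "test" occurs in both, but three times in counts_a.
counts_a = Counter("this is a great test test test".split())
counts_b = Counter("this is a test".split())

# Multiset symmetric difference: "test" survives with count 3 - 1 = 2.
multiset_diff = (counts_a - counts_b) | (counts_b - counts_a)
print(multiset_diff == Counter({"test": 2, "great": 1}))  # True

# Key-based version: keep only words missing from the other counter, with their own counts.
key_diff = Counter({w: n for w, n in counts_a.items() if w not in counts_b})
key_diff.update({w: n for w, n in counts_b.items() if w not in counts_a})
print(key_diff)  # Counter({'great': 1})
```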