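I have two text files. From textfile1 I selected the 50 most common words. Now I want to search for these 50 words in the second file and count how often each one occurs there.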
import re
from collections import Counter

# Top 50 words longer than 3 characters in the first file
with open('textfile1.text', 'r') as readFile:
    sepFile = readFile.read()
words = re.findall(r'\w+', sepFile)
word_long = [w for w in words if len(w) > 3]
word_count = Counter(word_long).most_common(50)
count = word_count
list1 = count
# Top 50 words longer than 3 characters in the second file
with open('textfile2.txt', 'r') as readFile1:
    sepFile1 = readFile1.read()
word2 = re.findall(r'\w+', sepFile1)
word_long1 = [w for w in word2 if len(w) > 3]
word_count1 = Counter(word_long1).most_common(50)
count2 = word_count1
list2 = count2
# Count how often each of file 1's top 50 words occurs in file 2
a = word2
c = Counter(a)
for w, _ in word_count:
    print(w, c.get(w, 0))
Answer 0 (score: 1)
Using dictionaries may help here. Counter.most_common() returns a list of tuples, which you can convert to a dict:
file1_common_words = dict(Counter(all_words_in_file1).most_common(50))
file2_common_words = dict(Counter(all_words_in_file2).most_common(50))
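As a small illustration of that conversion, here is a minimal sketch. The sample sentence and the names `sample_words`, `top_pairs`, and `top_dict` are made up for this example and do not come from the question.

```python
from collections import Counter

# Hypothetical sample data standing in for the words of a file
sample_words = "apple pear apple plum pear apple".split()

# most_common(n) returns a list of (word, count) tuples, highest count first
top_pairs = Counter(sample_words).most_common(2)

# dict() turns those pairs into a word -> count mapping
top_dict = dict(top_pairs)

print(top_pairs)
print(top_dict)
```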
Then, for each word in file1_common_words, you can look that word up in file2_common_words to get its count in file 2:
for (word, count) in file1_common_words.items():
    try:
        count_in_file2 = file2_common_words[word]
    except KeyError:
        # if the word is not present in file2_common_words,
        # then its count is 0.
        count_in_file2 = 0
    print("{0}\t{1}\t{2}".format(word, count, count_in_file2))
This will output lines in the following format:
<most_common_word_1> <count_in_file1> <count_in_file2>
<most_common_word_2> <count_in_file1> <count_in_file2>
...
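One caveat: file2_common_words holds only file 2's own top 50, so a word that occurs in file 2 but falls outside that top 50 is reported as 0. If the real count is wanted, one option is to look the word up in the full Counter for file 2 instead. The sketch below is one possible way to do that, not the only one. The helper names `long_words` and `compare_counts` and the sample texts are assumptions made for illustration, and the length filter mirrors the `len(w) > 3` test from the question.

```python
import re
from collections import Counter

def long_words(text, min_len=4):
    """Return all words of at least min_len characters, in order."""
    return [w for w in re.findall(r'\w+', text) if len(w) >= min_len]

def compare_counts(text1, text2, top_n=50):
    """Yield (word, count_in_text1, count_in_text2) for text1's top_n words."""
    counts1 = Counter(long_words(text1))
    counts2 = Counter(long_words(text2))  # full counter, not only the top 50
    for word, count in counts1.most_common(top_n):
        # A Counter returns 0 for missing keys, so no try/except is needed
        yield word, count, counts2[word]

# Hypothetical sample texts standing in for the contents of the two files
text1 = "spam spam spam eggs eggs ham toast"
text2 = "eggs toast toast toast spam"

for word, c1, c2 in compare_counts(text1, text2, top_n=2):
    print("{0}\t{1}\t{2}".format(word, c1, c2))
```

For real files, the two sample strings would be replaced by the result of reading each file, for example with `open(path).read()`.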