I'm completely new to Python, but to my surprise I've managed to put together this code:
if __name__ == "__main__":
    with open("wordlist.txt") as infile:
        for line in infile:
            print(line)
    with open("cv000_29416.txt", "r") as myfile:
        data = myfile.read().replace('\n', '')
    print(data.count("bad"))
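The point is, I want to count how often the words from wordlist.txt occur in cv000_29416.txt.
(So wordlist.txt contains, say, twenty words such as 'bad', 'good', etc., and cv000_29416.txt is a long text. I want to count how many times 'bad', 'good', and so on appear in cv000_29416.txt.)
Is there something I can insert into the second block of code to do this?
Thanks! Sorry for my poor English.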
Answer 0 (score: 3)
# create a collection of the words that you want to count
with open('wordlist.txt') as infile:
    counts = {}
    for line in infile:
        for word in line.split():
            counts[word] = 0

# increment the count of the words that you really care about
with open("cv000_29416.txt") as infile:
    for line in infile:
        for word in line.split():
            if word in counts:
                counts[word] += 1

for word, count in counts.items():
    print(word, "appeared", count, "times")
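
One caveat with the loop above: line.split() splits only on whitespace and compares case-sensitively, so tokens such as "bad," or "Bad" would not match the wordlist entry "bad". Below is a minimal sketch of one way around that, assuming you want case-insensitive matching with punctuation ignored. The helper name count_words, the regex used to define a "word", and the sample sentence are my own assumptions for illustration, not part of the original answer.

```python
import re

def count_words(text, wordlist):
    """Count whole-word, case-insensitive occurrences of each wordlist entry."""
    counts = {word.lower(): 0 for word in wordlist}
    # Assumed definition of a "word": a run of letters and apostrophes.
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in counts:
            counts[token] += 1
    return counts

# Made-up sample text standing in for cv000_29416.txt
sample = "Bad acting, bad script. Not good; badly made."
print(count_words(sample, ["bad", "good"]))
# prints {'bad': 2, 'good': 1}
```

Note that "badly" is not counted as "bad" here, unlike the substring-based data.count("bad") in the question.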
Answer 1 (score: 2)
Use a collections.Counter dict to count all the words:
from collections import Counter

with open("cv000_29416.txt", "r") as myfile:
    data = Counter(myfile.read().split())
print(data["bad"])
To put it all together, assuming each word is on its own line in wordlist.txt:
from collections import Counter

with open("cv000_29416.txt", "r") as myfile, open("wordlist.txt") as infile:
    data = Counter(myfile.read().split())
    for line in infile:
        print(data.get(line.rstrip(), 0))
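
The snippet above prints only the numbers, so it can be hard to tell which count belongs to which word. Here is a small sketch of a variant that pairs each word with its count and skips blank lines in the wordlist. The function name wordlist_counts is hypothetical, and the in-memory strings are made-up stand-ins for the two files so the example runs on its own.

```python
from collections import Counter

def wordlist_counts(text, wordlist_lines):
    """Return (word, count) pairs for each non-empty wordlist line, in order."""
    totals = Counter(text.split())  # whitespace-separated, case-sensitive tokens
    pairs = []
    for line in wordlist_lines:
        word = line.strip()
        if word:  # skip blank lines
            pairs.append((word, totals[word]))  # a Counter gives 0 for missing keys
    return pairs

# Made-up stand-ins for cv000_29416.txt and wordlist.txt
text = "bad film with a bad plot but good music"
wordlist_lines = ["bad\n", "good\n", "\n", "great\n"]
for word, count in wordlist_counts(text, wordlist_lines):
    print(word, count)
# prints:
# bad 2
# good 1
# great 0
```

With the real files, you could pass myfile.read() as text and the open infile object as wordlist_lines.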