I'm trying to count how often the word at a specific index occurs across the lines of a file, and print the result as a dictionary.
def count_by_fruit(file_name="file_with_fruit_data.txt"):
    with open(file_name, "r") as file:
        content_of_file = file.readlines()
        dict_of_fruit_count = {}
        for line in content_of_file:
            line = line[0:-1]
            line = line.split("\t")
            for fruit in line:
                fruit = line[1]
                dict_of_fruit_count[fruit] = dict_of_fruit_count.get(fruit, 0) + 1
    return dict_of_fruit_count

print(count_by_fruit())
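Output: {'apple': 6, 'banana': 6, 'orange': 3}
I get this output, but it does not count the word frequencies correctly. After searching around, I can't seem to find a suitable solution. Can someone help me find my mistake?
My file contains the following (the data is tab-separated; please read the sample as having "\t" between the fields, because Stack Overflow changed the formatting):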
Answer 0: (score: 1)
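You are looping over the same line too many times. Notice that every count you get is three times the expected value.
Also, in Python you don't need to read the whole file into memory first. Just iterate over the file object line by line.
Try: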
def count_by_fruit(file_name="file_with_fruit_data.txt"):
    with open(file_name, "r") as f_in:
        dict_of_fruit_count = {}
        for line in f_in:
            fruit = line.split("\t")[1]
            dict_of_fruit_count[fruit] = dict_of_fruit_count.get(fruit, 0) + 1
    return dict_of_fruit_count
This can be simplified further to:
def count_by_fruit(file_name="file_with_fruit_data.txt"):
    with open(file_name) as f_in:
        dict_of_fruit_count = {}
        for fruit in (line.split('\t')[1] for line in f_in):
            dict_of_fruit_count[fruit] = dict_of_fruit_count.get(fruit, 0) + 1
    return dict_of_fruit_count
Or, if you can use Counter:
from collections import Counter

def count_by_fruit(file_name="file_with_fruit_data.txt"):
    with open(file_name) as f_in:
        return dict(Counter(line.split('\t')[1] for line in f_in))
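As a quick check of the Counter approach without touching the disk, the sketch below feeds the same generator expression from an io.StringIO object. The sample text and the function name count_fruit_in are assumptions made up for illustration; they are not the asker's data.

```python
import io
from collections import Counter

# Hypothetical sample data standing in for the tab-separated file.
SAMPLE_TEXT = "1\tapple\tred\n2\tbanana\tyellow\n3\tapple\tgreen\n4\torange\torange\n"


def count_fruit_in(lines):
    """Count the field at index 1 of each tab-separated line."""
    return Counter(line.split("\t")[1] for line in lines)


counts = count_fruit_in(io.StringIO(SAMPLE_TEXT))
print(dict(counts))           # {'apple': 2, 'banana': 1, 'orange': 1}
print(counts.most_common(1))  # [('apple', 2)]
```

Because Counter is a dict subclass, the dict(...) conversion is only needed if the caller wants a plain dictionary when printing.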
Answer 1: (score: 1)
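The problem is the inner loop, "for fruit in line:". Splitting the line on tabs breaks it into three parts. If you loop over those three parts and add one to the count on each pass, your counts end up at 3 times the actual data.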
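To see the tripling concretely, here is a minimal sketch that runs both versions of the loop over a few in-memory lines. The sample rows and the helper names count_buggy and count_fixed are made up for illustration and assume three fields per line.

```python
# Hypothetical tab-separated rows (id, fruit, colour); not the asker's real file.
SAMPLE_LINES = [
    "1\tapple\tred\n",
    "2\tbanana\tyellow\n",
    "3\tapple\tgreen\n",
]


def count_buggy(lines):
    """Reproduce the original loop: one increment per field on the line."""
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        for _ in fields:  # runs three times for a three-field line
            fruit = fields[1]
            counts[fruit] = counts.get(fruit, 0) + 1
    return counts


def count_fixed(lines):
    """Increment once per line, using only the field at index 1."""
    counts = {}
    for line in lines:
        fruit = line.rstrip("\n").split("\t")[1]
        counts[fruit] = counts.get(fruit, 0) + 1
    return counts


print(count_buggy(SAMPLE_LINES))  # {'apple': 6, 'banana': 3}
print(count_fixed(SAMPLE_LINES))  # {'apple': 2, 'banana': 1}
```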
Here is how I would write this function using generator expressions and Counter.
from collections import Counter

def count_by_fruit(file_name="file_with_fruit_data.txt"):
    with open(file_name, "r") as file:
        lines = (line[:-1] for line in file)
        fruit = (line.split('\t')[1] for line in lines)
        return Counter(fruit)
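If the real file might contain blank lines or rows with fewer than two fields, indexing with [1] raises IndexError. One possible variation, assuming such rows should simply be skipped, uses the standard csv module with a tab delimiter. The function name count_by_column and its column parameter are hypothetical, and the sample text is invented.

```python
import csv
import io
from collections import Counter


def count_by_column(rows_source, column=1):
    """Count values in one column of tab-separated text, skipping short rows."""
    reader = csv.reader(rows_source, delimiter="\t")
    return Counter(row[column] for row in reader if len(row) > column)


# Hypothetical sample with a blank line and a one-field row mixed in.
sample = io.StringIO("1\tapple\tred\n\n2\tbanana\n3\n4\tapple\tgreen\n")
print(count_by_column(sample))  # Counter({'apple': 2, 'banana': 1})
```

With a real file, the same function can be called as count_by_column(f_in) inside a with open(file_name, newline="") as f_in: block, since the csv documentation recommends opening files with newline="".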