Question

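I have some code that counts occurrences of certain strings in each file in a folder (each file is one month of a year, e.g. April 2012, November 2006, etc.) and adds them together: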

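from os import listdir
from os.path import isfile, join, splitext

mypath = r"C:\Users\Desktop\FILE"  # raw string so the backslashes are taken literally
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
result = {}
for f in onlyfiles:  # every file in the CSV folder
    with open(join(mypath, f), 'r') as content_file:
        content = content_file.read()
        a1 = content.count('Bacon')
        a2 = content.count('Eggs')
        total = a1 + a2
    # splitext() drops only the extension; strip(".csv") would also eat
    # any leading or trailing '.', 'c', 's' or 'v' characters of the name
    result[splitext(f)[0]] = total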

The values are then put into a dictionary:

new_dictionary = {}
count = 0
for m, n in result.items():
    print('The number of bacon and eggs in {} was {}'.format(m, n))
    count += 1
    new_dictionary['month_{}'.format(count)] = n  # store this month's total, not the whole result dict

Finally, they are plotted on a graph:

import matplotlib.pyplot as plt

plt.plot(list(result.values()))
plt.ylabel('Bacon and eggs seen in this month')
plt.xlabel('Time')
plt.title('Amount of times bacon and eggs seen over time')
plt.xticks(range(len(result)), list(result.keys()))
plt.show()

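However, when it draws the graph, the times (months etc.) appear in a random order rather than in chronological order, which is how they are ordered in the folder:

[Image: graph]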

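How can I get the graph to plot in a logical order?

I have tried sorting the list (sorted), but it ends up printing strange things.

Note: the data is made up because the real data is sensitive, but the principle is the same.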

Answer 1

When populating new_dictionary, you should supply the values in sorted order:

for m, n in sorted(result.items()):
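
Here is a minimal sketch of how the sorted pairs could feed the plot. It assumes the keys are zero-padded year-month strings such as "2012-04", so that alphabetical order is also chronological order. The helper names sorted_series and plot_sorted and the sample data are made up for illustration, and the code is written for Python 3.

```python
def sorted_series(result):
    """Return (labels, values) with the labels in ascending key order."""
    pairs = sorted(result.items())          # tuples sort by key first
    labels = [month for month, _ in pairs]
    values = [total for _, total in pairs]
    return labels, values


def plot_sorted(result):
    """Plot the totals in key order (requires matplotlib)."""
    import matplotlib.pyplot as plt         # imported here so sorted_series works without it

    labels, values = sorted_series(result)
    plt.plot(values)
    plt.xticks(range(len(labels)), labels)
    plt.ylabel('Bacon and eggs seen in this month')
    plt.xlabel('Time')
    plt.show()


# Hypothetical sample data standing in for the real counts
sample = {"2012-04": 7, "2006-11": 3, "2010-01": 5}
labels, values = sorted_series(sample)
print(labels)   # ['2006-11', '2010-01', '2012-04']
print(values)   # [3, 5, 7]
```

Building two lists from the sorted pairs keeps the tick labels and the plotted values in the same order, which a plain dict does not guarantee on older Python versions.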

Answer 2

You may want to look at https://docs.python.org/2/library/os.path.html, as it may be useful to you.

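You can use "os.path.split()" to split a file path into a (head, tail) pair:

('root path', 'file.csv')

Then you can use os.path.splitext() on the file name, which returns another pair (note that the extension keeps its leading dot):

('file', '.csv')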

If you have 2015-03.csv, you can do this:

filename = os.path.splitext(os.path.split(f)[1])[0]
# take item 1 (the file name) from os.path.split(), pass it to
# os.path.splitext(), and keep item 0 (the name without the extension)
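
To make the two calls concrete, here is a small sketch of what each step returns. The path "reports/2015-03.csv" is a made-up example, and the code is written for Python 3.

```python
import os

path = "reports/2015-03.csv"        # hypothetical path for illustration

head, tail = os.path.split(path)    # directory part and file name
print(head, tail)                   # reports 2015-03.csv

name, ext = os.path.splitext(tail)  # name and extension (with the dot)
print(name, ext)                    # 2015-03 .csv

# Both steps combined, as in the one-liner above
filename = os.path.splitext(os.path.split(path)[1])[0]
print(filename)                     # 2015-03
```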

You can then add it to a dictionary, or use a nested dictionary, like this:

import os

mypath = r"C:\Users\Desktop\FILE"
result = {}
for f in os.listdir(mypath):  # listdir lives in os, not os.path
    full_path = os.path.join(mypath, f)  # test and open relative to mypath, not the working directory
    if not os.path.isfile(full_path):
        continue
    with open(full_path, "r") as content_file:
        content = content_file.read()
    a1 = content.count('Bacon')
    a2 = content.count('Eggs')
    total = a1 + a2
    # key each entry by the file name without its extension, e.g. "2015-03"
    result[os.path.splitext(f)[0]] = {"Bacon": a1, "Eggs": a2, "Total": total}

for filename in sorted(result):
    counts = result[filename]
    print("File: {0}; Bacon: {1}; Eggs: {2}; Total: {3}".format(
        filename, counts["Bacon"], counts["Eggs"], counts["Total"]))

Have you considered regular expressions? re.findall() returns a list of matches:

import re

# patterns are case-sensitive, so they are capitalised here to match the count() calls above
bacon = re.findall(r"Bacon", content)
eggs = re.findall(r"Eggs", content)

print("Total bacon: {0}".format(len(bacon)))
print("Total eggs: {0}".format(len(eggs)))
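
One thing a regular expression can do that str.count() cannot is match whole words regardless of case. A rough sketch follows. The function name count_words and the sample text are made up, and whether whole-word matching is what you want depends on your data. Written for Python 3.

```python
import re


def count_words(content, words=("Bacon", "Eggs")):
    """Count case-insensitive, whole-word occurrences of each word."""
    counts = {}
    for word in words:
        # \b anchors the match at word boundaries; re.escape guards special characters
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        counts[word] = len(pattern.findall(content))
    return counts


sample = "Bacon, eggs and more bacon. Eggshells do not count."
print(count_words(sample))   # {'Bacon': 2, 'Eggs': 1}
```

Note that this gives different totals from content.count('Eggs'), which would also count the "Eggs" inside "Eggshells" but would miss the lowercase "eggs".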

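If you are working with large files, you may want to consider mmap, which maps the file into memory so that a regular expression can search it without first reading the whole content into a string. For details on regular expressions, see https://docs.python.org/2/library/re.html.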
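
A minimal sketch of the mmap idea, assuming Python 3 and plain ASCII search words. The function name count_in_file and the temporary demo file are made up for illustration.

```python
import mmap
import os
import re
import tempfile


def count_in_file(path, word):
    """Count occurrences of word in the file at path using a memory map."""
    # A memory map exposes bytes, so the pattern has to be bytes as well
    pattern = re.compile(re.escape(word.encode("ascii")))
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mmap raises ValueError for an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return len(pattern.findall(mm))


# Demo on a throwaway file with made-up content
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"Bacon,Eggs\nBacon,Toast\n")
demo_path = tmp.name

bacon_count = count_in_file(demo_path, "Bacon")
print(bacon_count)   # 2
os.remove(demo_path)
```

For files that fit comfortably in memory, the plain read() approach is simpler and will usually be fast enough.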

Sorting files pulled from a folder by name for a graph

2 answers: