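I'm trying to pull the latest 'apple', 'pear', and other .csv files stored in a directory, using Python. New files are saved with the same prefix but at different frequencies (for example, the apple_ file gets updated every 5 days). I'm looking at something like latestfile = max(filenames, key=os.path.getctime), but applied per category (with .startswith or something similar?). Specifically, if there is a melon_ csv, I want to pull it even if it was saved several months ago.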
"""
fileDir contains csv files such as:
pear_20171102_report2.csv
apple_20171027_report2.csv
orange_20171101_report2.csv
kiwi 20171102 report2.csv
pear_20171101_report2.csv
cherry 20171101 report2.csv
kiwi 20171101 report2.csv
cherry 20171031_report2.csv
mango 20171001 report2.csv
apple_20171101_report2.csv
apple_20171102_report2.csv
...
"""
import glob
import os
import re
fileDir = r'\\ac2knyc05\TestData/'
filenames = glob.glob(fileDir+'*')
regex = re.compile(r'\d{8}')
dates = []
prefix = []
for filename in filenames:
    try:
        date = regex.search(filename).group()
        dates.append(date)
        prefix.append(filename.split(date)[0])
    except AttributeError:
        print(filename)
latestfile = max(filenames, key=os.path.getctime)
print(set(prefix))
I'm stuck here and not sure how to continue. Maybe pandas?
Answer 0 (score: 2)
No need for pandas; you can use itertools groupby:
from itertools import groupby
def key(filename):
    return filename.replace(" ", "_").split("_")[0]
{k: max(g, key=os.path.getctime)
 for k, g in groupby(sorted(filenames, key=key), key)}
This gives you a dictionary mapping each category to its latest file.
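The sorted() call matters here because groupby only merges adjacent items with equal keys. A small sketch on in-memory names from the question's listing (the helper name category is just for illustration, and no real files are touched):

```python
from itertools import groupby

# A few names from the question's listing, deliberately not sorted.
names = [
    "pear_20171102_report2.csv",
    "apple_20171027_report2.csv",
    "kiwi 20171102 report2.csv",
    "pear_20171101_report2.csv",
    "apple_20171102_report2.csv",
]

def category(name):
    # Text before the first "_" or space, same idea as key() above.
    return name.replace(" ", "_").split("_")[0]

# Without sorting, a category shows up once per run of adjacent names.
unsorted_keys = [k for k, _ in groupby(names, key=category)]

# After sorting by the same key, each category forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(names, key=category), key=category)]

print(unsorted_keys)
print(sorted_keys)
```

In a dict comprehension the repeated keys would silently overwrite each other, so an unsorted input could keep the wrong file for a category.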
Note: you can also get this in a single pass with a for loop:
res = {}
for f in filenames:
    k, t = key(f), os.path.getctime(f)
    if k not in res:
        res[k] = f, t
    else:
        _, t_ = res[k]
        if t > t_:
            res[k] = f, t
[f for f, _ in res.values()] # list of the latest file for each category
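The same single pass could be wrapped in a function along these lines. This is only a sketch: latest_per_prefix and its timestamp parameter are names made up here, and taking os.path.basename first is an assumption that you want the key to ignore the directory part (otherwise an underscore or space in the directory path would end up in the key).

```python
import os

def latest_per_prefix(paths, timestamp=os.path.getctime):
    """Return {prefix: newest path}, where prefix is the text before the first "_" or space."""
    res = {}
    for path in paths:
        # Key on the file name only, not on the directory part of the path.
        prefix = os.path.basename(path).replace(" ", "_").split("_")[0]
        t = timestamp(path)
        # Keep the path if the prefix is new or this file is more recent.
        if prefix not in res or t > res[prefix][1]:
            res[prefix] = (path, t)
    return {prefix: path for prefix, (path, _) in res.items()}
```

With the question's variables this would be called as latest_per_prefix(filenames). Passing a different timestamp callable (for example os.path.getmtime) changes which notion of "latest" is used.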
Answer 1 (score: 1)
No need for pandas. You can simply put these filenames into a dictionary of lists:
filenames = """pear_20171102_report2.csv
apple_20171027_report2.csv
orange_20171101_report2.csv
kiwi 20171102 report2.csv
pear_20171101_report2.csv
cherry 20171101 report2.csv
kiwi 20171101 report2.csv
cherry 20171031_report2.csv
mango 20171001 report2.csv
apple_20171101_report2.csv
apple_20171102_report2.csv"""
categories = {}
for filename in filenames.split("\n"):
    start_with = filename.split(' ')[0].split('_')[0]
    categories.setdefault(start_with, []).append(filename)
print(categories)
# {'pear': ['pear_20171102_report2.csv', 'pear_20171101_report2.csv'], 'apple': ['apple_20171027_report2.csv', 'apple_20171101_report2.csv', 'apple_20171102_report2.csv'], 'orange': ['orange_20171101_report2.csv'], 'kiwi': ['kiwi 20171102 report2.csv', 'kiwi 20171101 report2.csv'], 'cherry': ['cherry 20171101 report2.csv', 'cherry 20171031_report2.csv'], 'mango': ['mango 20171001 report2.csv']}
For each category, you now have a list that you can sort by ctime.
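To finish that last step, something like the following might work. The names in this answer are plain strings rather than files on disk, so this sketch ranks them by the 8-digit date inside the name instead of ctime. That is an assumption (it only holds if the date in the name reflects when the file was saved), and name_date is a helper invented for the example. With real paths you would pass key=os.path.getctime to max() instead.

```python
import re

# A subset of the categories dictionary printed above.
categories = {
    "pear": ["pear_20171102_report2.csv", "pear_20171101_report2.csv"],
    "cherry": ["cherry 20171101 report2.csv", "cherry 20171031_report2.csv"],
    "mango": ["mango 20171001 report2.csv"],
}

def name_date(filename):
    # Assumption: every name contains a YYYYMMDD date; such strings sort chronologically.
    return re.search(r"\d{8}", filename).group()

# Pick the newest entry of each list; a category with one file just returns that file.
latest = {cat: max(files, key=name_date) for cat, files in categories.items()}
print(latest)
```

A category that was last written months ago, like mango here, still appears in the result because max() is taken within each list rather than over all files.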