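I have about 100 CSV files in one location (as of now; there will be more tomorrow), and 24-40 new files are added every day. What is the best way to import only the files from the past day? At the moment I have to type in each file name by hand: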
data = pd.read_csv('/data/testingfile-PM_18707-2017_06_14-05_03_23__382.csv', delimiter = ';', low_memory=False)
data1 = pd.read_csv('/data/testingfile--PM_18707-2017_06_14-06_30_56__131.csv', delimiter = ';', low_memory=False)
Is it possible to write some kind of function that recognizes the timestamp?
from datetime import time
from datetime import date
from datetime import datetime
import fnmatch
def get_local_file(date, hour, path='data/'):
    """Get date+hour processing file from local drive
    :param date: str Processing date
    :param hour: str Processing hour
    :param path: str Path to file location
    :return: Pandas DF Retrieved DataFrame
    """
    hour = [time(i).strftime(%H) for i in range(24)]
    sdate = date.replace('-', '_') + "-" + str(hour)
    for p_file in os.listdir(path):
        if fnmatch.fnmatch(p_file, 'testingfile-PM*'+sdate+'*.csv'):
            return pd.read_csv(path+p_file, delimiter=';')
I found something like this, but I can't get it to work.
Answer 0 (score: 2)
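If you are looking for a way to extract the date from the CSV file names, take a look at Python's datetime module (or its strptime method, to be precise). It lets you parse a string into a datetime, like this: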
from datetime import datetime
name = "data/testingfile-PM_18707-2017_06_14-05_03_23__382.csv"
# Quick and dirty: str.strip() removes a set of characters, not a prefix.
# It happens to work for the two given examples.
datepart = name.strip("data/testingfile-PM_18707-").split("__")[0]
date = datetime.strptime(datepart,"%Y_%m_%d-%H_%M_%S")
print(datepart)
print(date)
2017_06_14-05_03_23
2017-06-14 05:03:23
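If the numeric ID in the file name changes, the strip-based parsing may stop working. A slightly more robust variant could look for the timestamp with a regular expression instead. This is only a sketch: `parse_timestamp` and `TIMESTAMP_RE` are names made up for illustration, and it assumes every file name contains exactly one `YYYY_MM_DD-HH_MM_SS` part.

```python
import re
from datetime import datetime

# Matches a YYYY_MM_DD-HH_MM_SS timestamp anywhere in a file name
# (assumed naming scheme, taken from the two example files).
TIMESTAMP_RE = re.compile(r"\d{4}_\d{2}_\d{2}-\d{2}_\d{2}_\d{2}")


def parse_timestamp(filename):
    """Return the datetime embedded in filename, or None if there is none."""
    match = TIMESTAMP_RE.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(0), "%Y_%m_%d-%H_%M_%S")
    except ValueError:
        # The digits fit the pattern but are not a real date (e.g. month 13)
        return None


print(parse_timestamp("/data/testingfile--PM_18707-2017_06_14-06_30_56__131.csv"))
# prints 2017-06-14 06:30:56
```

Returning None for names without a usable timestamp means unrelated files in the same folder can simply be skipped.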
So, if you want to selectively open only the CSVs that are at most one day old, you could do something like this:
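import glob
from datetime import datetime

import pandas as pd

now = datetime.now()
frames = []
for csv_path in glob.glob("data/*.csv"):
    # Same quick-and-dirty timestamp extraction as in the snippet above
    datepart = csv_path.strip("data/testingfile-PM_18707-").split("__")[0]
    date = datetime.strptime(datepart, "%Y_%m_%d-%H_%M_%S")
    if (now - date).total_seconds() < 3600*24:
        # Less than 24 hours old: load it (the files are ';'-delimited)
        frames.append(pd.read_csv(csv_path, delimiter=';'))
    else:
        print("Too old to care!")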
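The same filtering could also be wrapped in a small function that only returns the matching paths, so that reading the files stays a separate step. This is a rough sketch under the same naming assumption (`YYYY_MM_DD-HH_MM_SS` somewhere in the file name); `recent_csv_files`, `TIMESTAMP_RE` and the `max_age` parameter are made-up names, not part of any library.

```python
import glob
import os
import re
from datetime import datetime, timedelta

# Assumed timestamp layout inside the file names: YYYY_MM_DD-HH_MM_SS
TIMESTAMP_RE = re.compile(r"\d{4}_\d{2}_\d{2}-\d{2}_\d{2}_\d{2}")


def recent_csv_files(directory, now=None, max_age=timedelta(days=1)):
    """Return CSV paths whose file-name timestamp is within max_age of now, oldest first."""
    now = now or datetime.now()
    recent = []
    for path in glob.glob(os.path.join(directory, "*.csv")):
        match = TIMESTAMP_RE.search(os.path.basename(path))
        if match is None:
            continue  # no timestamp in the name, so skip the file
        try:
            stamp = datetime.strptime(match.group(0), "%Y_%m_%d-%H_%M_%S")
        except ValueError:
            continue  # digits that do not form a valid date
        # Keep files that are not dated in the future and are younger than max_age
        if timedelta(0) <= now - stamp < max_age:
            recent.append((stamp, path))
    return [path for stamp, path in sorted(recent)]
```

Passing `now` explicitly makes the function easy to try out against a fixed point in time; left at its default, it uses the current local time.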
Note that this has nothing to do with Pandas itself.
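As for the helper in the question: it probably fails because `%H` is not quoted, `hour` is overwritten with a list of all 24 hours, and `os` and `pd` are never imported. A possible repaired version is sketched below. It returns the matching path rather than a DataFrame, the name `find_local_file` is invented here, and the pattern assumes file names like the two shown in the question.

```python
import fnmatch
import os


def find_local_file(date, hour, path="data/"):
    """Return the path of the first file for the given date and hour, or None.

    date: str such as '2017-06-14'; hour: int or str such as 5 or '05'.
    """
    # File names carry the timestamp as YYYY_MM_DD-HH_MM_SS
    sdate = date.replace("-", "_") + "-" + str(hour).zfill(2)
    # "testingfile-*PM_" also tolerates the double hyphen in the second example
    pattern = "testingfile-*PM_*-" + sdate + "_*.csv"
    for p_file in sorted(os.listdir(path)):
        if fnmatch.fnmatch(p_file, pattern):
            return os.path.join(path, p_file)
    return None
```

The result, when it is not None, can then be handed to `pd.read_csv(..., delimiter=';')` as in the question.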