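Is there a way to import time series data from .csv files only when the data falls between two dates?
The code below imports all the data from a series of .csv files, but can I import only the rows between two dates?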
```python
import glob
import os
from functools import partial, reduce

import pandas as pd

def getTimeseriesData(DataPath, startDate, endDate):
    # The first column of the merged frame is the shared date column.
    colNames = ['date']
    allfiles = glob.glob(os.path.join(DataPath, "*.csv"))
    for fname in allfiles:
        # Use each file's base name (without extension) as its column name.
        name = os.path.splitext(os.path.basename(fname))[0]
        colNames.append(name)
    print(colNames)
    # Read every file (no header row) and outer-join them on column 0, the date.
    dataframes = [pd.read_csv(fname, header=None) for fname in allfiles]
    timeseriesData = reduce(partial(pd.merge, on=0, how='outer'), dataframes)
    timeseriesData.columns = colNames
    return timeseriesData
```
Answer 0 (score: 0)
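```python
import glob
import os

import pandas as pd

def getTimeseriesData(data_path, start_date, end_date):
    dfs = []
    for f_name in glob.glob(os.path.join(data_path, "*.csv")):
        # The files have no header row: parse column 0 as dates and name it 'date'.
        df = pd.read_csv(f_name, header=None, parse_dates=[0]).rename(columns={0: 'date'})
        # Date filter: keep rows with start_date <= date <= end_date.
        dfs.append(df.loc[(df['date'] >= start_date) & (df['date'] <= end_date), :])
    # concat stacks the filtered files vertically rather than merging them by date.
    return pd.concat(dfs)
```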
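As far as I know, `pd.read_csv` cannot skip rows based on their values (a callable passed to `skiprows` only sees row numbers), so the filter always runs after parsing. For large files, one option is to read in chunks with `chunksize` and drop out-of-range rows per chunk, so only matching rows are kept in memory. A minimal sketch, assuming header-less files whose column 0 holds dates; `read_between` is a hypothetical helper name:

```python
import os
import tempfile

import pandas as pd


def read_between(csv_path, start_date, end_date, chunksize=100_000):
    """Hypothetical helper: read a header-less CSV whose column 0 holds dates,
    keeping only rows with start_date <= date <= end_date."""
    start, end = pd.Timestamp(start_date), pd.Timestamp(end_date)
    kept = []
    # Parse the file piece by piece; rows outside the range are dropped per chunk.
    for chunk in pd.read_csv(csv_path, header=None, parse_dates=[0], chunksize=chunksize):
        kept.append(chunk[chunk[0].between(start, end)])
    return pd.concat(kept, ignore_index=True)


# Usage with a small throwaway file (chunksize=2 just to exercise several chunks).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.csv")
    with open(path, "w") as fh:
        fh.write("2020-01-01,1\n2020-01-02,2\n2020-01-03,3\n2020-01-04,4\n")
    result = read_between(path, "2020-01-02", "2020-01-03", chunksize=2)

print(result[1].tolist())  # [2, 3]
```

Each file still has to be read in full; chunking only bounds how many rows are held at once.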
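Answer 1 (score: 0)
I'll give you a general answer.
First, your dates should be stored in datetime format. If you import them from Excel in a format such as "day.month.year" or "day-month-year", I would use this function to convert them to a datetime: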
```python
import datetime

def to_date(date, split_sign):
    # Split e.g. "31.12.19" or "31-12-2019" into its day, month and year parts.
    day, month, year = date.split(split_sign)
    # Expand two-digit years such as "19" to "2019".
    if len(year) < 4:
        year = '20' + year
    # Keep a separator between the parts so single-digit days/months parse unambiguously,
    # and return a datetime so it compares cleanly with the datetimes in filter_df.
    return datetime.datetime.strptime(f"{day}.{month}.{year}", '%d.%m.%Y')
```
Pandas has a function, pandas.to_datetime, that converts dates to datetimes, but it doesn't always work for me.
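When `pandas.to_datetime` misreads day-first strings, passing an explicit `format` usually makes it reliable, because pandas no longer has to guess the day/month order. A minimal sketch; the sample strings are made up for illustration:

```python
import pandas as pd

# Hypothetical day.month.year strings, as they might come out of an Excel export.
raw = pd.Series(["1.12.2019", "31.1.2020", "15.06.2020"])

# An explicit format removes the day/month ambiguity that makes automatic parsing unreliable.
dates = pd.to_datetime(raw, format="%d.%m.%Y")
print(dates.dt.strftime("%Y-%m-%d").tolist())
# ['2019-12-01', '2020-01-31', '2020-06-15']
```

For two-digit years the format would be `"%d.%m.%y"`; `dayfirst=True` is a looser alternative when the format is not uniform.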
Then select the rows with something like the following, passing each date as [day, month, year]:
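```python
import datetime

def filter_df(df, date_from, date_to):
    # date_from and date_to are [day, month, year] lists.
    date1 = datetime.datetime(date_from[2], date_from[1], date_from[0])
    date2 = datetime.datetime(date_to[2], date_to[1], date_to[0])
    # Keep rows whose 'date' lies in the inclusive range [date1, date2].
    return df[(df['date'] >= date1) & (df['date'] <= date2)]
```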
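The same inclusive range check can also be written with pandas' `Series.between`, which includes both endpoints by default. A small sketch on a made-up frame, assuming the `date` column already holds datetimes:

```python
import datetime

import pandas as pd

# Small example frame; 'date' must already hold datetimes (e.g. via to_date or pd.to_datetime).
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-15", "2020-02-01", "2020-03-01"]),
    "value": [10, 20, 30, 40],
})

# Dates given as [day, month, year], matching filter_df's argument order.
date_from, date_to = [10, 1, 2020], [1, 2, 2020]
start = datetime.datetime(date_from[2], date_from[1], date_from[0])
end = datetime.datetime(date_to[2], date_to[1], date_to[0])

# Series.between is inclusive at both ends, like the >= / <= pair in filter_df.
subset = df[df["date"].between(start, end)]
print(subset["value"].tolist())  # [20, 30]
```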