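The code below reads files, stores each one in a DataFrame, concatenates them all, and then resamples the concatenated data to one-second intervals. This uses too much memory, so I would like to do it step by step instead. For example: read two files, concatenate them, and resample the result; then read the next file, concatenate it with the result from the first two, resample again, and so on, in batches of 10 files. How should I change the code? Can anyone help? Here is my code: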
import pandas as pd
import os
#import matplotlib.pyplot as plt
#df1 = pd.read_hdf(r"E:\examples\hdf files\conew1.h5", 'df')
#df2 = pd.read_hdf(r"E:\examples\hdf files\conew2.h5", 'df')
#df3 = pd.read_hdf(r"E:\examples\hdf files\conew3.h5", 'df')
hdfdirectory = r"E:\examples\hdf files"
number_of_dfs = 1
df = None
for fi in os.listdir(hdfdirectory):
    hdfpath = os.path.join(hdfdirectory, fi)
    print(hdfpath)
    df1 = pd.read_hdf(hdfpath, 'df')
    for i in range(number_of_dfs):
        if df is None:
            df = pd.DataFrame({'timestamp': df1.timestamp, 'url': df1.url})
            dft = df.set_index('timestamp').resample('S').count()
        else:
            temp = pd.DataFrame({'timestamp': df1.timestamp, 'url': df1.url})
            tempt = temp.set_index('timestamp').resample('S').count()
            df = pd.concat([dft, tempt])
Answer 0 (score: 0)
I tried to create an example to illustrate my point. You might have to tweak it a little, but it should give you the idea:
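import os
import pandas as pd

hdfdirectory = r"E:\examples\hdf files"
df = None  # running result: per-second counts for the files read so far
for fi in os.listdir(hdfdirectory):
    hdfpath = os.path.join(hdfdirectory, fi)
    print(hdfpath)
    df1 = pd.read_hdf(hdfpath, 'df')
    if df is None:
        # First file: reduce it to per-second counts and keep only those.
        df = pd.DataFrame({'timestamp': df1.timestamp, 'url': df1.url})
        dft = df.set_index('timestamp').resample('S').count()
        df = dft
    else:
        # Later files: resample on their own, then append to the running result.
        # A second that spans two files shows up as two rows after concat.
        temp = pd.DataFrame({'timestamp': df1.timestamp, 'url': df1.url})
        tempt = temp.set_index('timestamp').resample('S').count()
        df = pd.concat([df, tempt])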
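
If the files overlap or meet inside the same second, the concatenated result holds more than one row for that second. One possible variation, offered only as a sketch and not as the method above, is to add each file's per-second counts into a running total so that only the counts stay in memory. The helper names `iter_hdf_frames`, `per_second_counts`, and `running_counts` are made up for this illustration, and the sketch assumes every file stores a datetime `timestamp` column and a `url` column under the key `'df'`, as in the question.

```python
import os

import pandas as pd


def iter_hdf_frames(directory, key="df"):
    """Yield one DataFrame per file in `directory` (hypothetical helper)."""
    for name in sorted(os.listdir(directory)):
        yield pd.read_hdf(os.path.join(directory, name), key)


def per_second_counts(frame):
    """Count URLs per second for the DataFrame of a single file."""
    # "s" is the one-second alias; newer pandas releases prefer it over "S".
    return frame.set_index("timestamp")["url"].resample("s").count()


def running_counts(frames):
    """Fold the frames in one at a time, keeping only per-second counts."""
    total = None
    for frame in frames:
        counts = per_second_counts(frame)
        if total is None:
            total = counts
        else:
            # Seconds present in both pieces are summed; a second missing
            # from one side is treated as 0 there.
            total = total.add(counts, fill_value=0)
    if total is None:
        return pd.Series(dtype="int64")
    # Resample once more so gaps between files appear as zero counts.
    return total.resample("s").sum().astype("int64")
```

Under those assumptions it could be called as `running_counts(iter_hdf_frames(hdfdirectory))`. Because `iter_hdf_frames` is a generator, only one file's raw rows are loaded at a time.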