I'm having trouble aligning two different pandas DataFrames. The time alignment itself works with:
import pandas as pd
import datetime
import numpy as np
import matplotlib.pyplot as plt
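# np.random.random_integers is deprecated; randint's upper bound is exclusive, so 7001 keeps 7000 reachable
d1 = np.random.randint(0, 7001, size=(4000, 1))
d2 = np.random.randint(0, 7001, size=(2000, 1))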
dfA = pd.DataFrame(d1)
dfB = pd.DataFrame(d2)
dfA.columns = ['data1']
dfB.columns = ['data2']
dfA['time'] = pd.date_range('1970-01-01 00:01:00', periods=dfA.shape[0], freq='1s')
dfB['time'] = pd.date_range('1970-01-01 00:00:00', periods=dfB.shape[0], freq='1s')
dfA.set_index('time', inplace=True)
dfB.set_index('time', inplace=True)
dfA1 = dfA.between_time('00:00:00', '00:09:00')
dfA2 = dfA.between_time('00:14:00', '00:16:00')
dfB1 = dfB.between_time('00:00:00', '00:12:00')
dfB2 = dfB.between_time('00:15:00', '00:16:00')
df1 = pd.concat([dfA1, dfA2])
df2 = pd.concat([dfB1, dfB2])
# bfill() is equivalent to fillna(method='backfill'), which recent pandas versions deprecate
df_aligned = df1.join(df2, how='outer').interpolate(method='time').resample('2s').mean().bfill()
print(df_aligned.head())
df_aligned.plot()
plt.plot(df_aligned['data1'].values)
plt.plot(df_aligned['data2'].values)
plt.show()
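However, the two columns from df1 and df2 cover different time intervals, so interpolation produces new samples inside the gaps where one of them has no data. My task is to retrieve only the real data, without getting fake samples from the gaps.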
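To make the problem concrete, here is a minimal sketch on made-up toy data (the timestamps and values are assumptions for illustration, not taken from the script above). It shows how `interpolate(method='time')` fills a gap with values that were never measured:

```python
import pandas as pd

# Toy data (assumed): three samples, a gap, then three more samples.
idx = pd.to_datetime([
    '1970-01-01 00:00:00', '1970-01-01 00:00:01', '1970-01-01 00:00:02',
    '1970-01-01 00:00:12', '1970-01-01 00:00:13', '1970-01-01 00:00:14',
])
real = pd.DataFrame({'data1': [0.0, 1.0, 2.0, 12.0, 13.0, 14.0]}, index=idx)

# Put the data on a regular 1-second grid; the gap shows up as NaN rows.
grid = real.resample('s').mean()

# Time-based interpolation replaces every NaN in the gap with a computed value.
interpolated = grid.interpolate(method='time')

# Timestamps that exist after interpolation but were never measured.
fake = interpolated.index.difference(real.index)
print(grid['data1'].isna().sum(), 'empty rows before interpolation')
print(len(fake), 'synthetic rows after interpolation')
```

Every row counted in `fake` is one of the samples I would like to avoid.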
Any suggestions? Thanks a lot in advance.
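Answer 0 (score: 0)
I found a workaround:
1) First, remove interpolate() and add limit=1 to fillna(). This leaves the long runs of NaN values in the data gaps untouched, because at most one consecutive NaN is filled. Of course, you can pick a different fillna method and a custom limit depending on your task. (In recent pandas versions, bfill(limit=1) is the equivalent of fillna(method='backfill', limit=1).)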
# bfill(limit=1) is equivalent to fillna(method='backfill', limit=1), which recent pandas versions deprecate
df_align = df1.join(df2, how='outer').resample('2s').mean().bfill(limit=1)
2) Then use dropna() to remove all remaining NaN values (i.e. the rows that fall inside the time gaps):
df_align = df_align.dropna()
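To see what the limit does, here is a small sketch of the two steps on a single toy column (the data and the names `s`, `resampled`, `filled` and `clean` are made up for illustration). Note that a backfill with limit=1 still fills one bin right before the data resumes, so one filled row per gap survives dropna():

```python
import numpy as np
import pandas as pd

# Toy data (assumed): samples at 00:00:00-00:00:05 and 00:00:20-00:00:25, gap in between.
idx = pd.date_range('1970-01-01 00:00:00', periods=6, freq='s').append(
    pd.date_range('1970-01-01 00:00:20', periods=6, freq='s'))
s = pd.DataFrame({'data1': np.arange(12, dtype=float)}, index=idx)

# Step 0: resample to 2-second bins; bins inside the gap are NaN.
resampled = s.resample('2s').mean()

# Step 1: backfill at most one consecutive NaN, so the gap stays mostly NaN.
filled = resampled.bfill(limit=1)

# Step 2: drop whatever is still NaN, i.e. the rest of the gap.
clean = filled.dropna()

print(resampled['data1'].isna().sum(), 'NaN bins after resampling')
print(filled['data1'].isna().sum(), 'NaN bins after bfill(limit=1)')
print(len(clean), 'rows kept')
```

If even that single filled row is unwanted, skipping the fill and calling dropna() directly on the resampled frame should remove it, at the cost of also dropping bins where only one value was missing.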
Final result:
import pandas as pd
import datetime
import numpy as np
import matplotlib.pyplot as plt
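# np.random.random_integers is deprecated; randint's upper bound is exclusive, so 7001 keeps 7000 reachable
d1 = np.random.randint(0, 7001, size=(4000, 1))
d2 = np.random.randint(0, 7001, size=(2000, 1))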
dfA = pd.DataFrame(d1)
dfB = pd.DataFrame(d2)
dfA.columns = ['data1']
dfB.columns = ['data2']
dfA['time'] = pd.date_range('1970-01-01 00:01:00', periods=dfA.shape[0], freq='1s')
dfB['time'] = pd.date_range('1970-01-01 00:00:00', periods=dfB.shape[0], freq='1s')
dfA.set_index('time', inplace=True)
dfB.set_index('time', inplace=True)
dfA1 = dfA.between_time('00:00:00', '00:09:00')
dfA2 = dfA.between_time('00:14:00', '00:16:00')
dfB1 = dfB.between_time('00:00:00', '00:12:00')
dfB2 = dfB.between_time('00:15:00', '00:16:00')
df1 = pd.concat([dfA1, dfA2])
df2 = pd.concat([dfB1, dfB2])
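# No interpolate(); bfill(limit=1) is equivalent to fillna(method='backfill', limit=1)
df_aligned = df1.join(df2, how='outer').resample('2s').mean().bfill(limit=1)
# Drop the rows that are still NaN, i.e. the ones inside the time gaps
df_aligned = df_aligned.dropna()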
print(df_aligned.head())
df_aligned.plot()
plt.plot(df_aligned['data1'].values)
plt.plot(df_aligned['data2'].values)
plt.show()
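As a quick sanity check, the same pipeline can be wrapped in a function and run on small deterministic frames. This is only a sketch: `align_without_gap_fill` and `seconds` are hypothetical helper names, and the toy frames are made up so that the gap location is known in advance.

```python
import numpy as np
import pandas as pd

def align_without_gap_fill(left, right, rule='2s', limit=1):
    """Hypothetical helper: outer-join, resample, backfill at most `limit` bins, drop NaN rows."""
    joined = left.join(right, how='outer')
    return joined.resample(rule).mean().bfill(limit=limit).dropna()

def seconds(start, periods):
    """Hypothetical helper: `periods` timestamps spaced one second apart."""
    return pd.date_range(start, periods=periods, freq='s')

# Toy frames (assumed): data1 is missing from 00:00:10 to 00:00:29, data2 is continuous.
idx1 = seconds('1970-01-01 00:00:00', 10).append(seconds('1970-01-01 00:00:30', 10))
idx2 = seconds('1970-01-01 00:00:00', 40)
df1 = pd.DataFrame({'data1': np.arange(20, dtype=float)}, index=idx1)
df2 = pd.DataFrame({'data2': np.arange(40, dtype=float)}, index=idx2)

aligned = align_without_gap_fill(df1, df2)

# Count rows that landed inside the gap, excluding the last bin before the data
# resumes, which bfill(limit=1) is expected to fill.
gap_start = pd.Timestamp('1970-01-01 00:00:10')
gap_end = pd.Timestamp('1970-01-01 00:00:28')
in_gap = (aligned.index >= gap_start) & (aligned.index < gap_end)
print(int(in_gap.sum()), 'rows inside the gap')
```

On the real data, the gap boundaries would come from the between_time() ranges used to build df1 and df2.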