Comparing DataFrames with different date columns

Time: 2018-02-05 19:09:14

Tags: python pandas dataframe data-visualization

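I have data from rain measurements and water level measurements, but the two sets have different date and time values. Let's say I want to compare them by displaying both against exactly the same time range in one figure with subplots. I tried to do it with separate DataFrames, as shown in the figure: Rain and water level measurements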

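As the figure shows, the time axes of the two plots are shifted relative to each other, which makes it difficult to compare the peaks at the same point in time.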

Is there a way to make this comparison using a Pandas DataFrame? I have tried it myself with the following code:

import pandas as pd
import matplotlib.pyplot as plt
import pickle

wb = pickle.load(open("data.p","rb"))

rain_period = wb[0]
flow_knudmose = wb[1]


periods = [['20170224','20170819','20170906'],
        ['20170308','20170826','20170917']]

# Period 1
rain_1 = rain_period.loc[(rain_period['Time'] >= periods[0][0]) &(rain_period['Time'] <= periods[1][0]) ]
rain_1.sort_values('Time',ascending=True,inplace=True)

water_1 = flow_knudmose.loc[(flow_knudmose['Time'] >= periods[0][0]) & (flow_knudmose['Time'] <= periods[1][0]) ]
water_1.sort_values('Time',ascending=True,inplace=True)

fig,axes = plt.subplots(nrows=2,ncols=1)
rain_1.plot(color='b',ax = axes[0], x='Time')
water_1.plot(color='r',ax = axes[1], x='Time')
plt.show()

This code produces the figure I attached. You can get the data.p pickle here

Thanks in advance!

2 Answers:

Answer 0: (score: 0)

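So the time data does not match between your two tables, and what you want is the "intersection" of the two time ranges. Discard the rows in either set that fall outside that range by first working out a new, common start and end time: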

# Both frames are already sorted by 'Time', so iloc[0] / iloc[-1] are the earliest / latest rows
startTime = max(water_1.iloc[0]['Time'], rain_1.iloc[0]['Time'])
endTime   = min(water_1.iloc[-1]['Time'], rain_1.iloc[-1]['Time'])

Create new data sets within these time limits:

rain_2 = rain_1[(rain_1['Time'] >= startTime) & (rain_1['Time'] <= endTime)]
water_2 = water_1[(water_1['Time'] >= startTime) & (water_1['Time'] <= endTime)]

Plot:

fig,axes = plt.subplots(nrows=2,ncols=1)
rain_2.plot(color='b',ax = axes[0], x='Time')
water_2.plot(color='r',ax = axes[1], x='Time')
plt.tight_layout()
plt.show()
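
The trimming steps above can be tried without the pickle file. The following is only a sketch under assumptions: the two small frames are made-up stand-ins for rain_1 and water_1, and common_window is a hypothetical helper name, not something from the original answer. It uses min()/max() on the 'Time' column, so it does not rely on the frames being sorted first.

```python
import pandas as pd

# Made-up stand-ins for rain_1 and water_1 (hourly samples, offset by 2.5 hours)
rain = pd.DataFrame({
    "Time": pd.date_range("2017-02-24 00:00", periods=6, freq=pd.Timedelta(hours=1)),
    "Rain": [0.0, 0.2, 1.5, 0.7, 0.0, 0.0],
})
water = pd.DataFrame({
    "Time": pd.date_range("2017-02-24 02:30", periods=6, freq=pd.Timedelta(hours=1)),
    "Water Level": [1.0, 1.1, 1.4, 1.6, 1.5, 1.3],
})


def common_window(a, b, col="Time"):
    """Return both frames restricted to the time range they share (hypothetical helper)."""
    start = max(a[col].min(), b[col].min())  # the later of the two start times
    end = min(a[col].max(), b[col].max())    # the earlier of the two end times
    # between() is inclusive on both ends, like the >= / <= masks above
    return a[a[col].between(start, end)], b[b[col].between(start, end)]


rain_2, water_2 = common_window(rain, water)
print(rain_2)
print(water_2)
```

If the two ranges do not overlap at all, start ends up later than end and both results are empty, which is worth checking before plotting.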

Answer 1: (score: 0)

I hope you find the following code and comments useful:

import pandas as pd
import matplotlib.pyplot as plt
import pickle

wb = pickle.load(open("data.pickle", "rb"))

rain_period = wb[0]
flow_knudmose = wb[1]

periods = [['20170224','20170819','20170906'],
        ['20170308','20170826','20170917']]

# <dataframe>.copy() are added to avoid a warning about modifying dataframe's view 
# As described at: https://stackoverflow.com/questions/17328655/pandas-set-datetimeindex,
# we can use DatetimeIndex for a new index; old 'Time' column can be dropped afterwards
rain_1 = rain_period.loc[(rain_period['Time'] >= periods[0][0]) & (rain_period['Time'] <= periods[1][0])].copy()
rain_1 = rain_1.set_index(pd.DatetimeIndex(rain_1['Time'])).drop(columns=["Time"]).sort_index()

water_1 = flow_knudmose.loc[(flow_knudmose['Time'] >= periods[0][0]) & (flow_knudmose['Time'] <= periods[1][0])].copy()
water_1 = water_1.set_index(pd.DatetimeIndex(water_1['Time'])).drop(columns=["Time"]).sort_index()

# With sharex=True, the plots show the entire period of time represented by the data in the dataframes,
# rather than the intersection of time periods (in the case with intersection, some important data might not be shown)
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)

# Without <index>.to_pydatetime(), this code produces an error:  
# "AttributeError: 'numpy.datetime64' object has no attribute 'toordinal'"
axes[0].plot_date(rain_1.index.to_pydatetime(), rain_1["Rain"], '-',
                  color='b', label="Rain")
axes[1].plot_date(water_1.index.to_pydatetime(), water_1["Water Level"], '-',
                  color='r', label="Water Level")

# Set the favorite angle for x-labels and show legends
for ax in axes:
    plt.sca(ax)
    plt.xticks(rotation=45)
    ax.legend(loc="upper right")

plt.show()

Output: produced plot

The to_pydatetime() conversion was suggested in: Converting pandas DatetimeIndex to 'float days format' with Matplotlib.dates.datestr2num

This solution works with:

python 3.5.4 
pandas 0.21.0
matplotlib 2.1.0
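
On library versions newer than the ones listed, the same shared-axis idea can probably be written with plain Axes.plot, since recent Matplotlib releases accept a DatetimeIndex directly and, as far as I know, have deprecated plot_date. The sketch below is an assumption-laden variant rather than the answer's own code: the data is invented, and the "Agg" backend is chosen only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-ins for rain_1 and water_1, already indexed by time
rain_1 = pd.DataFrame(
    {"Rain": [0.0, 0.2, 1.5, 0.7, 0.0]},
    index=pd.date_range("2017-02-24", periods=5, freq=pd.Timedelta(hours=6)),
)
water_1 = pd.DataFrame(
    {"Water Level": [1.0, 1.1, 1.4, 1.6, 1.5]},
    index=pd.date_range("2017-02-25", periods=5, freq=pd.Timedelta(hours=6)),
)

# sharex=True makes both subplots use one time axis covering both data sets
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)
axes[0].plot(rain_1.index, rain_1["Rain"], "-", color="b", label="Rain")
axes[1].plot(water_1.index, water_1["Water Level"], "-", color="r", label="Water Level")

for ax in axes:
    ax.tick_params(axis="x", labelrotation=45)  # rotate the date labels
    ax.legend(loc="upper right")

fig.tight_layout()
# plt.show() or fig.savefig(...) would display or store the figure
```

Because the x-axis is shared, a rain peak and the water level response that follows it line up vertically, even though the two series start and end at different times.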