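I have rainfall measurements and water level measurements, but their date and time values differ. Say I want to compare the two by showing them over exactly the same time span in one figure with subplots. I tried to do this with separate DataFrames, as shown in the figure: Rain and water level measurements
As the figure shows, the time axis differs between the two plots, which makes it hard to compare the peaks at the same point in time.
Is there a way to compare them using a pandas DataFrame? I tried it myself with the following code: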
import pandas as pd
import matplotlib.pyplot as plt
import pickle
wb = pickle.load(open("data.p","rb"))
rain_period = wb[0]
flow_knudmose = wb[1]
periods = [['20170224','20170819','20170906'],
['20170308','20170826','20170917']]
# Period 1
rain_1 = rain_period.loc[(rain_period['Time'] >= periods[0][0]) &(rain_period['Time'] <= periods[1][0]) ]
rain_1.sort_values('Time',ascending=True,inplace=True)
water_1 = flow_knudmose.loc[(flow_knudmose['Time'] >= periods[0][0]) & (flow_knudmose['Time'] <= periods[1][0]) ]
water_1.sort_values('Time',ascending=True,inplace=True)
fig,axes = plt.subplots(nrows=2,ncols=1)
rain_1.plot(color='b',ax = axes[0], x='Time')
water_1.plot(color='r',ax = axes[1], x='Time')
plt.show()
This code produced the figure I attached. You can get the data.p pickle here.
Thanks in advance!
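Answer 0 (score: 0)
So the time data in your two tables do not match, and what you want is the "intersection" of the two time ranges. Discard the time data from either set that falls outside it by creating new, common start and end times: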
startTime = water_1.iloc[0]['Time'] if water_1.iloc[0]['Time'] >= rain_1.iloc[0]['Time'] else rain_1.iloc[0]['Time']
endTime = water_1.iloc[-1]['Time'] if water_1.iloc[-1]['Time'] <= rain_1.iloc[-1]['Time'] else rain_1.iloc[-1]['Time']
Create new datasets within these time limits:
rain_2 = rain_1[(rain_1['Time'] >= startTime) & (rain_1['Time'] <= endTime)]
water_2 = water_1[(water_1['Time'] >= startTime) & (water_1['Time'] <= endTime)]
Plot:
fig,axes = plt.subplots(nrows=2,ncols=1)
rain_2.plot(color='b',ax = axes[0], x='Time')
water_2.plot(color='r',ax = axes[1], x='Time')
plt.tight_layout()
plt.show()
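The conditional expressions above pick the later of the two start times and the earlier of the two end times. A minimal sketch of the same idea written with max/min and Series.between follows. The helper name overlap_window and the synthetic sample values are assumptions for illustration only; they are not part of the original answer or the pickled data.

```python
import pandas as pd


def overlap_window(a, b, col="Time"):
    """Return both frames restricted to the time span they have in common."""
    start = max(a[col].min(), b[col].min())  # later of the two start times
    end = min(a[col].max(), b[col].max())    # earlier of the two end times
    # between() is inclusive on both ends by default
    return (a[a[col].between(start, end)].sort_values(col),
            b[b[col].between(start, end)].sort_values(col))


# Synthetic stand-ins for rain_1 and water_1 (values are made up)
rain = pd.DataFrame({
    "Time": pd.date_range("2017-02-24", periods=5, freq="D"),
    "Rain": [0.0, 1.2, 0.4, 0.0, 2.1],
})
water = pd.DataFrame({
    "Time": pd.date_range("2017-02-26", periods=5, freq="D"),
    "Water Level": [3.1, 3.4, 3.3, 3.0, 2.9],
})

rain_2, water_2 = overlap_window(rain, water)
```

Using min() and max() on the column rather than iloc[0] and iloc[-1] means the helper does not rely on the frames having been sorted beforehand.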
Answer 1 (score: 0)
I hope you find the following code and comments helpful:
import pandas as pd
import matplotlib.pyplot as plt
import pickle
wb = pickle.load(open("data.pickle", "rb"))
rain_period = wb[0]
flow_knudmose = wb[1]
periods = [['20170224','20170819','20170906'],
['20170308','20170826','20170917']]
# <dataframe>.copy() are added to avoid a warning about modifying dataframe's view
# As described at: https://stackoverflow.com/questions/17328655/pandas-set-datetimeindex,
# we can use DatetimeIndex for a new index; old 'Time' column can be dropped afterwards
rain_1 = rain_period.loc[(rain_period['Time'] >= periods[0][0]) & (rain_period['Time'] <= periods[1][0])].copy()
rain_1 = rain_1.set_index(pd.DatetimeIndex(rain_1['Time'])).drop(columns=["Time"]).sort_index()
water_1 = flow_knudmose.loc[(flow_knudmose['Time'] >= periods[0][0]) & (flow_knudmose['Time'] <= periods[1][0])].copy()
water_1 = water_1.set_index(pd.DatetimeIndex(water_1['Time'])).drop(columns=["Time"]).sort_index()
# With sharex=True, the plots show the entire period of time represented by the data in the dataframes,
# rather than the intersection of time periods (in the case with intersection, some important data might not be shown)
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)
# Without <index>.to_pydatetime(), this code produces an error:
# "AttributeError: 'numpy.datetime64' object has no attribute 'toordinal'"
axes[0].plot_date(rain_1.index.to_pydatetime(), rain_1["Rain"], '-',
color='b', label="Rain");
axes[1].plot_date(water_1.index.to_pydatetime(), water_1["Water Level"], '-',
color='r', label="Water Level");
# Set the favorite angle for x-labels and show legends
for ax in axes:
    plt.sca(ax)
    plt.xticks(rotation=45)
    ax.legend(loc="upper right")
plt.show()
Output: produced plot
Using to_pydatetime() for the conversion is suggested in:
Converting pandas DatetimeIndex to 'float days format' with Matplotlib.dates.datestr2num
This solution works with:
python 3.5.4
pandas 0.21.0
matplotlib 2.1.0
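To illustrate the sharex=True remark from the code comments in this answer, here is a small sketch with made-up data in which the two series cover different but overlapping periods. It assumes a recent Matplotlib, where Axes.plot accepts a DatetimeIndex directly; as far as I know, newer releases discourage plot_date in favour of plot, so the to_pydatetime() workaround may not be needed there. The sample values and the Agg backend line are assumptions, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic series indexed by time (values are made up)
rain = pd.DataFrame({"Rain": [0.0, 1.2, 0.4, 0.0, 2.1]},
                    index=pd.date_range("2017-02-24", periods=5, freq="D"))
water = pd.DataFrame({"Water Level": [3.1, 3.4, 3.3, 3.0, 2.9]},
                     index=pd.date_range("2017-02-26", periods=5, freq="D"))

# sharex=True ties both x-axes together, so peaks line up vertically
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)
axes[0].plot(rain.index, rain["Rain"], color="b", label="Rain")
axes[1].plot(water.index, water["Water Level"], color="r", label="Water Level")

for ax in axes:
    ax.legend(loc="upper right")
fig.autofmt_xdate(rotation=45)  # rotate the date labels on the shared axis

# Both subplots now report the same x-limits
xlim_rain = axes[0].get_xlim()
xlim_water = axes[1].get_xlim()
```

With the backend line removed, calling plt.show() afterwards displays the figure. The shared axis spans the union of both periods, so no data are dropped, unlike the intersection approach.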