Question

Apologies if this is too simple, but I have searched a lot and cannot find a way to solve this problem.

I am populating my DataFrame (df) as follows:

weather = pd.read_csv(weather_path)
weather_stn1 = weather[weather['Station'] == 1][['Tavg']]
weather_stn2 = weather[weather['Station'] == 2][['Tavg']]

df = pd.DataFrame(columns=['xAxis', 'yAxis1', 'yAxis2'])
df['xAxis'] = pd.to_datetime(weather['Date'])
df['yAxis1'] = weather_stn1['Tavg']
df['yAxis2'] = weather_stn2['Tavg']

My DataFrame looks like this:

     xAxis        yAxis1  yAxis2
0   2009-05-01      53     NaN
1   2009-05-01     NaN      55
2   2009-05-02      55     NaN
3   2009-05-02     NaN      55
4   2009-05-03      57     NaN
5   2009-05-03     NaN      58

But I would like to get the following result:

     xAxis       yAxis1  yAxis2
0   2009-05-01      53     55
2   2009-05-02      55     55
4   2009-05-03      57     58

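I have been looking into reindexing weather_stn1 and weather_stn2 and into applying group by, but neither worked the way I wanted. In the end I have nothing to show for it!

How can I solve this?

Thanks a lot in advance.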

Answer 1

Man, I found the solution myself. In case anyone else gets stuck, this may help.

plot_df = pd.DataFrame(columns=['xAxis', 'yAxis1', 'yAxis2'])
plot_df['xAxis'] = pd.to_datetime(weather['Date'])
plot_df['yAxis1'] = weather_stn1['Tavg']
plot_df['yAxis2'] = weather_stn2['Tavg']

# mean() skips NaN by default, so each date keeps its one real value per column
plot_df = plot_df.groupby(plot_df['xAxis']).mean()

print(plot_df.reset_index())

Now my output is:

         xAxis  yAxis1  yAxis2
0   2009-05-01      53      55
1   2009-05-02      55      55
2   2009-05-03      57      58
3   2009-05-04      57      60
4   2009-05-05      60      62
5   2009-05-06      63      66

That was easy!
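For anyone who wants to try this without the original CSV, here is a minimal sketch of the same idea. The sample values are copied from the tables in the question, and the in-memory `weather` frame is only a stand-in for the real file, so treat it as an illustration rather than the exact data.

```python
import pandas as pd

# Hypothetical stand-in for the weather CSV, using the values shown in the question.
weather = pd.DataFrame({
    'Date': ['2009-05-01', '2009-05-01', '2009-05-02',
             '2009-05-02', '2009-05-03', '2009-05-03'],
    'Station': [1, 2, 1, 2, 1, 2],
    'Tavg': [53, 55, 55, 55, 57, 58],
})

weather_stn1 = weather[weather['Station'] == 1][['Tavg']]
weather_stn2 = weather[weather['Station'] == 2][['Tavg']]

plot_df = pd.DataFrame({'xAxis': pd.to_datetime(weather['Date'])})
# Column assignment aligns on the row index, so each station fills only
# its own rows and the remaining rows become NaN.
plot_df['yAxis1'] = weather_stn1['Tavg']
plot_df['yAxis2'] = weather_stn2['Tavg']

# groupby(...).mean() skips NaN by default, so every date collapses to
# the single real value available in each column.
grouped = plot_df.groupby('xAxis').mean().reset_index()
print(grouped)
```

Note that the averaged columns come back as floats, because the intermediate NaN values force a float dtype.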

Answer 2

What you really want to do is pivot the table so that the values in the Station column become the column headers. Try this:

df = weather.pivot(index='Date', columns='Station', values='Tavg')

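As long as there is no more than one record per station per date, you will get what you want, except that the date will be the index rather than a column. If you like, you can reset the index and rename the columns.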
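A rough sketch of the pivot plus the optional tidy-up might look like the following. The sample frame is a hypothetical stand-in built from the values in the question, and the names xAxis, yAxis1 and yAxis2 are simply taken from the question's desired output.

```python
import pandas as pd

# Hypothetical stand-in for the weather CSV, using the values shown in the question.
weather = pd.DataFrame({
    'Date': ['2009-05-01', '2009-05-01', '2009-05-02',
             '2009-05-02', '2009-05-03', '2009-05-03'],
    'Station': [1, 2, 1, 2, 1, 2],
    'Tavg': [53, 55, 55, 55, 57, 58],
})
weather['Date'] = pd.to_datetime(weather['Date'])

# One row per date, one column per station.
wide = weather.pivot(index='Date', columns='Station', values='Tavg')

# Optional tidy-up: station numbers become the column names from the question,
# and the date moves from the index back into an ordinary column.
wide = wide.rename(columns={1: 'yAxis1', 2: 'yAxis2'})
wide = wide.reset_index().rename(columns={'Date': 'xAxis'})
wide.columns.name = None  # drop the leftover 'Station' label on the column axis
print(wide)
```

If a station could have more than one reading on the same date, pivot raises a ValueError about duplicate entries. In that case pivot_table with aggfunc='mean' is one possible alternative, since it aggregates the duplicates instead of failing.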

How do I apply group by to a DataFrame while ignoring NaN values in pandas?
