I would like to get the mean over time for several columns in pandas. So if I have this data:
Time Country Server Load
2011-01-01 00:00:00 USA DNS 50
2011-01-01 00:15:00 USA HTTP 60
2011-01-01 00:37:00 Spain HTTP 20
2011-01-01 01:02:00 Spain DNS 30
2011-01-01 01:11:00 Italy DNS 70
2011-01-01 23:49:00 Italy File 15
2011-01-02 00:00:00 USA File 74
2011-01-02 00:49:00 Italy AD 12
2011-01-02 00:31:00 Italy AD 11
2011-01-02 01:13:00 USA AD 17
2011-01-02 01:19:00 Spain File 18
2011-01-02 23:10:00 Spain HTTP 90
This is the output I want:
Country 2011-01-01 - Mean 2011-01-02 - Mean
USA 55 45.5
Spain 25 54
Italy 42.5 11.5
...
And by server:
Server 2011-01-01 - Mean 2011-01-02 - Mean
HTTP 40 90
DNS 50 NA
FILE 15 46
AD NA 13.3
Answer 0 (score: 2)
Use DataFrame.groupby with Series.dt.date, take the mean, and reshape with Series.unstack:
df1 = df.groupby(['Country', df['Time'].dt.date])['Load'].mean().unstack()
print (df1)
Time 2011-01-01 2011-01-02
Country
Italy 42.5 11.5
Spain 25.0 54.0
USA 55.0 45.5
df2 = df.groupby(['Server', df['Time'].dt.date])['Load'].mean().unstack()
print (df2)
Time 2011-01-01 2011-01-02
Server
AD NaN 13.333333
DNS 50.0 NaN
File 15.0 46.000000
HTTP 40.0 90.000000
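The snippet above assumes that df already exists and that its Time column has a datetime dtype, since the .dt accessor only works on datetime-like data. A minimal self-contained sketch, assuming the sample data from the question is typed in by hand and Time is parsed with pd.to_datetime (the rows list is only illustrative scaffolding, not part of the original answer):

```python
import pandas as pd

# Sample rows copied from the question: (Time, Country, Server, Load).
rows = [
    ("2011-01-01 00:00:00", "USA", "DNS", 50),
    ("2011-01-01 00:15:00", "USA", "HTTP", 60),
    ("2011-01-01 00:37:00", "Spain", "HTTP", 20),
    ("2011-01-01 01:02:00", "Spain", "DNS", 30),
    ("2011-01-01 01:11:00", "Italy", "DNS", 70),
    ("2011-01-01 23:49:00", "Italy", "File", 15),
    ("2011-01-02 00:00:00", "USA", "File", 74),
    ("2011-01-02 00:49:00", "Italy", "AD", 12),
    ("2011-01-02 00:31:00", "Italy", "AD", 11),
    ("2011-01-02 01:13:00", "USA", "AD", 17),
    ("2011-01-02 01:19:00", "Spain", "File", 18),
    ("2011-01-02 23:10:00", "Spain", "HTTP", 90),
]
df = pd.DataFrame(rows, columns=["Time", "Country", "Server", "Load"])

# Convert the strings to datetimes so that .dt.date is available.
df["Time"] = pd.to_datetime(df["Time"])

# Group by the category and the calendar day, average Load,
# then move the day level into the columns.
df1 = df.groupby(["Country", df["Time"].dt.date])["Load"].mean().unstack()
df2 = df.groupby(["Server", df["Time"].dt.date])["Load"].mean().unstack()

print(df1)
print(df2)
```

Combinations that never occur in the data, such as AD on 2011-01-01, show up as NaN after unstack.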
Answer 1 (score: 1)
Use pivot_table, extracting the date with dt.date; the default aggregation is the mean:
piv1 = df.pivot_table(index='Country', columns=df['Time'].dt.date, values='Load')
Time 2011-01-01 2011-01-02
Country
Italy 42.5 11.5
Spain 25.0 54.0
USA 55.0 45.5
For the servers:
piv2 = df.pivot_table(index='Server', columns=df['Time'].dt.date, values='Load')
Time 2011-01-01 2011-01-02
Server
AD NaN 13.333333
DNS 50.0 NaN
File 15.0 46.000000
HTTP 40.0 90.000000
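The pivot_table approach can also be written out end to end. The sketch below is one possible variant, not the answerer's exact code: it assumes the question's sample data is typed in by hand, passes aggfunc="mean" explicitly (the same as the default), and renames the columns to the "<date> - Mean" labels the question asks for. The helper name daily_mean is hypothetical.

```python
import pandas as pd

# Sample rows copied from the question: (Time, Country, Server, Load).
rows = [
    ("2011-01-01 00:00:00", "USA", "DNS", 50),
    ("2011-01-01 00:15:00", "USA", "HTTP", 60),
    ("2011-01-01 00:37:00", "Spain", "HTTP", 20),
    ("2011-01-01 01:02:00", "Spain", "DNS", 30),
    ("2011-01-01 01:11:00", "Italy", "DNS", 70),
    ("2011-01-01 23:49:00", "Italy", "File", 15),
    ("2011-01-02 00:00:00", "USA", "File", 74),
    ("2011-01-02 00:49:00", "Italy", "AD", 12),
    ("2011-01-02 00:31:00", "Italy", "AD", 11),
    ("2011-01-02 01:13:00", "USA", "AD", 17),
    ("2011-01-02 01:19:00", "Spain", "File", 18),
    ("2011-01-02 23:10:00", "Spain", "HTTP", 90),
]
df = pd.DataFrame(rows, columns=["Time", "Country", "Server", "Load"])
df["Time"] = pd.to_datetime(df["Time"])  # needed for the .dt accessor


def daily_mean(frame, key):
    """Hypothetical helper: mean Load per calendar day for each value of `key`."""
    piv = frame.pivot_table(
        index=key,
        columns=frame["Time"].dt.date,
        values="Load",
        aggfunc="mean",  # explicit here; it is also pivot_table's default
    )
    # Relabel the date columns in the style requested in the question.
    piv.columns = [f"{day} - Mean" for day in piv.columns]
    return piv


piv1 = daily_mean(df, "Country")
piv2 = daily_mean(df, "Server")

print(piv1)
print(piv2)
```

Wrapping the call in a small function avoids repeating the same pivot_table arguments for Country and Server.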