Question

My DataFrame df contains measurement dates and measurement results (duration, km):

df
Out[20]: 
                         Date  duration   km
0  2015-03-28 09:07:00.800001         0    0
1  2015-03-28 09:36:01.819998         1    2
2  2015-03-30 09:36:06.839997         1    3
3  2015-03-30 09:37:27.659997       nan    5
4  2015-04-22 09:51:40.440003         3    7
5  2015-04-23 10:15:25.080002         0  nan

How can I calculate the average duration and km for each day? I would like to take the mean of the rows using groupby and the date...

Answer 1

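I think you need resample. If the measurement columns are not numeric yet, convert them first with whichever of these options fits your data: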


import pandas as pd

cols = df.columns.difference(['Date'])

# Option 1: plain cast, works when every value parses as a float
df[cols] = df[cols].astype(float)

# Option 2: if astype fails because of non-numeric data, coerce those values to NaN
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

# Option 3: if the columns hold mixed types, go through str first
df[cols] = df[cols].astype(str).astype(float)
# or, coercing anything unparseable to NaN:
# df[cols] = df[cols].astype(str).apply(pd.to_numeric, errors='coerce')
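
To make the coercion option concrete, here is a minimal sketch, assuming the measurements arrived as strings. The frame `raw` and its 'n/a' entry are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical sample: measurements stored as strings, including a literal
# 'nan' and one entry ('n/a') that cannot be parsed as a number.
raw = pd.DataFrame({
    "duration": ["0", "1", "nan", "n/a"],
    "km": ["0", "2", "5", "7"],
})

# errors='coerce' replaces every value that cannot be parsed with NaN
# instead of raising an error.
numeric = raw.apply(pd.to_numeric, errors="coerce")

print(numeric.dtypes)
print(numeric)
```

Coercion hides bad input silently, so it may be worth counting the NaN values before and after the conversion.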

Then resample by day and drop the days that have no measurements:

df = df.resample('D', on='Date').mean().dropna(how='all')
print(df)
            duration   km
Date                     
2015-03-28       0.5  1.0
2015-03-30       1.0  4.0
2015-04-22       3.0  7.0
2015-04-23       0.0  NaN
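
For reference, a self-contained sketch of the resample approach. It rebuilds the question's frame by hand, assuming the nan entries are real missing values (`np.nan`) rather than strings. The name `daily` is chosen here for illustration.

```python
import numpy as np
import pandas as pd

# The question's data, typed in by hand with real NaN values.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-03-28 09:07:00.800001",
        "2015-03-28 09:36:01.819998",
        "2015-03-30 09:36:06.839997",
        "2015-03-30 09:37:27.659997",
        "2015-04-22 09:51:40.440003",
        "2015-04-23 10:15:25.080002",
    ]),
    "duration": [0, 1, 1, np.nan, 3, 0],
    "km": [0, 2, 3, 5, 7, np.nan],
})

# resample creates one row per calendar day between the first and last
# timestamp, so days without any measurement come back as all-NaN rows.
daily = df.resample("D", on="Date").mean()

# Keep only the days where at least one column has a value.
daily = daily.dropna(how="all")
print(daily)
```

mean skips NaN by default, so a day with one valid reading and one missing reading averages the valid reading alone. Without the dropna call the result would contain every day in the range, which may or may not be what you want.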

Answer 2

Use groupby:

In [896]: df.groupby(df.Date.dt.date)[['duration', 'km']].mean()
Out[896]:
            duration   km
Date
2015-03-28       0.5  1.0
2015-03-30       1.0  4.0
2015-04-22       3.0  7.0
2015-04-23       0.0  NaN
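
A possible variation on the groupby approach, sketched here as an assumption rather than as part of the answer: grouping on `dt.normalize()` instead of `dt.date` keeps the index as a DatetimeIndex, which can be convenient for later time-based slicing or plotting. The frame is rebuilt by hand from the question, and `by_day` is a name chosen for illustration.

```python
import numpy as np
import pandas as pd

# The question's data, typed in by hand with real NaN values.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-03-28 09:07:00.800001",
        "2015-03-28 09:36:01.819998",
        "2015-03-30 09:36:06.839997",
        "2015-03-30 09:37:27.659997",
        "2015-04-22 09:51:40.440003",
        "2015-04-23 10:15:25.080002",
    ]),
    "duration": [0, 1, 1, np.nan, 3, 0],
    "km": [0, 2, 3, 5, 7, np.nan],
})

# dt.normalize() sets the time part to midnight but keeps datetime64 dtype,
# whereas dt.date produces plain Python date objects.
by_day = df.groupby(df["Date"].dt.normalize())[["duration", "km"]].mean()

print(by_day)
print(type(by_day.index))
```

Unlike resample, groupby only produces rows for days that actually occur in the data, so no dropna step is needed.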

Get the average of data within the same day in pandas

2 answers: