The following code
test_df['Started'] = pd.to_datetime(test_df['Started'])
test_df['day count'] = test_df['Started'].apply(lambda x: x.strftime('%A'))
test_day_count = test_df['day count'].value_counts()
print(test_day_count)
returns
Thursday 25
Friday 19
Saturday 13
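These values are the number of tests started on each day. I want to find the mean() test score for each day of the week.
I tried adding mean() to the third line, passing it the name of the column that holds the marks inside []:
test_df['Started'] = pd.to_datetime(test_df['Started'])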
test_df['day count'] = test_df['Started'].apply(lambda x:x.strftime('%A'))
test_day_count = test_df['day count'].value_counts().mean(test_df['marks'])
print(test_day_count)
I get the error TypeError: 'Series' objects are mutable, thus they cannot be hashed
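A likely explanation, based on the pandas signature rather than anything stated in the question: the first positional parameter of Series.mean is axis, so mean(test_df['marks']) passes the whole marks Series in as an axis. pandas then looks that axis up in a dictionary, which requires hashing the Series, and Series objects are unhashable. The sketch below reproduces this on a small made-up frame; the dates and marks are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's test_df
test_df = pd.DataFrame({
    "Started": ["2019-01-03", "2019-01-04", "2019-01-04", "2019-01-05"],
    "marks": [70, 80, 90, 60],
})
test_df["Started"] = pd.to_datetime(test_df["Started"])
test_df["day count"] = test_df["Started"].apply(lambda x: x.strftime("%A"))

error = None
try:
    # The marks Series lands in mean()'s `axis` parameter, not in a "values to average" slot
    test_df["day count"].value_counts().mean(test_df["marks"])
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```

Note also that value_counts() has already thrown the marks away and kept only the per-day counts, so no argument to mean() could recover them; the marks have to be grouped by weekday before averaging.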
Answer 0 (score: 1)
Use strftime('%A') as the groupby argument:
icma_df.marks.groupby(icma_df['Started'].dt.strftime('%A')).mean()
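A self-contained sketch of the same groupby, written against the question's test_df naming instead of icma_df; the four rows of data are hypothetical. Since the question already builds a 'day count' column, grouping the frame by that column should give the same averages.

```python
import pandas as pd

# Hypothetical data: one Thursday, two Friday and one Saturday test
test_df = pd.DataFrame({
    "Started": ["2019-01-03", "2019-01-04", "2019-01-04", "2019-01-05"],
    "marks": [70, 80, 90, 60],
})
test_df["Started"] = pd.to_datetime(test_df["Started"])

# Group the marks by the weekday name of their start date, then average each group
by_weekday = test_df["marks"].groupby(test_df["Started"].dt.strftime("%A")).mean()
print(by_weekday)

# Equivalent: precompute the weekday column, then group the frame by it
test_df["day count"] = test_df["Started"].dt.strftime("%A")
by_column = test_df.groupby("day count")["marks"].mean()
print(by_column)
```

Grouping happens before averaging, so each weekday gets the mean of its own marks (here Friday is (80 + 90) / 2 = 85).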
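Demo:
import numpy as np
import pandas as pd
# 100 business days (Mon-Fri) starting 2012-12-31, each with a random mark
icma_df = pd.DataFrame(dict(marks=np.random.rand(100),
                            Started=pd.date_range('2012-12-31', periods=100, freq='B')))
icma_df.marks.groupby(icma_df['Started'].dt.strftime('%A')).mean()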
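As @root pointed out, this also works, looks cleaner, and is probably faster. Note that dt.weekday_name was deprecated in pandas 0.23 and removed in 1.0; dt.day_name() is its replacement:
icma_df.marks.groupby(icma_df['Started'].dt.weekday_name).mean()   # pandas < 1.0
icma_df.marks.groupby(icma_df['Started'].dt.day_name()).mean()     # pandas >= 0.23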
Started
Friday 0.428581
Monday 0.443394
Thursday 0.485658
Tuesday 0.325027
Wednesday 0.506592
Name: marks, dtype: float64
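As the output shows, groupby sorts the weekday names alphabetically. A sketch, assuming calendar order is preferred, that uses dt.day_name(), reindexes onto Monday..Sunday, and puts the count next to the mean, so the question's value_counts numbers and the averages appear in one table. The data is again hypothetical.

```python
import pandas as pd

# Hypothetical data standing in for the real test results
test_df = pd.DataFrame({
    "Started": pd.to_datetime(["2019-01-03", "2019-01-04", "2019-01-04", "2019-01-05"]),
    "marks": [70, 80, 90, 60],
})

week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# count = number of tests started on that weekday, mean = average mark
summary = (
    test_df.groupby(test_df["Started"].dt.day_name())["marks"]
    .agg(["count", "mean"])
    .reindex(week)  # calendar order; weekdays with no tests become all-NaN rows
)
print(summary)
```

Reindexing keeps empty weekdays visible as NaN rows; append .dropna() if only days with tests should be shown.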