So, here is my dataframe:
session_id question_difficulty attempt_updated_at
5c822af21c1fba22 2 1557470128000
5c822af21c1fba22 3 1557469685000
5c822af21c1fba22 4 1557470079000
5c822af21c1fba22 5 1557472999000
5c822af21c1fba22 3 1557474145000
5c822af21c1fba22 3 1557474441000
5c822af21c1fba22 4 1557474299000
5c822af21c1fba22 4 1557474738000
5c822af21c1fba22 3 1557475430000
5c822af21c1fba22 4 1557476960000
5c822af21c1fba22 5 1557477458000
5c822af21c1fba22 2 1557478118000
5c822af21c1fba22 5 1557482556000
5c822af21c1fba22 4 1557482809000
5c822af21c1fba22 5 1557482886000
5c822af21c1fba22 5 1557484232000
I want to cut the field "attempt_updated_at" (which is an epoch time) into two equal bins and find the mean of "question_difficulty" within each bin for every session.
I want to store the means of the first and second bin separately.
I have tried pd.cut, but I don't know how to use it for this.
I would like my output to look like this, for example:
session_id mean1_difficulty mean2_difficulty
5c822af21c1fba22 5.0 3.0
Any ideas are appreciated, thanks.
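For reference, a minimal sketch that reconstructs the sample frame above so the snippets in the answers can be run as-is (the construction itself is an assumption; only the values come from the question):
import pandas as pd

df = pd.DataFrame({
    'session_id': ['5c822af21c1fba22'] * 16,
    'question_difficulty': [2, 3, 4, 5, 3, 3, 4, 4, 3, 4, 5, 2, 5, 4, 5, 5],
    'attempt_updated_at': [
        1557470128000, 1557469685000, 1557470079000, 1557472999000,
        1557474145000, 1557474441000, 1557474299000, 1557474738000,
        1557475430000, 1557476960000, 1557477458000, 1557478118000,
        1557482556000, 1557482809000, 1557482886000, 1557484232000,
    ],
})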
Answer 0 (score: 2)
I believe you need qcut, which creates bins with (roughly) the same number of rows, and then aggregate the mean:
df1 = (df.groupby(['session_id', pd.qcut(df['attempt_updated_at'], 2, labels=False)])
         ['question_difficulty'].mean()
         .unstack()
         .rename(columns=lambda x: f'mean{x+1}_difficulty'))
print(df1)
attempt_updated_at mean1_difficulty mean2_difficulty
session_id
5c822af21c1fba22 3.5 4.125
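To see where qcut puts the split, here is a quick sketch (assuming the sample frame above) that labels each row with its bin:
bins = pd.qcut(df['attempt_updated_at'], 2, labels=False)
print(df.assign(time_bin=bins).sort_values('attempt_updated_at'))
# the 8 earliest attempts land in bin 0 (mean difficulty 3.5),
# the 8 latest in bin 1 (mean difficulty 4.125)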
Or cut, which creates bins spanning equal-width ranges of attempt_updated_at:
df1 = (df.groupby(['session_id', pd.cut(df['attempt_updated_at'], 2, labels=False)])
         ['question_difficulty'].mean()
         .unstack()
         .rename(columns=lambda x: f'mean{x+1}_difficulty'))
print(df1)
attempt_updated_at mean1_difficulty mean2_difficulty
session_id
5c822af21c1fba22 3.444444 4.285714
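The bin sizes are where the two approaches differ; a quick check on the sample data (again a sketch under the assumptions above):
print(pd.cut(df['attempt_updated_at'], 2).value_counts(sort=False))
# with equal-width time intervals, 9 attempts fall in the earlier half of the
# time range and 7 in the later half, hence the means 31/9 ≈ 3.44 and 30/7 ≈ 4.29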
The difference between the functions is explained better here.
Answer 1 (score: 1)
I think this should do it:
# sort by timestamp; note sort_values(..., inplace=True) returns None, so it cannot be chained
pdf = pdf.sort_values('attempt_updated_at', ascending=False).reset_index(drop=True)
# split the frame into two equal halves by position
first = pdf.iloc[:pdf.shape[0] // 2]
second = pdf.iloc[pdf.shape[0] // 2:]
# mean difficulty per session in each half, joined on session_id
res = pd.DataFrame(first.groupby('session_id')['question_difficulty'].agg('mean')) \
    .rename(columns={'question_difficulty': 'mean1_difficulty'}) \
    .join(second.groupby('session_id')['question_difficulty'].agg('mean')) \
    .rename(columns={'question_difficulty': 'mean2_difficulty'})
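A small usage note (a sketch, assuming pdf is the frame from the question): because of ascending=False, first holds the most recent attempts, so mean1_difficulty here describes the later half of the session; sort ascending, or swap the two column names, if mean1 should be the earlier bin.
print(res)
# with the sample data above this should give, per session,
# mean1_difficulty = 4.125 (8 latest attempts) and mean2_difficulty = 3.5 (8 earliest)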