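I have a Pandas DataFrame (see below) and I want to sum values based on the index column. My index column contains string values. In the example below, I am trying to combine Moving, Playing and Using Phone into a single "Active Time" row and add up their values, while keeping the other index values as they are. Any suggestions on how to handle this kind of scenario?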
**Activity AverageTime**
Moving 0.000804367
Playing 0.001191772
Stationary 0.320701558
Using Phone 0.594305473
Unknown 0.060697612
Idle 0.022299218
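To make the example reproducible, the frame above might be built as follows. This is only a sketch: it assumes `Activity` is the index and `AverageTime` is the only column, which is how the answers below treat it.

```python
import pandas as pd

# Values copied from the table in the question; "Activity" is assumed to be the index.
df = pd.DataFrame(
    {"AverageTime": [0.000804367, 0.001191772, 0.320701558,
                     0.594305473, 0.060697612, 0.022299218]},
    index=pd.Index(
        ["Moving", "Playing", "Stationary", "Using Phone", "Unknown", "Idle"],
        name="Activity",
    ),
)
print(df)
```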
Answer 0 (score: 2)
I'm sure there must be a simpler way, but here is one possible solution.
# Filters for active and inactive rows
active_row_names = ['Moving','Playing','Using Phone']
active_filter = [row in active_row_names for row in df.index]
inactive_filter = [not row for row in active_filter]
active = df.loc[active_filter].sum() # Sum of 'active' rows as a Series
active = pd.DataFrame(active).transpose() # as a dataframe, and fix orientation
active.index=["active"] # Assign new index name
# Keep the inactive rows as they are, and replace the active rows with the
# newly defined row that is the sum of the previous active rows.
# DataFrame.append was removed in pandas 2.0; pd.concat keeps the index labels here.
df = pd.concat([df.loc[inactive_filter], active])
**Output**
Activity AverageTime
Stationary 0.320702
Unknown 0.060698
Idle 0.022299
active 0.596302
This also works when only a subset of the active row names is present in the DataFrame.
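One possibly simpler route is to rename the active labels to a common name and then group on the index. The sketch below rebuilds the question's data so it runs on its own; the name `mapping` is just illustrative, and `Activity` is again assumed to be the index.

```python
import pandas as pd

# Data from the question, with "Activity" assumed to be the index.
df = pd.DataFrame(
    {"AverageTime": [0.000804367, 0.001191772, 0.320701558,
                     0.594305473, 0.060697612, 0.022299218]},
    index=pd.Index(
        ["Moving", "Playing", "Stationary", "Using Phone", "Unknown", "Idle"],
        name="Activity",
    ),
)

active_row_names = ["Moving", "Playing", "Using Phone"]

# Map every active label to "active"; labels missing from the mapping stay unchanged.
mapping = {name: "active" for name in active_row_names}

# After renaming, rows that share a label are summed; sort=False avoids
# reordering the groups alphabetically.
result = df.rename(index=mapping).groupby(level=0, sort=False).sum()
print(result)
```

Because `rename` ignores mapping keys that are not in the index, this should also tolerate a frame that contains only some of the active labels.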
Answer 1 (score: 0)
I would add a new boolean column named "active" and then group by that column:
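# Flag rows whose index label is one of the active activities.
# (Chained assignment such as df['active'][[...]] = True is unreliable in pandas.)
df['active'] = df.index.isin(['Moving', 'Playing', 'Using Phone'])
df.groupby('active')['AverageTime'].sum()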
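Note that grouping on a boolean flag returns only two rows, so the inactive activities get summed together as well. The sketch below shows that behaviour on the question's data, followed by a variant key that keeps the inactive labels, assuming that is what is wanted. The names `is_active`, `two_rows`, `key` and `per_label` are illustrative.

```python
import pandas as pd

# Data from the question, with "Activity" assumed to be the index.
df = pd.DataFrame(
    {"AverageTime": [0.000804367, 0.001191772, 0.320701558,
                     0.594305473, 0.060697612, 0.022299218]},
    index=pd.Index(
        ["Moving", "Playing", "Stationary", "Using Phone", "Unknown", "Idle"],
        name="Activity",
    ),
)

# Boolean array: True where the index label is an active activity.
is_active = df.index.isin(["Moving", "Playing", "Using Phone"])

# Grouping on the flag alone gives one row for False and one for True.
two_rows = df.groupby(is_active)["AverageTime"].sum()
print(two_rows)

# Variant: keep each inactive label and replace only the active ones.
key = df.index.to_series().where(~is_active, "active")
per_label = df.groupby(key, sort=False)["AverageTime"].sum()
print(per_label)
```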