I have data with the following structure, containing a time, a category, an active flag, and a numeric value.
Input
i time cat. active item_count
0 00:00:00 X TRUE 2
1 00:00:06 X FALSE 4
2 00:00:08 X TRUE 13
3 00:00:25 Y FALSE 11
4 00:01:10 Y TRUE 2
5 00:01:58 Y TRUE 6
6 00:02:53 Y TRUE 2
7 07:40:29 X FALSE 1
8 08:34:52 X FALSE 2
9 11:50:48 X TRUE 5
10 11:55:42 X TRUE 3
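I want to compute the ratio of active items for every 2 rows within a category, and copy the time of the last row in each 2-row set, to get this output:
Output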
time cat. rate
00:00:06 X 0.33 (2/(2+4))
07:40:29 X 13/14
00:01:10 Y 2/13
00:02:53 Y 8/8
11:50:48 X 5/7
11:55:42 X 3/3
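The "sets" in the input would be rows [[0,1],[2,7],[8,9],[10]] for category X and rows [[3,4],[5,6]] for category Y.
How should I set this up? Sort by category, then by time, and then step through every N items? While looking for a solution I found GroupBy.nth, but I am not sure whether it applies here.
Answer 0 (score: 2)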
First create a helper Series with cumcount and pass it to another groupby, then aggregate with last and a lambda function, and finally do some data cleaning with reset_index and rename.
For the rate column, you only need to sum the values of the rows where active is True and divide that by the sum of all values.
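The steps this answer describes (a cumcount helper, a second groupby, last for the time, and the sum of active values over the sum of all values) could be sketched roughly as follows. This is only an illustration under assumptions, not the answerer's exact code: it uses named aggregation instead of a lambda, assumes the rows are already ordered by time and that active holds real booleans, and the names pair, work, active_items and total are made up for the example.

```python
import pandas as pd

# Input data from the question.
df = pd.DataFrame({
    "time": ["00:00:00", "00:00:06", "00:00:08", "00:00:25", "00:01:10",
             "00:01:58", "00:02:53", "07:40:29", "08:34:52", "11:50:48",
             "11:55:42"],
    "cat.": ["X", "X", "X", "Y", "Y", "Y", "Y", "X", "X", "X", "X"],
    "active": [True, False, True, False, True, True, True, False, False,
               True, True],
    "item_count": [2, 4, 13, 11, 2, 6, 2, 1, 2, 5, 3],
})

# Helper Series: position of each row inside its category, integer-divided
# by 2, so the 1st and 2nd rows of a category share pair 0, the 3rd and 4th
# share pair 1, and so on. "pair" is a hypothetical name.
pair = (df.groupby("cat.").cumcount() // 2).rename("pair")

# Item counts of active rows only; inactive rows contribute 0.
work = df.assign(active_items=df["item_count"].where(df["active"], 0))

# Group by category and pair; sort=False is intended to keep the groups in
# order of first appearance rather than sorting them by key.
out = work.groupby(["cat.", pair], sort=False).agg(
    time=("time", "last"),                  # time of the last row in the set
    active_items=("active_items", "sum"),   # numerator
    total=("item_count", "sum"),            # denominator
)
out["rate"] = out["active_items"] / out["total"]

# Data cleaning: drop the helper columns and keep the requested layout.
out = out.reset_index()[["time", "cat.", "rate"]]
print(out)
```

Changing the divisor in cumcount() // 2 to another N would group every N rows instead; a trailing set with fewer than N rows is still aggregated on its own.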
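Answer 1 (score: 1)
Here is one way to do it. I am not really using the tools pandas provides, but it is a (seemingly) working solution until someone who does use the pandas tools comes along.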
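import pandas as pd


def rate_dataframe(df):
    # Sort so that rows of the same category are contiguous and in time order.
    df_sorted = df.sort_values(['cat.', 'time', 'active'])
    prev_row = df_sorted.iloc[0]
    # cat_count: how many rows of the current pair have been seen (0 or 1).
    cat_count, active_count, not_active_count = 0, 0, 0
    ratio_rows = list()
    for _, row in df_sorted.iterrows():
        if row['active']:
            active_count += row['item_count']
        else:
            not_active_count += row['item_count']
        if cat_count == 1 and prev_row['cat.'] == row['cat.']:
            # Second row of a pair in the same category: emit the ratio.
            ratio = active_count / (active_count + not_active_count)
            ratio_rows.append([row['time'], row['cat.'], ratio])
            cat_count, active_count, not_active_count = 0, 0, 0
        elif cat_count == 0:
            cat_count += 1
        elif cat_count == 1 and prev_row['cat.'] != row['cat.']:
            # The previous category had an odd number of rows: emit its last
            # row on its own and restart the counters from the current row.
            if row['active']:
                active_count, not_active_count = row['item_count'], 0
            else:
                active_count, not_active_count = 0, row['item_count']
            ratio_rows.append([
                prev_row['time'],
                prev_row['cat.'],
                int(prev_row['active'])
            ])
        prev_row = row
    # If the final category also has an odd number of rows, its last row is
    # still pending here and would otherwise be dropped.
    if cat_count == 1:
        ratio_rows.append([prev_row['time'], prev_row['cat.'], int(prev_row['active'])])
    return pd.DataFrame(ratio_rows, columns=['time', 'cat.', 'rate'])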