I currently have a wide pandas DataFrame with the following schema:
idx, user, task_0, task_1, ... , task_n, task_result_0, task_result_1 ..., task_result_n, some_other_attribute_0, some_other_attribute_1, ..., some_other_attribute_n
Each user is given the n tasks in a random order. For example:
0, Bob, building-task, writing-task, reading-task, building-result, ...
1, Alice, writing-task, building-task, reading-task, writing-result, ...
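Columns that share the same index n are related to each other. For example, the information in task_0 relates to task_result_0.
I want to reorder the DataFrame so that the tasks appear in the same order in every row, so that all rows look like: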
0, Bob, building-task, writing-task, reading-task, building-result, ...
1, Alice, building-task, writing-task, reading-task, building-result, ...
I have no idea how to approach this.
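Answer 0: (score: 1)
Sort the values within each row, separately for the task columns and the result columns. Note that the two groups stay aligned only because each result name sorts in the same position as its task name (building-result next to building-task, and so on).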
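# Sort the columns by name so that task_0, task_1, ... appear in order.
d1 = df.sort_index(axis=1)
# Sort the values within each row, separately for tasks and results.
d1[['idx', 'user']] \
    .join(d1.filter(regex=r'^task_\d+$').apply(sorted, axis=1, result_type='broadcast')) \
    .join(d1.filter(regex=r'^task_result_\d+$').apply(sorted, axis=1, result_type='broadcast'))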
   idx   user         task_0        task_1    task_result_0   task_result_1
0    0    Bob  building-task  writing-task  building-result  writing-result
1    1  Alice  building-task  writing-task  building-result  writing-result
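Sorting tasks and results independently relies on the names happening to sort the same way, and it does not carry any other per-task columns along. A minimal sketch of an alternative, assuming tasks are unique within a row: compute the sort order of the task columns once with `np.argsort` and apply that same order to every column group. The sample data and the helper name `reorder_by_task` are hypothetical, made up to match the question's schema. The resulting order is alphabetical by task name, which differs from the order shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data following the question's schema (n = 3 tasks per user).
df = pd.DataFrame({
    'idx': [0, 1],
    'user': ['Bob', 'Alice'],
    'task_0': ['building-task', 'writing-task'],
    'task_1': ['writing-task', 'building-task'],
    'task_2': ['reading-task', 'reading-task'],
    'task_result_0': ['building-result', 'writing-result'],
    'task_result_1': ['writing-result', 'building-result'],
    'task_result_2': ['reading-result', 'reading-result'],
})

def reorder_by_task(frame, prefixes=('task', 'task_result')):
    """Reorder each column group row by row, using the sort order of the task columns."""
    # Count the task_<number> columns to find n.
    n = int(frame.columns.str.fullmatch(r'task_\d+').sum())
    task_cols = [f'task_{i}' for i in range(n)]
    # order[r, k] is the original slot of the k-th task (alphabetically) in row r.
    order = np.argsort(frame[task_cols].to_numpy(), axis=1)
    out = frame.copy()
    for prefix in prefixes:
        cols = [f'{prefix}_{i}' for i in range(n)]
        values = frame[cols].to_numpy()
        # Apply the same per-row permutation to every related column group.
        out[cols] = np.take_along_axis(values, order, axis=1)
    return out

aligned = reorder_by_task(df)
```

To carry further related groups along, add their prefix to `prefixes`, for example `('task', 'task_result', 'some_other_attribute')`, provided those columns follow the same `<prefix>_<n>` naming.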
Extra credit
However, maybe you didn't assign the same tasks to everyone...
Use value_counts on each row:
df.set_index(['idx', 'user']).apply(pd.Series.value_counts, axis=1)
           building-task  writing-task  building-result  writing-result
idx user
0   Bob                1             1                1               1
1   Alice              1             1                1               1
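If users really do receive different tasks, fixed positional columns stop being meaningful, and one column per task name may be easier to work with. A rough sketch of that idea, assuming each (idx, user) pair is unique and each task appears at most once per user: reshape to long form with `pd.wide_to_long`, then pivot on the task name. The sample data and the names `long_df`, `by_task`, and `slot` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: Alice gets a drawing task instead of a building task.
df = pd.DataFrame({
    'idx': [0, 1],
    'user': ['Bob', 'Alice'],
    'task_0': ['building-task', 'writing-task'],
    'task_1': ['writing-task', 'drawing-task'],
    'task_result_0': ['building-result', 'writing-result'],
    'task_result_1': ['writing-result', 'drawing-result'],
})

# One row per (idx, user, slot), with columns 'task' and 'task_result'.
long_df = pd.wide_to_long(
    df,
    stubnames=['task', 'task_result'],
    i=['idx', 'user'],
    j='slot',
    sep='_',
)

# One column per task name; users who did not get a task have NaN there.
by_task = long_df.reset_index().pivot(
    index=['idx', 'user'],
    columns='task',
    values='task_result',
)
```

Here every row has the same columns regardless of which tasks were assigned, and missing assignments show up as NaN rather than shifting other values into the wrong slot.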