Question

As a pandas newbie, I'm struggling with a data arrangement problem.

I have a large amount of data from log files in a pandas DataFrame, structured as follows:

day   user   measure1   measure2   ...
1     u1     xxxxx      yyyyy      ...
1     u2     xxxxx      yyyyy      ...
1     u3     xxxxx      yyyyy      ...
2     u2     xxxxx      yyyyy      ...
2     u4     xxxxx      yyyyy      ...
2     u3     xxxxx      yyyyy      ...
3     u1     xxxxx      yyyyy      ...
3     u3     xxxxx      yyyyy      ...
...   ...    ...        ...        ...

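So not every user appears on every day, and the data is sorted neither by day nor by user. However, whenever an entry exists, it contains all measures.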

Now I need to rearrange this data into 2D tables of "per user" versus "per day" for each measure, with the gaps filled with zeros, e.g.

For measure1:                      For measure2:
      u1     u2     u3     u4            u1     u2     u3     u4
1  xxxxx  xxxxx  xxxxx      0      1  yyyyy  yyyyy  yyyyy      0  
2      0  xxxxx  xxxxx  xxxxx      2      0  yyyyy  yyyyy  yyyyy  
3  xxxxx      0  xxxxx      0      3  yyyyy      0  yyyyy      0

How can I do this with pandas in python3? I'm also open to alternative solutions, e.g. using numpy instead of pandas.

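So far I have managed to extract arrays of all users and days that occur in the dataset, but I don't know how to assign the measurement data in a clever way.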

I would appreciate any help on this.

Answer 1

It seems you want a multi-index DataFrame (index1: day, index2: measure).

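The tricky part is that you may need to transpose the DataFrame before these operations. Take a look at the answers to this question, which is similar to yours: Constructing 3D Pandas DataFrame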

Hope that helps.
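One possible reading of the multi-index idea, as a minimal sketch rather than a definitive implementation: index by day and user, then move the user level into the columns so that the columns carry two levels, (measure, user). The sample values below are invented stand-ins for the real log data, and the names `wide`, `table1` and `table2` are only illustrative.

```python
import pandas as pd

# Hypothetical sample data standing in for the log; the values are invented.
df = pd.DataFrame({
    "day":      [1, 1, 1, 2, 2, 2, 3, 3],
    "user":     ["u1", "u2", "u3", "u2", "u4", "u3", "u1", "u3"],
    "measure1": [11, 12, 13, 22, 24, 23, 31, 33],
    "measure2": [110, 120, 130, 220, 240, 230, 310, 330],
})

# Move the user level into the columns; missing (day, user) pairs become 0.
# The result has two column levels: (measure, user).
wide = df.set_index(["day", "user"]).unstack("user", fill_value=0)

# Selecting the top column level yields one day-by-user table per measure.
table1 = wide["measure1"]
table2 = wide["measure2"]
print(table1)
```

No transpose is needed in this sketch, because `unstack` already places the users in the columns.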

Answer 2

You need set_index and unstack:

df.set_index(['day','user']).measure1.unstack(fill_value=0)
Out[6]: 
user     u1     u2     u3     u4
day                             
1     xxxxx  xxxxx  xxxxx      0
2         0  xxxxx  xxxxx  xxxxx
3     xxxxx      0  xxxxx      0
df.set_index(['day','user']).measure2.unstack(fill_value=0)
Out[7]: 
user     u1     u2     u3     u4
day                             
1     yyyyy  yyyyy  yyyyy      0
2         0  yyyyy  yyyyy  yyyyy
3     yyyyy      0  yyyyy      0
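As a hedged alternative to the calls above, `pivot_table` can build the same day-by-user tables. It may be useful if the log can contain the same (day, user) pair more than once, since `set_index(...).unstack()` raises an error on duplicate index entries while `pivot_table` aggregates them. The sample data and the choice of `aggfunc="sum"` are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample data standing in for the log; the values are invented.
df = pd.DataFrame({
    "day":      [1, 1, 1, 2, 2, 2, 3, 3],
    "user":     ["u1", "u2", "u3", "u2", "u4", "u3", "u1", "u3"],
    "measure1": [11, 12, 13, 22, 24, 23, 31, 33],
    "measure2": [110, 120, 130, 220, 240, 230, 310, 330],
})

# Build one day-by-user table per measure; gaps are filled with 0.
# aggfunc="sum" is an assumption: it only matters if a (day, user)
# pair appears more than once in the log.
tables = {
    measure: df.pivot_table(index="day", columns="user", values=measure,
                            fill_value=0, aggfunc="sum")
    for measure in ["measure1", "measure2"]
}

# If plain numpy arrays are preferred, each table converts directly.
array1 = tables["measure1"].to_numpy()
print(tables["measure1"])
```

In both approaches the rows are the days and the columns are the users that actually occur in the data, so a day with no entries at all would not appear as a row.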

Rearranging a continuous data log with a Python pandas DataFrame
