Question

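I have a table of users and how much each one spends per day. I want to rearrange it so that each user occupies a single row, with one column per day holding the amount spent on that day.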

user_id   day    spending

111       mon      15
111       tues     20
111       weds     25
....
122       mon      44
122       tues     34
122       weds     90
122       thurs     26
....

I want to collapse the table into this form:

id     mon tues weds thurs fri sat sun    

111    15  20   25   16    48  32  40
122    44  34   90   26    20  22  53

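Right now my code initializes the daily columns (mon, tues, weds, etc.) to all zeros, then uses a for loop to put each row's spending into the matching day column. Every row is therefore zero except for that day's spending, which gives a table that looks like a diagonal matrix. I then sum everything per user so that a single row holds all of that user's values. This works on a small dataset, but the for loop takes a very long time on my full dataset: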

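for i, hr in zip(np.arange(len(df)), df['day']):
    # .ix was removed in pandas 1.0; .loc behaves the same here, assuming the default integer index
    df.loc[i, hr] = df1_subset.loc[i, "spending"]
# aggregate the users by id and dates
df = df.groupby('id').sum()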

Is there a more idiomatic pandas operation that would let me avoid the for loop, or at least make it faster?

Thanks.

Answer 1

df.pivot(index='user_id', columns='day').fillna(0)
Out[50]: 
        spending                
day          mon thurs tues weds
user_id                         
111           15     0   20   25
122           44    26   34   90

Alternatively, if you want a custom aggregation function, use pivot_table:

table = pd.pivot_table(df, index='user_id', columns='day', aggfunc=np.sum)

table
Out[53]: 
        spending                
day          mon thurs tues weds
user_id                         
111           15   NaN   20   25
122           44    26   34   90
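To make the pivot_table approach easier to try, here is a minimal, self-contained sketch. The sample values are copied from the question, the `day_order` list is my own assumption about the desired column order, and `values="spending"` and `fill_value=0` are additions that were not in the call above: the first keeps the columns flat and the second fills missing days with 0 instead of NaN.

```python
import pandas as pd

# Sample rows modelled on the question; illustrative only.
df = pd.DataFrame({
    "user_id": [111, 111, 111, 122, 122, 122, 122],
    "day": ["mon", "tues", "weds", "mon", "tues", "weds", "thurs"],
    "spending": [15, 20, 25, 44, 34, 90, 26],
})

# Assumed weekday order; without it the day columns come out sorted alphabetically.
day_order = ["mon", "tues", "weds", "thurs", "fri", "sat", "sun"]

table = (
    pd.pivot_table(df, index="user_id", columns="day", values="spending",
                   aggfunc="sum", fill_value=0)
    # Put the days in calendar order and add any day that never occurs in the data.
    .reindex(columns=day_order, fill_value=0)
)
print(table)
```

Because pivot_table aggregates, this also copes with several rows for the same user and day, which are summed here.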

Answer 2

You can use DataFrame.pivot for this. If the table is stored in a DataFrame named df, the code would be:

Table = df.pivot(index='user_id', columns='day', values='spending')
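As a rough sketch of how this could be turned into the exact layout the question asks for (a plain `id` column and no `day` label above the headers), something like the following should work. The sample data is taken from the question; the names `wide` and `flat` are just illustrative.

```python
import pandas as pd

# Sample rows modelled on the question; illustrative only.
df = pd.DataFrame({
    "user_id": [111, 111, 111, 122, 122, 122, 122],
    "day": ["mon", "tues", "weds", "mon", "tues", "weds", "thurs"],
    "spending": [15, 20, 25, 44, 34, 90, 26],
})

# Passing values= gives flat day columns instead of a (spending, day) MultiIndex.
wide = df.pivot(index="user_id", columns="day", values="spending")

# Drop the "day" label on the column axis, then turn the index into an "id" column.
wide.columns.name = None
flat = wide.reset_index().rename(columns={"user_id": "id"})
print(flat)
```

Days with no record for a user are left as NaN; append .fillna(0) if zeros are preferred. Note that pivot raises a ValueError when the same (user_id, day) pair appears more than once, in which case pivot_table with an aggregation function is the safer choice.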

Combining multiple pandas rows into a single record with different headers
