Question

I originally asked a question on the Data Science site:

My table has the following format:
Feature amount    ID  
Feat1    2        1   
Feat2    0        1   
Feat3    0        1   
Feat4    1        1   
Feat2    2        2   
Feat4    0        2   
Feat3    0        2   
Feat6    1        2 
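Let's say I have 200 different IDs. I want to turn each distinct feature into a variable (column) and each amount into an observation (cell value), so that all rows with the same ID are combined into a single row. For example,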
Feat1 Feat2 Feat3 Feat4 Feat5 Feat6 ID 
  2     0     0     1    NA    NA   1    
 NA     2     0     0    NA    1    2    
Is there a good way to do this in Python (pandas) or R?

This is the answer I got:

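import pandas as pd

# `data` is the long-format DataFrame; this answer assumes it also has a 'Location' column.
newdata = pd.DataFrame(columns=['ID', 'Location', 'Feat1', 'Feat2', 'Feat3', 'Feat4', 'Feat5', 'Feat6'])
grouped = data.groupby(['ID', 'Location'])

# Build the wide table one group (one output row) at a time.
for index, (group_name, d) in enumerate(grouped):
    newdata.loc[index, 'ID'] = group_name[0]
    newdata.loc[index, 'Location'] = group_name[1]
    # Write each feature's amount into the matching column of this row.
    for feature, amount in zip(d['Feature'], d['amount']):
        newdata.loc[index, feature] = amount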

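After more Googling, I found this question, whose answer says:

So, try to avoid Python loops such as for i, row in enumerate(...)

I'd like to know whether there is a more efficient way to solve my original problem.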

Answer 1

I believe this is what you are after.

>>> df.pivot_table(values='amount', index='ID', columns='Feature')
Feature  Feat1  Feat2  Feat3  Feat4  Feat6
ID                                        
1            2      0      0      1    NaN
2          NaN      2      0      0      1
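
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample table in the question, and the variable name wide is only illustrative. Note that pivot_table aggregates with the mean by default, which makes no difference here because each ID/Feature pair occurs once.

```python
import pandas as pd

# Sample data copied from the question (long format: one row per ID/Feature pair).
df = pd.DataFrame({
    "Feature": ["Feat1", "Feat2", "Feat3", "Feat4",
                "Feat2", "Feat4", "Feat3", "Feat6"],
    "amount": [2, 0, 0, 1, 2, 0, 0, 1],
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
})

# Reshape to wide format: one row per ID, one column per Feature.
# ID/Feature combinations that never occur become NaN.
wide = df.pivot_table(values="amount", index="ID", columns="Feature")
print(wide)
```

This does the whole reshape in a single pandas call instead of assigning cells one at a time inside nested Python loops.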

Depending on your data and requirements, there are several variations. For example:

>>> df.pivot_table(values='amount', index='ID', columns='Feature', 
                   aggfunc=np.sum, fill_value=0)
Feature  Feat1  Feat2  Feat3  Feat4  Feat6
ID                                        
1            2      0      0      1      0
2            0      2      0      0      1
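
The desired output in the question also has a Feat5 column (all NA) and keeps ID as an ordinary column. One possible way to get that shape is sketched below. It assumes the full feature list is known up front (the all_features list here is made up for illustration) and that each ID/Feature pair appears at most once, since pivot raises an error on duplicates where pivot_table would aggregate them.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Feature": ["Feat1", "Feat2", "Feat3", "Feat4",
                "Feat2", "Feat4", "Feat3", "Feat6"],
    "amount": [2, 0, 0, 1, 2, 0, 0, 1],
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
})

# Assumption: the complete set of feature names is known in advance.
all_features = [f"Feat{i}" for i in range(1, 7)]

wide = (
    df.pivot(index="ID", columns="Feature", values="amount")
      .reindex(columns=all_features)  # adds Feat5 as an all-NaN column
      .reset_index()                  # turns ID back into a regular column
)
wide.columns.name = None  # drop the leftover "Feature" label on the column axis
print(wide)
```

If the data can contain repeated ID/Feature pairs, pivot_table with an explicit aggfunc, as in the answer above, is the safer choice.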
