Question

I have a CSV file like this:

word, tag, counter
I, Subject, 1
Love, Verb, 3
Love, Adjective, 1

I want to create a DataFrame whose columns are the word and the list of tags, like this:

Word Subject  Verb  Adjective
I     1        0     0
Love  0        3     1

How can I do this with pandas?

Answer 1

You can use pivot:

df = df.pivot(index='word', columns='tag', values='counter').fillna(0).astype(int)
print (df)
tag   Adjective  Subject  Verb
word                          
I             0        1     0
Love          1        0     3
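
The snippet above assumes df is already loaded. As a minimal sketch of the loading step, assuming the file really has a space after each comma as in the question (the csv_text variable stands in for the file and is only for illustration), skipinitialspace=True keeps the column names as 'tag' and 'counter' instead of ' tag' and ' counter':

```python
import io
import pandas as pd

# The sample CSV from the question; note the space after each comma.
csv_text = """word, tag, counter
I, Subject, 1
Love, Verb, 3
Love, Adjective, 1
"""

# skipinitialspace=True drops the blank that follows each delimiter, in the
# header as well as in the data rows.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Same pivot as above: one row per word, one column per tag, 0 where missing.
wide = df.pivot(index='word', columns='tag', values='counter').fillna(0).astype(int)
print(wide)
```

With a real file, the io.StringIO(csv_text) argument would be replaced by the file path.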

Another solution with set_index and unstack:

df = df.set_index(['word','tag'])['counter'].unstack(fill_value=0)
print (df)
tag   Adjective  Subject  Verb
word                          
I             0        1     0
Love          1        0     3
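
Both results sort the tag columns alphabetically and keep word as the index, while the question asks for Word, Subject, Verb, Adjective as plain columns. One possible way to get that layout, sketched here with the sample data rebuilt inline so the block runs on its own (the names wide and result are just illustrative):

```python
import pandas as pd

# Sample data from the question, built inline.
df = pd.DataFrame({
    'word': ['I', 'Love', 'Love'],
    'tag': ['Subject', 'Verb', 'Adjective'],
    'counter': [1, 3, 1],
})

wide = df.set_index(['word', 'tag'])['counter'].unstack(fill_value=0)

# Reorder the tag columns as in the question, turn the 'word' index back
# into a regular column, and drop the 'tag' label from the column axis.
result = (
    wide.reindex(columns=['Subject', 'Verb', 'Adjective'], fill_value=0)
        .reset_index()
        .rename_axis(None, axis=1)
)
print(result)
```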

But if you get:

ValueError: Index contains duplicate entries, cannot reshape

then you need to aggregate with some aggfunc in pivot_table:

print (df)
   word        tag  counter
0     I    Subject        1
1  Love       Verb        3
2  Love  Adjective        1 <-duplicates for Love and Adjective
3  Love  Adjective        3 <-duplicates for Love and Adjective

df = df.pivot_table(index='word', 
                    columns='tag', 
                    values='counter', 
                    aggfunc='mean', 
                    fill_value=0)
print (df)
tag   Adjective  Subject  Verb
word                          
I             0        1     0
Love          2        0     3

Another solution with groupby and unstack:

df = df.groupby(['word','tag'])['counter'].mean().unstack(fill_value=0)
print (df)
tag   Adjective  Subject  Verb
word                          
I             0        1     0
Love          2        0     3
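
Which aggregation fits depends on what the duplicate rows mean. The sketch below, on the same four-row sample, first lists the rows that share a (word, tag) pair, shows that plain pivot raises on them, and then uses 'sum' instead of 'mean'. Treating the duplicates as partial counts that should be added up is an assumption here, not something stated in the question.

```python
import pandas as pd

# Four-row sample with a duplicated (Love, Adjective) pair.
df = pd.DataFrame({
    'word': ['I', 'Love', 'Love', 'Love'],
    'tag': ['Subject', 'Verb', 'Adjective', 'Adjective'],
    'counter': [1, 3, 1, 3],
})

# Rows whose (word, tag) pair occurs more than once are what make pivot fail.
dupes = df[df.duplicated(['word', 'tag'], keep=False)]
print(dupes)

try:
    df.pivot(index='word', columns='tag', values='counter')
except ValueError as exc:
    print('pivot failed:', exc)

# Assumption: duplicates are partial counts, so add them up rather than average.
totals = df.pivot_table(index='word', columns='tag', values='counter',
                        aggfunc='sum', fill_value=0)
print(totals)
```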
