Question

I have a pandas DataFrame url that looks like this:

location  dom_category
3         'edu'
3         'gov'
3         'edu'
4         'org'
4         'others'
4         'org'

I want to reshape it into this:

location  edu   gov   org   others
3         2     1     0     0
4         0     0     2     1

The edu, gov, org and others columns hold the count for each location. My code gives the correct result, but I know it is not optimized:

url['val'] = 1
url_final = url.pivot_table(index=['location'], values='val',
                            columns=['dom_category'], aggfunc=np.sum)

Answer 1

First, if necessary, remove the ' characters with str.strip.

Then use groupby, aggregate with size, and reshape with unstack:

df['dom_category'] = df['dom_category'].str.strip("\'")
df = df.groupby(['location','dom_category']).size().unstack(fill_value=0)
print (df)
dom_category  edu  gov  org  others
location                           
3               2    1    0       0
4               0    0    2       1

Or use pivot_table:

df['dom_category'] = df['dom_category'].str.strip("\'")
df=df.pivot_table(index='location',columns='dom_category',aggfunc='size', fill_value=0)
print (df)
dom_category  edu  gov  org  others
location                           
3               2    1    0       0
4               0    0    2       1

Finally, you can convert the index to a column and remove the columns name dom_category with reset_index + rename_axis:

df = df.reset_index().rename_axis(None, axis=1)
print (df)
   location  edu  gov  org  others
0         3    2    1    0       0
1         4    0    0    2       1
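
For anyone who wants to run these steps end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question's data, and it assumes the quote characters are literally part of the stored strings. The names counts_groupby, counts_pivot and result are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: quotes are part of the strings).
df = pd.DataFrame({
    "location": [3, 3, 3, 4, 4, 4],
    "dom_category": ["'edu'", "'gov'", "'edu'", "'org'", "'others'", "'org'"],
})

# Remove the surrounding quote characters.
df["dom_category"] = df["dom_category"].str.strip("'")

# Approach 1: count rows per (location, dom_category) pair,
# then move the inner index level into columns.
counts_groupby = (
    df.groupby(["location", "dom_category"]).size().unstack(fill_value=0)
)

# Approach 2: the same counts through pivot_table with aggfunc='size'.
counts_pivot = df.pivot_table(
    index="location", columns="dom_category", aggfunc="size", fill_value=0
)

# Flatten: turn 'location' back into a column and drop the columns name.
result = counts_groupby.reset_index().rename_axis(None, axis=1)
print(result)
```

Both approaches should produce the same table of counts; the groupby version skips the helper val column used in the question.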

Answer 2

Use groupby and value_counts

Housekeeping
Get rid of the ' characters

df.dom_category = df.dom_category.str.strip("'")

Rest of the solution

df.groupby('location').dom_category.value_counts().unstack(fill_value=0)

dom_category  edu  gov  org  others
location                           
3               2    1    0       0
4               0    0    2       1

To get the format just right

df.groupby('location').dom_category.value_counts().unstack(fill_value=0) \
  .reset_index().rename_axis(None, axis=1)

   location  edu  gov  org  others
0         3    2    1    0       0
1         4    0    0    2       1
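
To see why unstack is needed in this approach, it may help to look at the intermediate result. The sketch below assumes the quotes have already been stripped and rebuilds the question's data by hand; counts and wide are illustrative names.

```python
import pandas as pd

# Sample data from the question, with the quote characters already removed.
df = pd.DataFrame({
    "location": [3, 3, 3, 4, 4, 4],
    "dom_category": ["edu", "gov", "edu", "org", "others", "org"],
})

# value_counts inside each location group gives a Series indexed by
# a (location, dom_category) MultiIndex: one row per combination that occurs.
counts = df.groupby("location")["dom_category"].value_counts()
print(counts)

# unstack moves dom_category into the columns; combinations that never
# occur (for example location 3 with 'org') are filled with 0.
wide = counts.unstack(fill_value=0)
print(wide)
```

Only combinations that actually appear in the data show up in counts, which is why fill_value=0 matters when reshaping to the wide table.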

Answer 3

Let's use str.strip, get_dummies and groupby:

df['dom_category'] = df.dom_category.str.strip("\'")
# numeric_only=True keeps the string column dom_category out of the sum
df.assign(**df.dom_category.str.get_dummies()).groupby('location').sum(numeric_only=True).reset_index()

Output:

   location  edu  gov  org  others
0         3    2    1    0       0
1         4    0    0    2       1
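
As a rough illustration of what get_dummies contributes here, the sketch below prints the indicator columns before summing them. It is a slight variation rather than the exact one-liner: the dummies are grouped directly by the location Series, so the original string column never enters the sum. The data is rebuilt from the question with quotes already stripped, and dummies and result are illustrative names.

```python
import pandas as pd

# Sample data from the question, with the quote characters already removed.
df = pd.DataFrame({
    "location": [3, 3, 3, 4, 4, 4],
    "dom_category": ["edu", "gov", "edu", "org", "others", "org"],
})

# One 0/1 indicator column per distinct category, one row per original row.
dummies = df["dom_category"].str.get_dummies()
print(dummies)

# Summing the indicators within each location yields the per-category counts.
result = dummies.groupby(df["location"]).sum().reset_index()
print(result)
```

Each row of dummies contains exactly one 1, so the per-location sums are the same counts the other answers compute.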

pivot_table with group and without value field
