How to expand some columns into rows using Python pandas

Time: 2017-06-13 18:32:34

Tags: python pandas

I have a two-column table that holds a many-to-many relationship.

For example:

Animal     Food
rabbit     grass
rabbit     carrots
rabbit     cabbage
dog        carrots
horse      grass
horse      hay
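
For anyone who wants to reproduce this, the table above could be built as follows. This is only a sketch: the variable name `df` is an assumption, chosen because the answers below use it.

```python
import pandas as pd

# Rebuild the two-column table from the question.
# The name `df` is assumed; it is what the answers refer to.
df = pd.DataFrame({
    'Animal': ['rabbit', 'rabbit', 'rabbit', 'dog', 'horse', 'horse'],
    'Food':   ['grass', 'carrots', 'cabbage', 'carrots', 'grass', 'hay'],
})
print(df)
```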

I would like something like this:

Animal     Food1   Food2     Food3 
rabbit     grass   carrots   cabbage  
dog        carrots
horse      grass   hay

Or something like this:

Animal     Grass     Carrots    Cabbage     Hay
rabbit      True      True       True       False
dog         False     True       False      False
horse       True      False      False      True

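I have tried pivot tables, melt, and stack, but I still can't figure out how to do this. Any help would be much appreciated. Thanks!

4 answers:

Answer 0 (score: 4)

Option 1
groupby + size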

df.groupby(['Animal', 'Food']).size().unstack(fill_value=0).astype(bool)

Food    cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
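rabbit     True     True   True  False

For the first layout (one column per food for each animal), collect each animal's foods into a list:

s = df.groupby('Animal').Food.apply(list)
pd.DataFrame(s.values.tolist(), s.index).add_prefix('Food').fillna('')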

          Food0    Food1    Food2
Animal                           
dog     carrots                  
horse     grass      hay         
rabbit    grass  carrots  cabbage
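
To see why the boolean version of Option 1 works, it may help to look at the count table before the cast to bool. A minimal sketch, assuming the sample data from the question (rebuilt here so the snippet runs on its own; the names `counts` and `flags` are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['rabbit', 'rabbit', 'rabbit', 'dog', 'horse', 'horse'],
    'Food':   ['grass', 'carrots', 'cabbage', 'carrots', 'grass', 'hay'],
})

# Count rows per (Animal, Food) pair, then move Food into the columns.
# Pairs that never occur are filled with 0.
counts = df.groupby(['Animal', 'Food']).size().unstack(fill_value=0)
print(counts)

# Any non-zero count becomes True, a zero becomes False.
flags = counts.astype(bool)
print(flags)
```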

Option 2
groupby + value_counts

df.groupby('Animal').Food.value_counts().unstack(fill_value=0).astype(bool)

Food    cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False

Option 3
groupby + str.get_dummies

df.groupby('Animal').Food.apply('|'.join).str.get_dummies().astype(bool)

        cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False
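
Option 3 works because the foods are joined with '|', which is also the default separator of `str.get_dummies`. A small sketch of the intermediate step, assuming the question's sample data (the names `joined` and `dummies` are illustrative); note that this approach would presumably misbehave if a food name itself contained '|':

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['rabbit', 'rabbit', 'rabbit', 'dog', 'horse', 'horse'],
    'Food':   ['grass', 'carrots', 'cabbage', 'carrots', 'grass', 'hay'],
})

# One string per animal, foods separated by '|'.
joined = df.groupby('Animal').Food.apply('|'.join)
print(joined['horse'])   # grass|hay

# Split on the same separator and build one indicator column per food.
dummies = joined.str.get_dummies(sep='|').astype(bool)
print(dummies)
```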

Option 4
pandas.factorize + numpy.bincount

f1, u1 = pd.factorize(df.Animal.values)
f2, u2 = pd.factorize(df.Food.values)

n = u1.size
m = u2.size

b = np.bincount(f1 * m + f2, minlength=n * m).reshape(n, m)

pd.DataFrame(b.astype(bool), u1, u2)

        grass  carrots  cabbage    hay
rabbit   True     True     True  False
dog     False     True    False  False
horse    True    False    False   True
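
The key step in Option 4 is the flat index `f1 * m + f2`, which maps each (animal, food) pair to one cell of an n x m grid stored row by row. A sketch of the intermediate values, assuming the question's sample data (the name `flat` is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Animal': ['rabbit', 'rabbit', 'rabbit', 'dog', 'horse', 'horse'],
    'Food':   ['grass', 'carrots', 'cabbage', 'carrots', 'grass', 'hay'],
})

# Integer codes in order of first appearance, plus the unique labels.
f1, u1 = pd.factorize(df.Animal.values)   # codes 0 0 0 1 2 2
f2, u2 = pd.factorize(df.Food.values)     # codes 0 1 2 1 0 3
n, m = u1.size, u2.size

# Row i, column j of an n x m grid sits at position i * m + j when flattened.
flat = f1 * m + f2
print(flat.tolist())   # [0, 1, 2, 5, 8, 11]

# Count how often each cell is hit, then restore the 2-D shape.
b = np.bincount(flat, minlength=n * m).reshape(n, m)
print(b)
```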

Option 5
I was bored, so I came up with one more.

f, u = pd.factorize(df.Animal.values)
n = u.size

a = [[] for _ in range(n)]
# Append each food to the list that belongs to its animal.
for i, food in zip(f, df.Food):
    a[i].append(food)
pd.DataFrame(a, u).rename(columns=lambda x: x+1).add_prefix('Food').fillna('')

          Food1    Food2    Food3
rabbit    grass  carrots  cabbage
dog     carrots                  
horse     grass      hay         

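Answer 1 (score: 4)

You can use:

# Number each animal's foods 1, 2, 3, ... and use that number as the column label.
out = (df.assign(n=df.groupby('Animal').cumcount().add(1).astype(str))
         .pivot(index='Animal', columns='n', values='Food')
         .add_prefix('Food')
         .rename_axis(columns=None)
         .fillna(''))
print(out)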
          Food1    Food2    Food3
Animal                           
dog     carrots                  
horse     grass      hay         
rabbit    grass  carrots  cabbage
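
The column labels in this approach come from `cumcount`, which numbers the rows within each group in order of appearance. A minimal sketch of just that step, assuming the question's sample data (the names `position` and `labels` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['rabbit', 'rabbit', 'rabbit', 'dog', 'horse', 'horse'],
    'Food':   ['grass', 'carrots', 'cabbage', 'carrots', 'grass', 'hay'],
})

# Position of each row within its animal's group, starting at 0.
position = df.groupby('Animal').cumcount()
print(position.tolist())   # [0, 1, 2, 0, 0, 1]

# Shift to 1-based and turn into the column labels used in the pivot.
labels = 'Food' + position.add(1).astype(str)
print(labels.tolist())
```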

Or:

out = pd.crosstab(df['Animal'], df['Food']).astype(bool)
print(out)
Food    cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False

Answer 2 (score: 4)

df_out = df.set_index(['Animal','Food']).assign(Value=True).unstack().fillna(False)
df_out.columns = df_out.columns.droplevel()

Output:

Food    cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False

Answer 3 (score: 3)

You can use crosstab:

pd.crosstab(df.Animal, df.Food).astype(bool)


Food    cabbage  carrots  grass    hay
Animal                                
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False
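
Note that `crosstab` returns counts, and `astype(bool)` only records whether a pair occurs at all. The sketch below illustrates this with a hypothetical duplicate row that is not part of the question's data (the names `dup`, `counts` and `flags` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Animal': ['rabbit', 'rabbit', 'rabbit', 'dog', 'horse', 'horse'],
    'Food':   ['grass', 'carrots', 'cabbage', 'carrots', 'grass', 'hay'],
})

# Hypothetical: repeat the first row (rabbit, grass) to show how repeats behave.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)

counts = pd.crosstab(dup['Animal'], dup['Food'])
print(counts.loc['rabbit', 'grass'])   # 2

# The cast collapses any positive count to True.
flags = counts.astype(bool)
print(flags.loc['rabbit', 'grass'])    # True
```

If the number of occurrences matters, the count table can be kept as it is instead of casting it.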