I have a two-column table that holds a many-to-many relationship.
For example:
Animal  Food
rabbit  grass
rabbit  carrots
rabbit  cabbage
dog     carrots
horse   grass
horse   hay
I would like to get something like this:
Animal  Food1    Food2    Food3
rabbit  grass    carrots  cabbage
dog     carrots
horse   grass    hay
Or something like this:
Animal  Grass  Carrots  Cabbage  Hay
rabbit  True   True     True     False
dog     False  True     False    False
horse   True   False    False    True
I have tried pivot tables, melt, and stack, but I still can't figure out how to do this. Any help would be appreciated. Thanks!
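For anyone who wants to reproduce this, here is a minimal sketch of the table above as a pandas DataFrame. The variable name `df` is an assumption, chosen to match what the answers use.

```python
import pandas as pd

# Sample data copied from the table in the question.
# The name `df` is an assumption that matches the answers.
df = pd.DataFrame({
    "Animal": ["rabbit", "rabbit", "rabbit", "dog", "horse", "horse"],
    "Food": ["grass", "carrots", "cabbage", "carrots", "grass", "hay"],
})
print(df)
```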
Answer 0 (score: 4)
Option 1
groupby with size
df.groupby(['Animal', 'Food']).size().unstack(fill_value=0).astype(bool)
Food    cabbage  carrots  grass    hay
Animal
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False
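For the wide layout with one column per food position, collect each animal's foods into a list first:
s = df.groupby('Animal').Food.apply(list)
pd.DataFrame(s.values.tolist(), s.index).add_prefix('Food').fillna('')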
          Food0    Food1    Food2
Animal
dog     carrots
horse     grass      hay
rabbit    grass  carrots  cabbage
Option 2
groupby with value_counts
df.groupby('Animal').Food.value_counts().unstack(fill_value=0).astype(bool)
Food    cabbage  carrots  grass    hay
Animal
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False
Option 3
groupby and str.get_dummies
df.groupby('Animal').Food.apply('|'.join).str.get_dummies().astype(bool)
        cabbage  carrots  grass    hay
Animal
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False
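Option 3 works in two steps: each animal's foods are joined into one `|`-separated string, and `str.get_dummies` then splits on that separator to produce one indicator column per food. The sketch below shows the intermediate value. It rebuilds the question's data under the assumed name `df`, and the names `joined` and `dummies` are only for illustration. Note that this approach would presumably misbehave if a food name itself contained `|`.

```python
import pandas as pd

# Sample data from the question (the name `df` is an assumption).
df = pd.DataFrame({
    "Animal": ["rabbit", "rabbit", "rabbit", "dog", "horse", "horse"],
    "Food": ["grass", "carrots", "cabbage", "carrots", "grass", "hay"],
})

# Step 1: one '|'-separated string per animal, e.g. 'grass|hay' for horse.
joined = df.groupby("Animal")["Food"].agg("|".join)
print(joined)

# Step 2: split on '|' and build one 0/1 column per distinct food,
# then convert the 0/1 integers to booleans.
dummies = joined.str.get_dummies(sep="|").astype(bool)
print(dummies)
```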
Option 4
pandas.factorize with numpy.bincount
f1, u1 = pd.factorize(df.Animal.values)
f2, u2 = pd.factorize(df.Food.values)
n = u1.size
m = u2.size
b = np.bincount(f1 * m + f2, minlength=n * m).reshape(n, m)
pd.DataFrame(b.astype(bool), u1, u2)
        grass  carrots  cabbage    hay
rabbit   True     True     True  False
dog     False     True    False  False
horse    True    False    False   True
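The trick in option 4 is that every (animal, food) pair maps to a single flat cell number `f1 * m + f2` in an `n` by `m` grid, and `np.bincount` counts how many rows land in each cell. The sketch below prints the intermediate arrays for the question's data. The names `df`, `flat`, `counts`, and `result` are assumptions used only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (the name `df` is an assumption).
df = pd.DataFrame({
    "Animal": ["rabbit", "rabbit", "rabbit", "dog", "horse", "horse"],
    "Food": ["grass", "carrots", "cabbage", "carrots", "grass", "hay"],
})

# factorize assigns integer codes in order of first appearance.
f1, u1 = pd.factorize(df["Animal"])   # rabbit=0, dog=1, horse=2
f2, u2 = pd.factorize(df["Food"])     # grass=0, carrots=1, cabbage=2, hay=3
n, m = len(u1), len(u2)

# Row-major position of each pair in an n x m grid.
flat = f1 * m + f2
print(flat)

# Count occurrences per cell, then fold the flat counts back into the grid.
counts = np.bincount(flat, minlength=n * m).reshape(n, m)
result = pd.DataFrame(counts.astype(bool), index=u1, columns=u2)
print(result)
```

Because the codes follow first appearance, rows and columns keep the order of the input rather than being sorted.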
Option 5
I was bored... so I came up with one more
f, u = pd.factorize(df.Animal.values)
n = u.size
a = [[] for _ in range(n)]
# append each food to the list belonging to its animal's code
for i, food in zip(f, df.Food):
    a[i].append(food)
pd.DataFrame(a, u).rename(columns=lambda x: x+1).add_prefix('Food').fillna('')
          Food1    Food2    Food3
rabbit    grass  carrots  cabbage
dog     carrots
horse     grass      hay
Answer 1 (score: 4)
You can use:
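# number each animal's rows 1, 2, 3, ... in a helper column ('n' is an arbitrary name),
# then pivot on it; the outer parentheses allow the chained calls to span several lines
df = (df.assign(n=df.groupby('Animal').cumcount().add(1).astype(str))
        .pivot(index='Animal', columns='n', values='Food')
        .add_prefix('Food')
        .fillna('')
        .rename_axis(columns=None))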
print (df)
          Food1    Food2    Food3
Animal
dog     carrots
horse     grass      hay
rabbit    grass  carrots  cabbage
Or:
df = pd.crosstab(df['Animal'], df['Food']).astype(bool)
print (df)
Food    cabbage  carrots  grass    hay
Animal
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False
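In the first snippet of this answer, the step that does the work is `groupby('Animal').cumcount()`, which numbers the rows within each animal; those numbers become the column labels. Here is a rough sketch of that intermediate result, using `set_index` plus `unstack` as a stand-in for the pivot call. The names `df`, `pos`, and `wide` are assumptions for illustration.

```python
import pandas as pd

# Sample data from the question (the name `df` is an assumption).
df = pd.DataFrame({
    "Animal": ["rabbit", "rabbit", "rabbit", "dog", "horse", "horse"],
    "Food": ["grass", "carrots", "cabbage", "carrots", "grass", "hay"],
})

# Position of each row within its animal, starting at 1.
pos = df.groupby("Animal").cumcount().add(1)
print(pos.tolist())

# Index by (animal, position), then move the position level into the columns.
wide = (
    df.set_index(["Animal", pos])["Food"]
      .unstack()            # columns are now 1, 2, 3
      .add_prefix("Food")   # Food1, Food2, Food3
      .fillna("")           # blank out the missing cells
)
print(wide)
```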
Answer 2 (score: 4)
df_out = df.set_index(['Animal','Food']).assign(Value=True).unstack().fillna(False)
df_out.columns = df_out.columns.droplevel()
Output:
Food    cabbage  carrots  grass    hay
Animal
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False
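One caveat with the `set_index` plus `unstack` approach: `unstack` raises a `ValueError` when the same (Animal, Food) pair appears more than once, which the question's sample data never does. Below is a small sketch of that failure and one possible workaround, dropping duplicates first. The frame `df_dup` is made-up data for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical data in which the pair (rabbit, grass) appears twice.
df_dup = pd.DataFrame({
    "Animal": ["rabbit", "rabbit", "dog"],
    "Food": ["grass", "grass", "carrots"],
})

# unstack cannot reshape an index that contains duplicate entries.
try:
    df_dup.set_index(["Animal", "Food"]).assign(Value=True).unstack()
    raised = False
except ValueError:
    raised = True
print("duplicate pairs raised ValueError:", raised)

# Dropping duplicate rows first makes every (Animal, Food) pair unique.
clean = df_dup.drop_duplicates()
out = (
    clean.set_index(["Animal", "Food"])
         .assign(Value=True)["Value"]
         .unstack(fill_value=False)
)
print(out)
```

The `groupby` and `crosstab` variants in the other answers count occurrences instead of reshaping, so they should not need this extra step.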
Answer 3 (score: 3)
You can use crosstab:
pd.crosstab(df.Animal, df.Food).astype(bool)
Food    cabbage  carrots  grass    hay
Animal
dog       False     True  False  False
horse     False    False   True   True
rabbit     True     True   True  False