I want to create new columns in df1 based on the number of rows in df2 for each fruit.
Expected Output of df1
No | Fruit_Name | 2018 | 2019 | 2020
1 | Apple | 2 | 1 | 0
2 | Banana | 0 | 0 | 1
3 | Cherries | 0 | 0 | 1
df1
No | Fruit_Name
1 | Apple
2 | Banana
3 | Cherries

df2
year | farmer | fruit_farmed
2018 | John | Apple
2019 | Timo | Apple
2020 | Eva | Cherries
2020 | Frey | Banana
2018 | Ali | Apple
Code that does not work:
i=0
for i in range(3):
    df1['2018'] = len(df2.loc[df2['fruit_farmed'] == df1['Fruit_Name'][i]])
    df1['2019'] = len(df2.loc[df2['fruit_farmed'] == df1['Fruit_Name'][i]])
    df1['2020'] = len(df2.loc[df2['fruit_farmed'] == df1['Fruit_Name'][i]])
    i=i+1
Output:
No Fruit_Name 2018 2019 2020
0 1 Apple 1 1 1
1 2 Banana 1 1 1
2 3 Cherries 1 1 1
Answer 0 (score: 2)
You can use crosstab first and then join:
s = pd.crosstab(df2.fruit_farmed, df2.year)
s = s.reindex(df1.Fruit_Name)
s.index=df1.index
df1 = df1.join(s)
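For anyone who wants to run this end to end, here is a minimal self-contained sketch of the crosstab-then-join idea. The two DataFrames are rebuilt from the tables in the question. The variable names counts and result, the fill_value=0 argument, and the conversion of the year columns to strings are my own additions for illustration and are not part of the answer above.

```python
import pandas as pd

# Sample data reconstructed from the tables in the question.
df1 = pd.DataFrame({"No": [1, 2, 3],
                    "Fruit_Name": ["Apple", "Banana", "Cherries"]})
df2 = pd.DataFrame({"year": [2018, 2019, 2020, 2020, 2018],
                    "farmer": ["John", "Timo", "Eva", "Frey", "Ali"],
                    "fruit_farmed": ["Apple", "Apple", "Cherries", "Banana", "Apple"]})

# Count rows per (fruit, year) pair: one row per fruit, one column per year.
counts = pd.crosstab(df2["fruit_farmed"], df2["year"])

# Put the counts in df1's fruit order; a fruit missing from df2 would get 0
# (fill_value=0 is an assumption added here, not in the original answer).
counts = counts.reindex(df1["Fruit_Name"], fill_value=0)

# Swap the fruit-name index for df1's index so join can line the rows up.
counts.index = df1.index

# Assumption: use string column names ("2018") to match the expected output.
counts.columns = counts.columns.astype(str)

result = df1.join(counts)
print(result)
```

The reindex step is what ties the counts to df1: it fixes the row order and keeps fruits that df1 lists even if df2 never mentions them.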
Answer 1 (score: 0)
Another approach is to group by fruit_farmed and then year, and then unstack.
import pandas as pd
df2 = pd.DataFrame([[2018,'John','Apple'],[2019,'Timo','Apple'],
                    [2020,'Eva','Cherries'],[2020,'Frey','Banana'],
                    [2018,'Ali','Apple']],
                   columns=['year','farmer','fruit_farmed'])
# count rows per (fruit, year), pivot the years into columns, fill gaps with 0
df1 = df2.groupby(['fruit_farmed','year']).count().unstack('year').reset_index().fillna(0)
# rename the columns
df1.columns = ['fruit_farmed','2018','2019','2020']
print(df1)
  fruit_farmed  2018  2019  2020
0        Apple   2.0   1.0   0.0
1       Banana   0.0   0.0   1.0
2     Cherries   0.0   0.0   1.0
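The counts above come out as floats because unstack introduces NaN before fillna runs, and the result replaces df1 instead of extending it. A possible variant, sketched under my own assumptions, uses size() with fill_value=0 to keep integer counts and then merges them onto the original df1. The names counts, year_cols and result are hypothetical, and the merge step is not part of the answer above.

```python
import pandas as pd

# Sample data reconstructed from the tables in the question.
df1 = pd.DataFrame({"No": [1, 2, 3],
                    "Fruit_Name": ["Apple", "Banana", "Cherries"]})
df2 = pd.DataFrame({"year": [2018, 2019, 2020, 2020, 2018],
                    "farmer": ["John", "Timo", "Eva", "Frey", "Ali"],
                    "fruit_farmed": ["Apple", "Apple", "Cherries", "Banana", "Apple"]})

# size() counts rows per (fruit, year) group; unstack pivots years into
# columns, and fill_value=0 avoids NaN so the counts stay integers.
counts = df2.groupby(["fruit_farmed", "year"]).size().unstack("year", fill_value=0)
counts.columns = [str(year) for year in counts.columns]

# Left-merge onto df1 so every fruit listed in df1 is kept.
result = df1.merge(counts, left_on="Fruit_Name", right_index=True, how="left")

# A fruit that never appears in df2 would come back as NaN; set those to 0.
year_cols = list(counts.columns)
result[year_cols] = result[year_cols].fillna(0).astype(int)
print(result)
```

The left merge keeps the No column and df1's row order, which the groupby result on its own does not carry.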