Pandas: group DataFrame 2 based on DataFrame 1

Time: 2020-06-15 19:11:42

Tags: python pandas dataframe

import pandas as pd

data = {"Team": ["Red Sox", "Red Sox", "Red Sox", "Red Sox", "Red Sox", "Red Sox", "Yankees", 
                 "Yankees", "Yankees", "Yankees", "Yankees", "Yankees"],
        "Pos": ["Pitcher", "Pitcher", "Pitcher", "Not Pitcher", "Not Pitcher", "Not Pitcher", 
                "Pitcher", "Pitcher", "Pitcher", "Not Pitcher", "Not Pitcher", "Not Pitcher"],
        "Age": [24, 28, 40, 22, 29, 33, 31, 26, 21, 36, 25, 31]}
df1 = pd.DataFrame(data)

Now I group by two columns with the following code:

grouped_multiple = df1.groupby(['Team', 'Pos']).agg({'Age': ['mean', 'min', 'max']})
grouped_multiple.columns = ['age_mean', 'age_min', 'age_max']
grouped_multiple = grouped_multiple.reset_index()
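As a side note, the same flat column names can be produced in one step with pandas named aggregation (available since pandas 0.25), which avoids renaming the MultiIndex columns afterwards. This is a minimal sketch that rebuilds df1 compactly from the same data; it should give the same columns as the three lines above.

```python
import pandas as pd

# Same data as in the question, written more compactly.
df1 = pd.DataFrame({
    "Team": ["Red Sox"] * 6 + ["Yankees"] * 6,
    "Pos": (["Pitcher"] * 3 + ["Not Pitcher"] * 3) * 2,
    "Age": [24, 28, 40, 22, 29, 33, 31, 26, 21, 36, 25, 31],
})

# Named aggregation: each keyword becomes an output column name,
# so there is no MultiIndex on the columns to flatten afterwards.
grouped_multiple = (
    df1.groupby(["Team", "Pos"])
       .agg(age_mean=("Age", "mean"), age_min=("Age", "min"), age_max=("Age", "max"))
       .reset_index()
)
print(grouped_multiple)
```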

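Now I create a second DataFrame that also has 3 columns of the same length, but with only numbers as values. Imagine that each cell of DataFrame 1 is linked to the cell at the same position in DataFrame 2. When I group DataFrame 1, I want to get the corresponding values from DataFrame 2.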

So grouping df1 by column 1

["Red Sox", "Red Sox", "Red Sox", "Red Sox", "Red Sox", "Red Sox", "Yankees", 
 "Yankees", "Yankees", "Yankees", "Yankees", "Yankees"]

gives

["Red Sox", "Yankees"]

Let column 1 of df2 look like

[1,2,4,3,2,3,4,5,3,5,6,7]

Then I want to put the values of df2's column 1 into lists, where "Red Sox" and "Yankees" each collect the values at the row positions they occupy in df1.

Like this:

[[1,2,4,3,2,3], [4,5,3,5,6,7]]
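Because the link between the two frames is by row position rather than by index label, one way to express it directly is to group df2 by a plain NumPy array of df1's teams, so pandas pairs row i of df2 with row i of df1 without any index alignment. This is a sketch under the assumption that df2 has a column named col1 (a hypothetical name; only its values come from the question); sort=False keeps the teams in first-appearance order.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Red Sox"] * 6 + ["Yankees"] * 6,
    "Pos": (["Pitcher"] * 3 + ["Not Pitcher"] * 3) * 2,
    "Age": [24, 28, 40, 22, 29, 33, 31, 26, 21, 36, 25, 31],
})
# Hypothetical df2: "col1" is an assumed name for column 1 from the question.
df2 = pd.DataFrame({"col1": [1, 2, 4, 3, 2, 3, 4, 5, 3, 5, 6, 7]})

# The frames are linked by position, so their lengths must match.
assert len(df1) == len(df2)

# A NumPy array key (not a Series) is matched to df2 row by row,
# ignoring the index labels of both frames.
teams = df1["Team"].to_numpy()
result = df2["col1"].groupby(teams, sort=False).agg(list).tolist()
print(result)  # one list per team, in first-appearance order
```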

2 answers:

Answer 0 (score: 0)

I'm still not quite clear on what you are trying to do, but if you concatenate the two DataFrames:

newdf = pd.concat([df1, df2], axis=1)

then you can do the groupby and whatever you need with the last three columns.
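A minimal sketch of this concatenate-then-group approach, assuming df2 has a single column named col1 (a hypothetical name; with more columns you would select all of them after the groupby). Note that concat with axis=1 aligns rows on index labels, so it only pairs rows correctly when both frames share the same index, as they do here with the default RangeIndex.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Red Sox"] * 6 + ["Yankees"] * 6,
    "Pos": (["Pitcher"] * 3 + ["Not Pitcher"] * 3) * 2,
    "Age": [24, 28, 40, 22, 29, 33, 31, 26, 21, 36, 25, 31],
})
# Hypothetical df2 column name; its names must not clash with df1's columns.
df2 = pd.DataFrame({"col1": [1, 2, 4, 3, 2, 3, 4, 5, 3, 5, 6, 7]})

# Side-by-side concatenation, aligned on the shared default index.
combined = pd.concat([df1, df2], axis=1)

# Group on df1's column and collect df2's column into one list per team.
lists = combined.groupby("Team", sort=False)["col1"].agg(list)
print(lists.tolist())
```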

Answer 1 (score: 0)

I'm not sure where grouped_multiple comes into the problem, but if df1 and df2 have the same length, I think you can do

df2 = pd.DataFrame({'col1':[1,2,4,3,2,3,4,5,3,5,6,7]})
s = df2['col1'].groupby(df1['Team']).agg(list)

and you get

print (s)
Team
Red Sox    [1, 2, 4, 3, 2, 3]
Yankees    [4, 5, 3, 5, 6, 7]
Name: col1, dtype: object

or, if you want a list of lists:

l = s.tolist()
print (l)
[[1, 2, 4, 3, 2, 3], [4, 5, 3, 5, 6, 7]]

If you want to group by two columns of df1, you can do

df2['col1'].groupby([df1['Team'], df1['Pos']]).agg(list)
Team     Pos        
Red Sox  Not Pitcher    [3, 2, 3]
         Pitcher        [1, 2, 4]
Yankees  Not Pitcher    [5, 6, 7]
         Pitcher        [4, 5, 3]
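If the goal is to have these lists next to the age statistics in grouped_multiple from the question, one possible sketch is to turn the two-key result into a DataFrame and merge it on the shared keys. The column name col1_list is a hypothetical choice for illustration, and df2 is again assumed to have a column named col1.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Red Sox"] * 6 + ["Yankees"] * 6,
    "Pos": (["Pitcher"] * 3 + ["Not Pitcher"] * 3) * 2,
    "Age": [24, 28, 40, 22, 29, 33, 31, 26, 21, 36, 25, 31],
})
df2 = pd.DataFrame({"col1": [1, 2, 4, 3, 2, 3, 4, 5, 3, 5, 6, 7]})

# Age statistics, as in the question.
grouped_multiple = df1.groupby(["Team", "Pos"]).agg({"Age": ["mean", "min", "max"]})
grouped_multiple.columns = ["age_mean", "age_min", "age_max"]
grouped_multiple = grouped_multiple.reset_index()

# Lists of df2 values per (Team, Pos); "col1_list" is a hypothetical name.
col1_lists = (
    df2["col1"].groupby([df1["Team"], df1["Pos"]]).agg(list)
    .rename("col1_list")
    .reset_index()
)

# Attach the lists to the statistics using the shared group keys.
result = grouped_multiple.merge(col1_lists, on=["Team", "Pos"])
print(result)
```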