import pandas as pd

data = {"Team": ["Red Sox", "Red Sox", "Red Sox", "Red Sox", "Red Sox", "Red Sox", "Yankees",
                 "Yankees", "Yankees", "Yankees", "Yankees", "Yankees"],
        "Pos": ["Pitcher", "Pitcher", "Pitcher", "Not Pitcher", "Not Pitcher", "Not Pitcher",
                "Pitcher", "Pitcher", "Pitcher", "Not Pitcher", "Not Pitcher", "Not Pitcher"],
        "Age": [24, 28, 40, 22, 29, 33, 31, 26, 21, 36, 25, 31]}
df1 = pd.DataFrame(data)
Now I group by two columns with the following code:
grouped_multiple = df1.groupby(['Team', 'Pos']).agg({'Age': ['mean', 'min', 'max']})
grouped_multiple.columns = ['age_mean', 'age_min', 'age_max']
grouped_multiple = grouped_multiple.reset_index()
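As an aside, the three steps above (aggregate, flatten the MultiIndex columns, reset the index) can likely be collapsed using pandas' named aggregation, available since pandas 0.25. This is a minimal sketch of that alternative, not the asker's code; the output column names are the same ones chosen above.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Red Sox"] * 6 + ["Yankees"] * 6,
    "Pos": (["Pitcher"] * 3 + ["Not Pitcher"] * 3) * 2,
    "Age": [24, 28, 40, 22, 29, 33, 31, 26, 21, 36, 25, 31],
})

# Named aggregation: each keyword becomes a flat output column,
# so there is no MultiIndex on the columns to rename afterwards.
grouped_multiple = (
    df1.groupby(["Team", "Pos"])
       .agg(age_mean=("Age", "mean"), age_min=("Age", "min"), age_max=("Age", "max"))
       .reset_index()
)
print(grouped_multiple)
```

Groups come out sorted by key, so "Not Pitcher" rows precede "Pitcher" rows within each team.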
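Next, I create a second DataFrame that also has 3 columns of the same length, but containing only numbers. Imagine that each cell of DataFrame 1 is linked to the cell at the same position in DataFrame 2. When I group DataFrame 1, I want to get the corresponding values from DataFrame 2.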
So grouping df1 by column 1,
["Red Sox", "Red Sox", "Red Sox", "Red Sox", "Red Sox", "Red Sox", "Yankees",
"Yankees", "Yankees", "Yankees", "Yankees", "Yankees"]
gives
["Red Sox", "Yankees"]
Let column 1 of df2 look like
[1,2,4,3,2,3,4,5,3,5,6,7]
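So I want to put the values of df2's column 1 into lists, where each "Red Sox" and "Yankees" group takes the values at the corresponding row indices of df1.
That is:
[[1,2,4,3,2,3], [4,5,3,5,6,7]]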
Answer 0 (score: 0)
It's still not entirely clear to me what you are trying to do, but if you concatenate the two DataFrames side by side:
newdf = pd.concat([df1, df2], axis=1)
then you can do the groupby and do whatever you need with the last three columns.
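A minimal sketch of the concatenate-then-group approach, assuming df2 holds the single column from the question under the name "col1" (borrowed from the next answer). Whatever df2's columns are called, they should not clash with df1's names, or the concatenated frame will have duplicate column labels.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Red Sox"] * 6 + ["Yankees"] * 6,
    "Pos": (["Pitcher"] * 3 + ["Not Pitcher"] * 3) * 2,
    "Age": [24, 28, 40, 22, 29, 33, 31, 26, 21, 36, 25, 31],
})
# Assumed column name "col1"; values are the ones given in the question.
df2 = pd.DataFrame({"col1": [1, 2, 4, 3, 2, 3, 4, 5, 3, 5, 6, 7]})

# axis=1 puts the columns side by side, aligned on the row index.
newdf = pd.concat([df1, df2], axis=1)

# Group on df1's key column and collect df2's values for each group.
lists = newdf.groupby("Team")["col1"].agg(list)
print(lists.tolist())
```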
Answer 1 (score: 0)
I'm not sure where grouped_multiple comes into the problem, but if df1 and df2 have the same length, I think you can do:
df2 = pd.DataFrame({'col1':[1,2,4,3,2,3,4,5,3,5,6,7]})
s = df2['col1'].groupby(df1['Team']).agg(list)
You get:
print (s)
Team
Red Sox [1, 2, 4, 3, 2, 3]
Yankees [4, 5, 3, 5, 6, 7]
Name: col1, dtype: object
Or, if you want a list of lists:
l = s.tolist()
print (l)
[[1, 2, 4, 3, 2, 3], [4, 5, 3, 5, 6, 7]]
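One caveat worth knowing: when the grouping key is a Series (here df1['Team']), pandas aligns it to the grouped object by index label, not by position. This works above because df1 and df2 both have the default 0..11 index. If the indexes differ (the index 100..111 below is a hypothetical example), passing a NumPy array instead makes the grouping purely positional, which matches the "same position" link described in the question. A minimal sketch:

```python
import pandas as pd

teams = pd.Series(["Red Sox"] * 6 + ["Yankees"] * 6)  # default index 0..11
# Hypothetical: the values carry a different index, so label-based
# alignment with `teams` would not pair rows by position.
values = pd.Series([1, 2, 4, 3, 2, 3, 4, 5, 3, 5, 6, 7], index=range(100, 112))

# A NumPy array key only needs the same length; it is matched row by row.
by_position = values.groupby(teams.to_numpy()).agg(list)
print(by_position.tolist())
```

Calling reset_index(drop=True) on both frames before grouping would be an equivalent way to restore positional pairing.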
If you want to group by both columns of df1, you can do:
df2['col1'].groupby([df1['Team'], df1['Pos']]).agg(list)
Team     Pos
Red Sox  Not Pitcher    [3, 2, 3]
         Pitcher        [1, 2, 4]
Yankees  Not Pitcher    [5, 6, 7]
         Pitcher        [4, 5, 3]
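To tie this back to the asker's grouped_multiple, the per-group lists can be attached to the aggregated ages with a merge on the two key columns. This is a minimal sketch; the output column name "col1_list" is a hypothetical choice.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Team": ["Red Sox"] * 6 + ["Yankees"] * 6,
    "Pos": (["Pitcher"] * 3 + ["Not Pitcher"] * 3) * 2,
    "Age": [24, 28, 40, 22, 29, 33, 31, 26, 21, 36, 25, 31],
})
df2 = pd.DataFrame({"col1": [1, 2, 4, 3, 2, 3, 4, 5, 3, 5, 6, 7]})

# The asker's aggregation, unchanged.
grouped_multiple = df1.groupby(["Team", "Pos"]).agg({"Age": ["mean", "min", "max"]})
grouped_multiple.columns = ["age_mean", "age_min", "age_max"]
grouped_multiple = grouped_multiple.reset_index()

# Lists of df2 values per (Team, Pos); the key Series names become the
# index level names, so reset_index() yields "Team" and "Pos" columns.
col1_lists = (
    df2["col1"].groupby([df1["Team"], df1["Pos"]]).agg(list)
    .rename("col1_list")  # hypothetical column name
    .reset_index()
)

# Inner merge on the shared keys keeps grouped_multiple's row order.
result = grouped_multiple.merge(col1_lists, on=["Team", "Pos"])
print(result)
```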