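This question follows on from my earlier question, "Pandas groupby make two columns lists separately". This time I want to create a new column in which each value is a list of tuples, formed by zipping the values of two other columns. For example: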
# Original DataFrame
fruit sport weather
0 apple [baseball, basketball] [sunny, windy]
1 banana [swimming, hockey] [cloudy, windy]
2 orange [football] [sunny]
# Desired DataFrame
fruit sport weather pairs
0 apple [baseball, basketball] [sunny, windy] [(baseball, sunny), (basketball, windy)]
1 banana [swimming, hockey] [cloudy, windy] [(swimming, cloudy), (hockey, windy)]
2 orange [football] [sunny] [(football, sunny)]
I tried the following code, but it gives me something else:
df['pairs'] = list(zip(df['sport'], df['weather']))
# Output DataFrame
fruit sport weather pairs
0 apple [baseball, basketball] [sunny, windy] ([baseball, sunny], [basketball, windy])
1 banana [swimming, hockey] [cloudy, windy] ([swimming, cloudy], [hockey, windy])
2 orange [football] [sunny] ([football], [sunny])
As you can see, it is the "opposite" of what I want. How should I do this? Thanks in advance.
Answer 0 (score: 2)
I think you are missing another list(zip()), applied to each row's pair of lists:
df['pairs'] = list(list(zip(a,b)) for a,b in zip(df['sport'], df['weather']))
Output:
fruit sport weather pairs
0 apple ['baseball', 'basketball'] ['sunny', 'windy'] [('baseball', 'sunny'), ('basketball', 'windy')]
1 banana ['swimming', 'hockey'] ['cloudy', 'windy'] [('swimming', 'cloudy'), ('hockey', 'windy')]
2 orange ['football'] ['sunny'] [('football', 'sunny')]
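To make the difference between the two levels of zip concrete, here is a minimal self-contained sketch. The DataFrame is rebuilt from the example in the question, and the names row_pairs and element_pairs are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question's example.
df = pd.DataFrame({
    "fruit": ["apple", "banana", "orange"],
    "sport": [["baseball", "basketball"], ["swimming", "hockey"], ["football"]],
    "weather": [["sunny", "windy"], ["cloudy", "windy"], ["sunny"]],
})

# The outer zip alone pairs the two whole lists of each row.
row_pairs = list(zip(df["sport"], df["weather"]))

# A second, inner zip pairs the elements inside those lists.
element_pairs = [list(zip(a, b)) for a, b in zip(df["sport"], df["weather"])]
df["pairs"] = element_pairs
```

The outer zip walks the rows; the inner zip walks the items within one row. Only the inner one produces the (sport, weather) tuples.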
Answer 1 (score: 1)
Use DataFrame.apply with axis=1 together with zip:
df['pairs'] = df.apply(lambda x: list(zip(x['sport'], x['weather'])), axis=1)
fruit sport weather pairs
0 apple [baseball, basketball] [sunny, windy] [(baseball, sunny), (basketball, windy)]
1 banana [swimming, hockey] [cloudy, windy] [(swimming, cloudy), (hockey, windy)]
2 orange [football] [sunny] [(football, sunny)]
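As a rough check, the row-wise apply should give the same result as a plain list comprehension over the two columns. The sketch below assumes the sample data from the question; the names via_apply and via_comprehension are made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question's example.
df = pd.DataFrame({
    "fruit": ["apple", "banana", "orange"],
    "sport": [["baseball", "basketball"], ["swimming", "hockey"], ["football"]],
    "weather": [["sunny", "windy"], ["cloudy", "windy"], ["sunny"]],
})

# axis=1 passes each row to the lambda as a Series indexed by column name.
via_apply = df.apply(lambda row: list(zip(row["sport"], row["weather"])), axis=1)

# The same pairing without apply, iterating the two columns in parallel.
via_comprehension = [list(zip(s, w)) for s, w in zip(df["sport"], df["weather"])]

df["pairs"] = via_apply
```

On large frames the comprehension is usually the faster of the two, since apply with axis=1 builds a Series for every row, but that is worth measuring on your own data.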
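Answer 2 (score: 1)
You can take advantage of the fact that map accepts several iterables and passes their items to the function in parallel, so map(zip, ...) zips the two lists of each row: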
df['pairs'] = [list(x) for x in map(zip, df['sport'], df['weather'])]
print(df)
Output:
fruit ... pairs
0 apple ... [(baseball, sunny), (basketball, windy)]
1 banana ... [(swimming, cloudy), (hockey, windy)]
2 orange ... [(football, sunny)]
[3 rows x 4 columns]
Or you can use itertuples:
df['pairs'] = [list(zip(*x)) for x in df[['sport', 'weather']].itertuples(index=False)]
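Both variants can be checked side by side. This is a minimal sketch on the question's sample data; the names via_map and via_itertuples are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question's example.
df = pd.DataFrame({
    "fruit": ["apple", "banana", "orange"],
    "sport": [["baseball", "basketball"], ["swimming", "hockey"], ["football"]],
    "weather": [["sunny", "windy"], ["cloudy", "windy"], ["sunny"]],
})

# map(zip, a, b) calls zip(a[i], b[i]) for each row i.
via_map = [list(x) for x in map(zip, df["sport"], df["weather"])]

# itertuples(index=False) yields one (sport, weather) tuple per row;
# zip(*x) unpacks it into zip(sport_list, weather_list).
via_itertuples = [
    list(zip(*x)) for x in df[["sport", "weather"]].itertuples(index=False)
]

df["pairs"] = via_map
```

Note that zip stops at the shorter input, so all of these approaches assume that sport and weather have the same length in every row; extra items in the longer list would be silently dropped.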