I have two DataFrames, df1 and df2:
df1 :
Name A_list
abcd (apple,orange,banana)
bcde (orange,mango)
cdef (apple,pineapple)
df2 :
City B_list
C1 (apple,mango,banana)
C2 (mango)
C3 (pineapple,banana)
I want to create a new DataFrame, df3:
Name A_list City
abcd (apple,orange,banana) (C1,C3)
bcde (orange,mango) (C1,C2)
cdef (apple,pineapple) (C1,C3)
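That is, go through A_list in df1 and determine which cities each fruit comes from. I don't know how to merge df1 and df2 using the lists A_list and B_list.
Answer 0 (score: 2)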
import pandas as pd

df1 = pd.DataFrame([
    ['abcd', ('apple', 'orange', 'banana')],
    ['bcde', ('orange', 'mango')],
    ['cdef', ('apple', 'pineapple')]
], columns=['Name', 'A_list'])
df2 = pd.DataFrame([
    ['C1', ('apple', 'mango', 'banana')],
    # The trailing comma makes this a one-element tuple; ('mango') is just the string 'mango'
    ['C2', ('mango',)],
    ['C3', ('pineapple', 'banana')]
], columns=['City', 'B_list'])
Massage the data into long format (one fruit per row):
s2 = df2.set_index('City').squeeze() \
.apply(pd.Series) \
.stack().reset_index(1, drop=True)
s2
City
C1 apple
C1 mango
C1 banana
C2 mango
C3 pineapple
C3 banana
dtype: object
s1 = df1.set_index('Name').squeeze() \
.apply(pd.Series) \
.stack().reset_index(1, drop=True)
s1
Name
abcd apple
abcd orange
abcd banana
bcde orange
bcde mango
cdef apple
cdef pineapple
dtype: object
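As a side note, the same long format can probably be reached more directly with Series.explode. This is a minimal sketch, not the answer's own method, and it assumes pandas 0.25 or later, where explode is available. The resulting Series keep their original names (A_list, B_list) and should hold the same values as s1 and s2 above.

```python
import pandas as pd

df1 = pd.DataFrame([
    ['abcd', ('apple', 'orange', 'banana')],
    ['bcde', ('orange', 'mango')],
    ['cdef', ('apple', 'pineapple')],
], columns=['Name', 'A_list'])
df2 = pd.DataFrame([
    ['C1', ('apple', 'mango', 'banana')],
    ['C2', ('mango',)],
    ['C3', ('pineapple', 'banana')],
], columns=['City', 'B_list'])

# Assumption: pandas >= 0.25. explode() gives every tuple element its own
# row and repeats the index label (Name or City) for each element.
s1 = df1.set_index('Name')['A_list'].explode()
s2 = df2.set_index('City')['B_list'].explode()

print(s1)
print(s2)
```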
df3 = pd.merge(*[s.rename('fruit').reset_index() for s in [s1, s2]])
df3
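The merge above relies on the pd.merge defaults: it joins on the columns the two frames share (here only fruit) and performs an inner join. The sketch below spells those defaults out. The frames left and right are hand-typed stand-ins for the reset s1 and s2, and the names left, right and merged are illustrative only.

```python
import pandas as pd

# Hand-typed long-format frames, standing in for s1 and s2 after
# .rename('fruit').reset_index()
left = pd.DataFrame({
    'Name': ['abcd', 'abcd', 'abcd', 'bcde', 'bcde', 'cdef', 'cdef'],
    'fruit': ['apple', 'orange', 'banana', 'orange', 'mango',
              'apple', 'pineapple'],
})
right = pd.DataFrame({
    'City': ['C1', 'C1', 'C1', 'C2', 'C3', 'C3'],
    'fruit': ['apple', 'mango', 'banana', 'mango', 'pineapple', 'banana'],
})

# Explicit form of the default behaviour: inner join on the shared column.
# Fruits that appear on only one side (such as 'orange') are dropped.
merged = pd.merge(left, right, on='fruit', how='inner')

print(merged.sort_values(['Name', 'fruit', 'City']).reset_index(drop=True))
```

Each (Name, fruit) pair is repeated once per matching city, so banana contributes two rows for abcd and mango contributes two rows for bcde.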
def tuplify(series):
    # Collapse a column to a tuple of its unique values (set order is arbitrary)
    return tuple(set(series))
df3.groupby('Name') \
.apply(lambda df: df.drop('Name', axis=1).apply(tuplify)) \
.rename(columns=dict(fruit='A_list')).reset_index()
Note that 'orange' is missing from A_list, because no 'City' is associated with it. If you want the same A_list as in df1:
df3 = pd.merge(*[s.rename('fruit').reset_index() for s in [s1, s2]])
df3 = df3.groupby('Name') \
.apply(lambda df: df.drop('Name', axis=1).apply(tuplify)) \
.rename(columns=dict(fruit='A_list'))
df3['A_list'] = df1.set_index('Name')['A_list']
df3.reset_index()
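A variation that may be easier to follow, offered as a sketch rather than as the answer's method: leave df1 untouched, compute only the City tuple per Name, and attach it with map. It assumes pandas 0.25 or later for explode. The helper to_long and the variable names are hypothetical, and the cities are sorted so the tuple order is deterministic.

```python
import pandas as pd

df1 = pd.DataFrame([
    ['abcd', ('apple', 'orange', 'banana')],
    ['bcde', ('orange', 'mango')],
    ['cdef', ('apple', 'pineapple')],
], columns=['Name', 'A_list'])
df2 = pd.DataFrame([
    ['C1', ('apple', 'mango', 'banana')],
    ['C2', ('mango',)],
    ['C3', ('pineapple', 'banana')],
], columns=['City', 'B_list'])


def to_long(df, key, col):
    """Hypothetical helper: one row per (key, fruit) pair."""
    return df.set_index(key)[col].explode().rename('fruit').reset_index()


# Inner join on 'fruit': pairs every Name with every City sharing a fruit
merged = to_long(df1, 'Name', 'A_list').merge(
    to_long(df2, 'City', 'B_list'), on='fruit')

# Unique cities per Name, sorted so the result is reproducible
cities = merged.groupby('Name')['City'].agg(lambda s: tuple(sorted(set(s))))

# A_list stays exactly as in df1; a Name without any match would get NaN
df3 = df1.assign(City=df1['Name'].map(cities))
print(df3)
```

Because A_list is never rebuilt from the merged rows, 'orange' is preserved without the extra assignment step.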