Good afternoon! Please help me with the following task. I have two dataframes. The first is a fact table:
df1:
Mall_1 (25 Ps) Trouses black 99
Mall_1 (25 Ps) Jumper Blue 48
Mall_2 (66 Ld) Trouses black None
Mall_2 (66 Ld) Skirt white 34
Mall_2 66 Ld Skirt black 34
Tokyo 77 Jacket white 45
Mall_3 (77 Tk) Jumper red 7
Mall_3 (77 Tk) Trouses Blue 87
London 66 Skirt green 10
Mall_1 25 Ps Jumper Blue 48
Sydney 78 Jumper red 7
Mall_4 59 Mn Jumper white 4
Milan 59 Skirt green 8
The second dataframe works like a dictionary:
df2:
25 Ps Paris 25
66 Ld London 66
77 Tk Tokyo 77
78 Sn Sydney 78
23 NY New York 23
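I need to do the following: if the value in the first column of df1 != the value in the second column of df2, I need to find which value from the first column of df2 is contained in the first column of df1, and then replace the whole cell in the first column of df1 with the corresponding value from the second column of df2. If the value in the first column of df1 equals a value in the second column of df2, skip it.
The output I want to get: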
df1:
Paris 25 Trouses black 99
Paris 25 Jumper Blue 48
London 66 Trouses black None
London 66 Skirt white 34
London 66 Skirt black 34
Tokyo 77 Jacket white 45
Tokyo 77 Jumper red 7
Tokyo 77 Trouses Blue 87
London 66 Skirt green 10
Paris 25 Jumper Blue 48
Sydney 78 Jumper red 7
NaN Jumper white 4
NaN Skirt green 8
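I am using pandas 1.0.3 and Python 3.8.
Any hints or suggestions would be much appreciated.
Answer 0: (score: 0)
This may already have been solved, but I'll answer anyway. In this approach, I split the data into separate columns, merge it with the main DataFrame on an ID, and then reshape the result to match the desired output.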
import pandas as pd
import numpy as np
import io
data = '''
from id item color values
0 Mall_1 "(25 Ps)" Trouses black 99
1 Mall_1 "(25 Ps)" Jumper Blue 48
2 Mall_2 "(66 Ld)" Trouses black None
3 Mall_2 "(66 Ld)" Skirt white 34
4 Mall_2 "66 Ld" Skirt black 34
5 Tokyo 77 Jacket white 45
6 Mall_3 "(77 Tk)" Jumper red 7
7 Mall_3 "(77 Tk)" Trouses Blue 87
8 London 66 Skirt green 10
9 Mall_1 "25 Ps" Jumper Blue 48
10 Sydney 78 Jumper red 7
11 Mall_4 "59 Mn" Jumper white 4
12 Milan 59 Skirt green 8
'''
data2 = '''
ID short full ids
25 Ps Paris 25
66 Ld London 66
77 Tk Tokyo 77
78 Sn Sydney 78
23 NY "New York" 23
'''
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(data2), sep=r'\s+')
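# Convert 'ID' to string so it matches the string IDs extracted from df
df2['ID'] = df2['ID'].astype(str)
# Split 'id' on the space into two new columns: the numeric ID and the short code
df = pd.concat([df, df['id'].str.split(' ', expand=True)], axis=1)
df.rename(columns={0:'ID',1:'short'}, inplace=True)
# Strip the opening parenthesis; regex=False treats '(' as a literal character
df['ID'] = df['ID'].str.replace('(', '', regex=False)
df['ID'] = df['ID'].astype(str)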
# Left-merge df with df2 on 'ID' so unmatched rows are kept
new_df = pd.merge(df, df2, on='ID', how='left')
new_df['ids'] = new_df['ids'].fillna(0).astype(int)
# Join 'full' and 'ids' with a space; rows without a match stay NaN
new_df['full_id'] = new_df['full'].str.cat(new_df['ids'].astype(str), sep=' ')
# Keep only the output columns
new_df = new_df[['full_id','item','color','values']]
new_df
full_id item color values
0 Paris 25 Trouses black 99
1 Paris 25 Jumper Blue 48
2 London 66 Trouses black None
3 London 66 Skirt white 34
4 London 66 Skirt black 34
5 Tokyo 77 Jacket white 45
6 Tokyo 77 Jumper red 7
7 Tokyo 77 Trouses Blue 87
8 London 66 Skirt green 10
9 Paris 25 Jumper Blue 48
10 Sydney 78 Jumper red 7
11 NaN Jumper white 4
12 NaN Skirt green 8
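For comparison, the rule as literally stated in the question (keep the cell if it already equals a df2 full name, otherwise look for a df2 key contained in it) could also be sketched without splitting columns. This is only a minimal sketch: the column names `place`, `key` and `full` and the helper `normalise` are hypothetical, and df1 is cut down to a few representative rows.

```python
import pandas as pd

# Hypothetical column names; the question does not name its columns.
df1 = pd.DataFrame({
    "place": ["Mall_1 (25 Ps)", "Mall_2 66 Ld", "Tokyo 77", "Mall_4 59 Mn", "Milan 59"],
    "item": ["Trouses", "Skirt", "Jacket", "Jumper", "Skirt"],
})
df2 = pd.DataFrame({
    "key": ["25 Ps", "66 Ld", "77 Tk", "78 Sn", "23 NY"],
    "full": ["Paris 25", "London 66", "Tokyo 77", "Sydney 78", "New York 23"],
})

# Lookup from short key to full name, plus the set of already-valid names.
lookup = dict(zip(df2["key"], df2["full"]))
valid_names = set(df2["full"])

def normalise(place):
    """Return the full name for a df1 cell, or None if nothing matches."""
    if place in valid_names:
        return place  # already equals a df2 full name: skip
    for key, full in lookup.items():
        if key in place:  # plain substring test, no regex
            return full
    return None

result = df1["place"].map(normalise)
df1["place"] = result
print(df1)
```

The substring test loops over df2 for every row, so it suits a small lookup table like this one; for a large df2 the merge-based approach above is likely to scale better.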