Changing a dataframe with pandas

Time: 2020-05-27 09:32:54

Tags: python pandas dataframe

Good afternoon! Please help me with a task. I have two dataframes. The first is a fact table:

df1:
Mall_1 (25 Ps)      Trouses     black     99
Mall_1 (25 Ps)      Jumper      Blue      48
Mall_2 (66 Ld)      Trouses     black     None   
Mall_2 (66 Ld)      Skirt       white     34 
Mall_2 66 Ld        Skirt       black     34    
Tokyo 77            Jacket      white     45
Mall_3 (77 Tk)      Jumper      red       7
Mall_3 (77 Tk)      Trouses     Blue      87
London 66           Skirt       green     10
Mall_1 25 Ps        Jumper      Blue      48
Sydney 78           Jumper      red       7
Mall_4 59 Mn        Jumper      white     4
Milan 59            Skirt       green     8

The second dataframe, df2, works something like a dictionary:

df2:
25 Ps   Paris 25
66 Ld   London 66
77 Tk   Tokyo 77
78 Sn   Sydney 78
23 NY   New York 23

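I need to do the following: if a value in the first column of df1 is not equal to any value in the second column of df2, I need to find which value from the first column of df2 is contained in that df1 cell, and then replace the entire df1 cell with the corresponding value from the second column of df2. If the value in the first column of df1 already equals a value in the second column of df2, skip it.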
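To make the rule concrete, here is a minimal sketch of it for a single cell in plain Python. The name `replace_cell` and the dictionary `CODE_TO_FULL` are hypothetical, introduced only for illustration; the dictionary is simply df2 written out by hand, and returning `None` for unmatched cells is an assumption based on the NaN rows in the desired output.

```python
# Hypothetical lookup built by hand from df2: short code -> full name.
CODE_TO_FULL = {
    "25 Ps": "Paris 25",
    "66 Ld": "London 66",
    "77 Tk": "Tokyo 77",
    "78 Sn": "Sydney 78",
    "23 NY": "New York 23",
}


def replace_cell(cell, code_to_full=CODE_TO_FULL):
    """Apply the replacement rule to one value from df1's first column."""
    # The cell already equals a value from df2's second column: skip it.
    if cell in code_to_full.values():
        return cell
    # Otherwise look for a df2 short code contained in the cell.
    for code, full in code_to_full.items():
        if code in cell:
            return full
    # No short code found (assumed to correspond to NaN in the output).
    return None


print(replace_cell("Mall_1 (25 Ps)"))  # Paris 25
print(replace_cell("Tokyo 77"))        # Tokyo 77
print(replace_cell("Milan 59"))        # None
```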

The output I want to get:

df1:
Paris 25       Trouses     black     99
Paris 25       Jumper      Blue      48
London 66      Trouses     black     None   
London 66      Skirt       white     34 
London 66      Skirt       black     34    
Tokyo 77       Jacket      white     45
Tokyo 77       Jumper      red       7
Tokyo 77       Trouses     Blue      87
London 66      Skirt       green     10
Paris 25       Jumper      Blue      48
Sydney 78      Jumper      red       7
NaN            Jumper      white     4
NaN            Skirt       green     8

I am using pandas 1.0.3 and Python 3.8.

I would be grateful for any hints or suggestions.

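1 answer:

Answer 0: (score: 0)

It may already have been solved, but I will answer anyway. In this approach, I split the data into columns, merge it with the main DataFrame on the ID, and then adjust the result to match the desired output.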

import pandas as pd
import io

data = '''
 from id item color values
0 Mall_1 "(25 Ps)" Trouses black 99
1 Mall_1 "(25 Ps)" Jumper Blue 48
2 Mall_2 "(66 Ld)" Trouses black None
3 Mall_2 "(66 Ld)" Skirt white 34
4 Mall_2 "66 Ld" Skirt black 34    
5 Tokyo 77 Jacket white 45
6 Mall_3 "(77 Tk)" Jumper red 7
7 Mall_3 "(77 Tk)" Trouses Blue 87
8 London 66 Skirt green 10
9 Mall_1 "25 Ps" Jumper Blue 48
10 Sydney 78 Jumper red 7
11 Mall_4 "59 Mn" Jumper white 4
12 Milan 59 Skirt green 8
'''

data2 = '''
ID short full ids
25 Ps Paris 25
66 Ld London 66
77 Tk Tokyo 77
78 Sn Sydney 78
23 NY "New York" 23
'''

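# Whitespace-separated input; quoted fields such as "(25 Ps)" stay together
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
df2 = pd.read_csv(io.StringIO(data2), sep=r'\s+')
# Convert 'ID' to string so it can be merged with the string IDs from df
df2['ID'] = df2['ID'].astype(str)
# Split 'id' on the space into a numeric part ('ID') and a short code ('short')
df = pd.concat([df, df['id'].str.split(' ', expand=True)], axis=1)
df.rename(columns={0:'ID',1:'short'}, inplace=True)

# Strip the leading '(' from 'ID' (literal match, not a regex)
df['ID'] = df['ID'].str.replace('(', '', regex=False)
df['ID'] = df['ID'].astype(str)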

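# Left-merge df with df2 on 'ID'; rows without a match get NaN
new_df = pd.merge(df, df2, on='ID', how='left')

# Fill missing 'ids' with 0 so the column can be cast to int
new_df['ids'] = new_df['ids'].fillna(0).astype(int)
# Join 'full' and 'ids' with a space; the result stays NaN where 'full' is NaN
new_df['full_id'] = new_df['full'].str.cat(new_df['ids'].astype(str), sep=' ')

# Keep only the output columns
new_df = new_df[['full_id','item','color','values']]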

new_df
    full_id     item     color   values
0   Paris 25    Trouses  black   99
1   Paris 25    Jumper   Blue    48
2   London 66   Trouses  black   None
3   London 66   Skirt    white   34
4   London 66   Skirt    black   34
5   Tokyo 77    Jacket   white   45
6   Tokyo 77    Jumper   red     7
7   Tokyo 77    Trouses  Blue    87
8   London 66   Skirt    green   10
9   Paris 25    Jumper   Blue    48
10  Sydney 78   Jumper   red     7
11  NaN         Jumper   white   4
12  NaN         Skirt    green   8
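
Note that the merge above matches on the numeric part of the code only. As a possible alternative, the sketch below matches on the full short code from df2 (for example "25 Ps"), which is closer to how the question states the rule. It is not the answer's method: it assumes the first column of df1 is held as a single string per row, and the column names `place`, `code` and `full` are hypothetical. Only a few rows from the question are used to keep it short.

```python
import re

import pandas as pd

# A few rows from the question; 'place' is df1's first column as one string.
df1 = pd.DataFrame({
    "place": ["Mall_1 (25 Ps)", "Mall_2 66 Ld", "Tokyo 77",
              "London 66", "Mall_4 59 Mn", "Milan 59"],
    "item": ["Trouses", "Skirt", "Jacket", "Skirt", "Jumper", "Skirt"],
})
df2 = pd.DataFrame({
    "code": ["25 Ps", "66 Ld", "77 Tk", "78 Sn", "23 NY"],
    "full": ["Paris 25", "London 66", "Tokyo 77", "Sydney 78", "New York 23"],
})

# One alternation pattern over all short codes, escaped to match literally.
pattern = "(" + "|".join(re.escape(c) for c in df2["code"]) + ")"

# Extract the short code found in each cell (NaN when none is present),
# then translate it to the full name via a code -> full lookup Series.
found = df1["place"].str.extract(pattern, expand=False)
mapped = found.map(df2.set_index("code")["full"])

# Cells that already hold a full name are kept; the rest take the mapped value.
already_full = df1["place"].isin(df2["full"])
df1["place"] = df1["place"].where(already_full, mapped)

print(df1)
```

With this sketch, "Mall_4 59 Mn" and "Milan 59" end up as NaN because neither contains a short code from df2 nor equals one of its full names, which agrees with the desired output in the question.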