Replace values in a pandas column using another pandas df that holds the corresponding replacements

Posted: 2017-07-03 06:27:53

Tags: python pandas

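I have a pandas df named `inventory` with a column of alphanumeric part numbers. Some of these part numbers have been superseded, and I have another df named `replace_with` with two columns, 'old part numbers' and 'new part numbers'. For example: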

`inventory` has values like:

* 123AAA
* 123BBB
* 123CCC
...

and `replace_with` has values like:

| oldPartnumbers | newPartnumbers |
|---|---|
| 123AAA | 123ABC |
| 123CCC | 123DEF |

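So I need to replace the corresponding values in `inventory` with the new numbers. After the replacement, `inventory` should look like this: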

* 123ABC
* 123BBB
* 123DEF

Is there a simple way to do this in Python? Thanks!

3 answers:

Answer 0 (score: 2)

Setup

Consider the dataframes `inventory` and `replace_with`:

```python
import pandas as pd

inventory = pd.DataFrame(dict(Partnumbers=['123AAA', '123BBB', '123CCC']))

replace_with = pd.DataFrame(dict(
        oldPartnumbers=['123AAA', '123BBB', '123CCC'],
        newPartnumbers=['123ABC', '123DEF', '123GHI']
    ))
```

Option 1: `map`

```python
d = replace_with.set_index('oldPartnumbers').newPartnumbers
inventory['Partnumbers'] = inventory['Partnumbers'].map(d)

inventory
```

```
  Partnumbers
0      123ABC
1      123DEF
2      123GHI
```
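One caveat with `map`: any part number that has no entry in `replace_with` comes back as `NaN`. The example above avoids this because its `replace_with` covers every row, but in the question `123BBB` has no replacement. A minimal sketch, assuming the question's data, that falls back to the original value with `fillna`:

```python
import pandas as pd

# Data from the question: 123BBB has no replacement
inventory = pd.DataFrame({'Partnumbers': ['123AAA', '123BBB', '123CCC']})
replace_with = pd.DataFrame({
    'oldPartnumbers': ['123AAA', '123CCC'],
    'newPartnumbers': ['123ABC', '123DEF'],
})

# Series indexed by the old number, holding the new number
d = replace_with.set_index('oldPartnumbers')['newPartnumbers']

# map() yields NaN for unmatched keys; fillna() puts the original value back
inventory['Partnumbers'] = inventory['Partnumbers'].map(d).fillna(inventory['Partnumbers'])

print(inventory)
```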

Option 2: `replace`

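```python
d = replace_with.set_index('oldPartnumbers').newPartnumbers
# Assign the result back: with Copy-on-Write, calling replace(..., inplace=True)
# on inventory['Partnumbers'] would not update inventory itself
inventory['Partnumbers'] = inventory['Partnumbers'].replace(d)

inventory
```

```
  Partnumbers
0      123ABC
1      123DEF
2      123GHI
```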
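Unlike `map`, `replace` leaves values that have no match untouched, so it handles the question's data (where `123BBB` has no replacement) without an extra step. A short sketch, assuming the question's data and building a plain `{old: new}` dict with `zip` instead of the Series `d`:

```python
import pandas as pd

inventory = pd.DataFrame({'Partnumbers': ['123AAA', '123BBB', '123CCC']})
replace_with = pd.DataFrame({
    'oldPartnumbers': ['123AAA', '123CCC'],
    'newPartnumbers': ['123ABC', '123DEF'],
})

# Build an {old: new} lookup dict from the two columns
mapping = dict(zip(replace_with['oldPartnumbers'], replace_with['newPartnumbers']))

# Exact-match replacement; unmatched values such as 123BBB are kept as-is
inventory['Partnumbers'] = inventory['Partnumbers'].replace(mapping)

print(inventory)
```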

Answer 1 (score: 1)

Suppose you have two dfs as follows:

```python
import pandas as pd
df1 = pd.DataFrame([[1,3],[5,4],[6,7]], columns = ['PN','name'])
df2 = pd.DataFrame([[2,22],[3,33],[4,44],[5,55]], columns = ['oldname','newname'])
```

df1:

```
    PN  name
0   1   3
1   5   4
2   6   7
```

df2:

```
    oldname  newname
0   2        22
1   3        33
2   4        44
3   5        55
```

Run a left join between them:

```python
temp = df1.merge(df2, how='left', left_on='name', right_on='oldname')
```

temp:

```
    PN      name     oldname    newname
0   1        3         3.0      33.0
1   5        4         4.0      44.0
2   6        7         NaN      NaN
```
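The left join only lines up row for row with `df1` when each `oldname` appears at most once in `df2`; a duplicated key would repeat rows of `df1`. A sketch using the `validate` and `indicator` arguments of `merge` to check this and to show which rows found a replacement (the duplicated row in `df2_dup` is hypothetical test data, not from the answer):

```python
import pandas as pd

df1 = pd.DataFrame([[1, 3], [5, 4], [6, 7]], columns=['PN', 'name'])
df2 = pd.DataFrame([[2, 22], [3, 33], [4, 44], [5, 55]], columns=['oldname', 'newname'])

# validate='many_to_one' raises MergeError if 'oldname' is not unique in df2;
# indicator=True adds a '_merge' column ('both' or 'left_only' for a left join)
temp = df1.merge(df2, how='left', left_on='name', right_on='oldname',
                 validate='many_to_one', indicator=True)
print(temp[['PN', 'name', '_merge']])

# Hypothetical bad lookup table: oldname 3 now maps to two different values
df2_dup = pd.concat([df2, pd.DataFrame({'oldname': [3], 'newname': [99]})],
                    ignore_index=True)
try:
    df1.merge(df2_dup, how='left', left_on='name', right_on='oldname',
              validate='many_to_one')
except pd.errors.MergeError as exc:
    print('MergeError:', exc)
```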

Then compute the new `name` column and replace the old one:

```python
df1['name'] = temp.apply(lambda row: row['newname'] if pd.notnull(row['newname']) else row['name'], axis=1)
```

df1:

```
    PN  name
0   1   33.0
1   5   44.0
2   6   7.0
```
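The row-wise `apply` works, but the same fallback can be written as a vectorized `fillna` on the merged frame, which avoids calling a Python function once per row. A sketch with the same data (this swaps `fillna` in for the answer's `apply`; the result is the same float column):

```python
import pandas as pd

df1 = pd.DataFrame([[1, 3], [5, 4], [6, 7]], columns=['PN', 'name'])
df2 = pd.DataFrame([[2, 22], [3, 33], [4, 44], [5, 55]], columns=['oldname', 'newname'])

temp = df1.merge(df2, how='left', left_on='name', right_on='oldname')

# Take newname where a match exists, otherwise keep the original name.
# temp gets a fresh 0..n-1 index, which lines up with df1's default index here.
df1['name'] = temp['newname'].fillna(temp['name'])

print(df1)
```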

One-liner:

```python
df1['name'] = df1.merge(df2, how='left', left_on='name', right_on='oldname').apply(
    lambda row: row['newname'] if pd.notnull(row['newname']) else row['name'], axis=1)
```

Answer 2 (score: 1)

This solution is relatively fast: it uses pandas data alignment and the numpy `copyto` function.

```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'partNumbers': ['123AAA', '123BBB', '123CCC', '123DDD']})
df2 = pd.DataFrame({'oldPartnumbers': ['123AAA', '123BBB', '123CCC'],
                    'newPartnumbers': ['123ABC', '123DEF', '123GHI']})

# assign index in each dataframe to original part number columns
# (faster than set_index method, but use set_index if original index must be preserved)
df1.index = df1.partNumbers
df2.index = df2.oldPartnumbers
# use pandas index data alignment
df1['updatedPartNumbers'] = df2.newPartnumbers
# use numpy to copy in old part num when a new part num is not found
# (writes into the array behind .values in place; under pandas Copy-on-Write
# that array is read-only and this call fails)
np.copyto(df1.updatedPartNumbers.values,
          df1.partNumbers.values,
          where=pd.isnull(df1.updatedPartNumbers))
# reset index
df1.reset_index(drop=True, inplace=True)
```

df1:

```
  partNumbers updatedPartNumbers
0      123AAA             123ABC
1      123BBB             123DEF
2      123CCC             123GHI
3      123DDD             123DDD
```
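`np.copyto` writes into the array returned by `.values`, which only works while that array is a writable view of the column. With pandas' Copy-on-Write mode (enabled by default in pandas 3.0) such arrays are read-only. A sketch of the same index-alignment idea that assigns a new column instead, swapping in `Series.fillna` for `np.copyto` (the names `lookup` and `keyed` are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'partNumbers': ['123AAA', '123BBB', '123CCC', '123DDD']})
df2 = pd.DataFrame({'oldPartnumbers': ['123AAA', '123BBB', '123CCC'],
                    'newPartnumbers': ['123ABC', '123DEF', '123GHI']})

# Lookup Series: index = old number, values = new number
lookup = df2.set_index('oldPartnumbers')['newPartnumbers']

# Index df1 by part number (keeping the column) so assignment aligns on it
keyed = df1.set_index('partNumbers', drop=False)
keyed.index.name = None  # avoid an index name that clashes with a column name

# Aligned assignment leaves NaN where no replacement exists; fill from the old number
keyed['updatedPartNumbers'] = lookup
keyed['updatedPartNumbers'] = keyed['updatedPartNumbers'].fillna(keyed['partNumbers'])

# Restore a default 0..n-1 index
df1 = keyed.reset_index(drop=True)
print(df1)
```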