Question

Hi, I have a data frame with 10000+ rows that looks like this:

import pandas as pd

df = pd.DataFrame([['110', 'Demand', 2344, 30953],
                   ['111', 'Supply', 3535, 321312], 
                   ['112', 'Supply', 35345, 2324], 
                   ['113', 'Demand', 24345, 4542], 
                   ['114', 'Supply', 342, 435623]], 
                  columns=['Material', 'Title', '201950', '201951'])
df

Material    Title   201950  201951
110         Demand  2344    30953
111         Supply  3535    321312
112         Supply  35345   2324
113         Demand  24345   4542
114         Supply  342     435623

I also have another small data frame, with about 4-5 rows, like this:

extra = pd.DataFrame([['111', 'Supply', 10],
                     ['112', 'Supply', 20],
                     ['114', 'Supply', 30],
                     ['115', 'Supply', 40]],
                    columns=['Material', 'Title', '201950'])
extra

Material    Title   201950
111         Supply    10
112         Supply    20
114         Supply    30
115         Supply    40

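I want to replace the values in the '201950' column of df with the values from extra wherever Material and Title match, so that the resulting data frame looks like this:

Material    Title   201950  201951
110         Demand  2344    30953
111         Supply  10      321312
112         Supply  20      2324
113         Demand  24345   4542
114         Supply  30      435623

I did try a merge:

import numpy as np

updated = df.merge(extra, how='left',
                   on=['Material', 'Title'],
                   suffixes=('', '_new'))
new = '201950_new'
# take the value from extra where a row matched, otherwise keep the original
updated['201950'] = np.where(pd.notnull(updated[new]), updated[new], updated['201950'])
updated.drop(new, axis=1, inplace=True)

This gives me the output I need, but I am looking for a more efficient solution, since df is huge and extra has only 4 rows.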

Answer 1

Use DataFrame.update, but first create a MultiIndex from the Material and Title columns in both DataFrames:

df = df.set_index(['Material','Title'])
extra = extra.set_index(['Material','Title'])

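# update() modifies df in place, aligning on the index; index entries that
# exist only in extra (Material 115) are ignored
df.update(extra)
# cast back to int in case update() upcast a column to float
df = df.astype(int).reset_index()
print(df)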
  Material   Title  201950  201951
0      110  Demand    2344   30953
1      111  Supply      10  321312
2      112  Supply      20    2324
3      113  Demand   24345    4542
4      114  Supply      30  435623
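
As a rough sketch of how the update approach could be packaged without touching the original frames: the helper name replace_from and its keys parameter are made up for this illustration and are not part of pandas. It assumes the key combinations in extra are unique and that the key columns exist in both frames.

```python
import pandas as pd

df = pd.DataFrame([['110', 'Demand', 2344, 30953],
                   ['111', 'Supply', 3535, 321312],
                   ['112', 'Supply', 35345, 2324],
                   ['113', 'Demand', 24345, 4542],
                   ['114', 'Supply', 342, 435623]],
                  columns=['Material', 'Title', '201950', '201951'])
extra = pd.DataFrame([['111', 'Supply', 10],
                      ['112', 'Supply', 20],
                      ['114', 'Supply', 30],
                      ['115', 'Supply', 40]],
                     columns=['Material', 'Title', '201950'])


def replace_from(df, extra, keys=('Material', 'Title')):
    """Return a new frame where cells of df are overwritten by matching cells of extra.

    Hypothetical helper for illustration only; df and extra are left unchanged.
    """
    keys = list(keys)
    original_dtypes = df.dtypes.to_dict()
    out = df.set_index(keys)            # set_index returns a new frame
    out.update(extra.set_index(keys))   # in place on `out`, aligned on the index
    # update() may leave integer columns as float, so restore the original
    # dtypes and column order
    return out.reset_index()[list(df.columns)].astype(original_dtypes)


result = replace_from(df, extra)
print(result)
```

Because update only touches index entries that already exist in the calling frame, Material 115 from extra does not appear in the result.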

Answer 2

You can try this:

(extra.set_index(['Material', 'Title'])
      .combine_first(df.set_index(['Material', 'Title']))
      .dropna()
      .reset_index()
      .astype(object))

Output:

  Material   Title 201950  201951
0      110  Demand   2344   30953
1      111  Supply     10  321312
2      112  Supply     20    2324
3      113  Demand  24345    4542
4      114  Supply     30  435623
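
To see why the dropna step is there, the chain can be split into intermediate steps. This is only a sketch on the sample data from the question; the variable names keys, combined, only_in_extra and result are chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame([['110', 'Demand', 2344, 30953],
                   ['111', 'Supply', 3535, 321312],
                   ['112', 'Supply', 35345, 2324],
                   ['113', 'Demand', 24345, 4542],
                   ['114', 'Supply', 342, 435623]],
                  columns=['Material', 'Title', '201950', '201951'])
extra = pd.DataFrame([['111', 'Supply', 10],
                      ['112', 'Supply', 20],
                      ['114', 'Supply', 30],
                      ['115', 'Supply', 40]],
                     columns=['Material', 'Title', '201950'])

keys = ['Material', 'Title']

# combine_first keeps extra's values and fills the gaps from df.
# The resulting index is the union of both indexes.
combined = extra.set_index(keys).combine_first(df.set_index(keys))

# Rows that exist only in extra have no '201951' value, so it is NaN there.
only_in_extra = combined[combined['201951'].isna()]

# dropna removes those rows, leaving only the materials present in df.
result = combined.dropna().reset_index()
print(result)
```

Note that dropna removes every row containing a NaN, so if df itself has missing values in some rows, those rows would be dropped as well; in that case the update approach from Answer 1 may be the safer choice.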

Replace values of a column from another dataframe
