Question

I am trying to overwrite a column in a DataFrame based on a conditional expression:

if df['service'] == 'PE1' or 'PE2':
    change the existing value in df['service'] to the original df['service'] + df['load port']
    # if ['load port'] == 'ABC' then new value == PE1ABC
else:
    keep the original value in df['service']  # in other words, != 'PE1' or 'PE2'

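I am trying to do a "VLOOKUP" from another DataFrame using .merge(). However, the 'PE1' and 'PE2' services need the load port as part of the lookup key. All other services have a 1:1 mapping.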

Answer 1

You can define a function based on your conditions and then use apply to change the column.

Example DataFrame:

import pandas as pd

df = pd.DataFrame({'load port': ['ABC', 'TEST', 'BLA', 'BLA', 'BLE'],
                   'service': ['PE1', 'PE2', 'bla', 'ble', 'PE2']})

Output:

  load port service
0       ABC     PE1
1      TEST     PE2
2       BLA     bla
3       BLA     ble
4       BLE     PE2

The change function:

def changeService(row):
    if row['service'] == 'PE1' or row['service'] == 'PE2':
        return row['service'] + row['load port']
    return row['service']

Apply the change function to overwrite your column:

df['service'] = df.apply(changeService, axis = 1)

Output:

  load port  service
0       ABC   PE1ABC
1      TEST  PE2TEST
2       BLA      bla
3       BLA      ble
4       BLE   PE2BLE

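Note: make sure your change function always returns a value. Any row that falls through without hitting a return gets a missing value (None/NaN) instead of keeping its original value.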
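
To illustrate the note, here is a minimal sketch on the same example data. The function name change_service_no_fallback is hypothetical and deliberately omits the final return, so rows that do not match the condition fall through and the function implicitly returns None:

```python
import pandas as pd

df = pd.DataFrame({'load port': ['ABC', 'TEST', 'BLA', 'BLA', 'BLE'],
                   'service': ['PE1', 'PE2', 'bla', 'ble', 'PE2']})

# Hypothetical variant of changeService without a fallback return
def change_service_no_fallback(row):
    if row['service'] in ('PE1', 'PE2'):
        return row['service'] + row['load port']
    # no return here: Python implicitly returns None for every other row

result = df.apply(change_service_no_fallback, axis=1)

# Index labels of the rows that lost their original value
missing_rows = result[result.isna()].index.tolist()
print(missing_rows)
```

The rows for 'bla' and 'ble' (index 2 and 3) come back as missing, which is why the fallback return row['service'] matters.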

Answer 2

You can do this with np.where:

import numpy as np
df['service'] = np.where((df['service'] =='PE1')|(df['service'] =='PE2'), #conditions
                          df['service']+df['load port'], #result if conditions are met
                          df['service']) # result if not

The apply approach from @Lorran Sutter works well, but this method will be faster if your DataFrame is large.
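
Another vectorized option, not taken from either answer, is a boolean mask built with Series.isin combined with .loc assignment. This is a minimal sketch on the same example data, and it should produce the same column as the np.where version:

```python
import pandas as pd

df = pd.DataFrame({'load port': ['ABC', 'TEST', 'BLA', 'BLA', 'BLE'],
                   'service': ['PE1', 'PE2', 'bla', 'ble', 'PE2']})

# True for the rows whose service needs the load port appended
mask = df['service'].isin(['PE1', 'PE2'])

# Overwrite only the masked rows; the right-hand side is aligned on the index
df.loc[mask, 'service'] = df['service'] + df['load port']

print(df)
```

Using isin keeps the condition readable when the list of special services grows beyond two entries.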

How to overwrite Series values based on a conditional expression?
