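I have a df (shape (5928, 22)) and I am trying to create a new column and fill it with values based on multiple conditions.
The conditions would be: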
if CH == 20 then value = 268,34
if CH == 24 then value = 322,02
if CH == 30 then value = 492,65
if CH == 40 then value = 536,69
and
if CH == 20 & ID in (5105561300, 5105561301, 5105561302, 5105561304) then value = 417,43
if CH == 24 & ID in (5105561300, 5105561301, 5105561302, 5105561304) then value = 500,91
if CH == 30 & ID in (5105561300, 5105561301, 5105561302, 5105561304) then value = 626,34
if CH == 40 & ID in (5105561300, 5105561301, 5105561302, 5105561304) then value = 834,85
When I add the new column and append values based on the first block of conditions, it works fine.
new_value = []
for row in df['CH']:
    if row == 20:
        new_value.append(268.34)
    elif row == 24:
        new_value.append(322.02)
    elif row == 30:
        new_value.append(492.65)
    elif row == 40:
        new_value.append(536.69)
    else:
        new_value.append(0)
df['new_value'] = new_value
When I try to add the other conditions, it does not work. The code looks like this:
new_value = []
for row in df['CH']:
    if row == 20 and df['ID'] not in (5105561300, 5105561301, 5105561302, 5105561304):
        new_value.append(268.34)
    elif row == 20 and df['ID'] in (5105561300, 5105561301, 5105561302, 5105561304):
        new_value.append(417.43)
    elif row == 24 and df['ID'] not in (5105561300, 5105561301, 5105561302, 5105561304):
        new_value.append(268.34)
    elif row == 24 and df['ID'] in (5105561300, 5105561301, 5105561302, 5105561304):
        new_value.append(500.91)
    elif row == 30 and df['ID'] not in (5105561300, 5105561301, 5105561302, 5105561304):
        new_value.append(268.34)
    elif row == 30 and df['ID'] in (5105561300, 5105561301, 5105561302, 5105561304):
        new_value.append(626.34)
    elif row == 40 and df['ID'] not in (5105561300, 5105561301, 5105561302, 5105561304):
        new_value.append(268.34)
    elif row == 40 and df['ID'] in (5105561300, 5105561301, 5105561302, 5105561304):
        new_value.append(834.85)
    else:
        new_value.append(0)
df['new_value'] = new_value
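When I run the code above, I get the following error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I don't know how to proceed. In SQL I would use two simple WHERE statements, but I can't get this to work in Python.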
Answer 0: (score: 1)
A map, an isin, and an np.where
import numpy as np
# values used when ID is in ids
mtrue = {20: 417.43, 24: 500.91, 30: 626.34, 40: 834.85}
# values used for every other ID
mfalse = {20: 268.34, 24: 322.02, 30: 492.65, 40: 536.69}
ids = {5105561300, 5105561301, 5105561302, 5105561304}
df['new_value'] = np.where(df['ID'].isin(ids), df['CH'].map(mtrue), df['CH'].map(mfalse))
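To sanity-check this approach, here is a minimal sketch on a made-up five-row frame. The sample rows and the names `listed_values` and `other_values` are assumptions for illustration, not part of the original data. Note that `Series.map` yields NaN for a CH that is missing from the dict, so the sketch adds `fillna(0)` to mimic the `else: 0` branch of the question's loop.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df has shape (5928, 22).
df = pd.DataFrame({
    "CH": [20, 24, 30, 40, 99],
    "ID": [5105561300, 1, 5105561302, 2, 3],
})

ids = {5105561300, 5105561301, 5105561302, 5105561304}
listed_values = {20: 417.43, 24: 500.91, 30: 626.34, 40: 834.85}  # ID in ids
other_values = {20: 268.34, 24: 322.02, 30: 492.65, 40: 536.69}   # any other ID

# Pick the value per row: first mapping where the ID is listed, second otherwise.
df["new_value"] = np.where(
    df["ID"].isin(ids),
    df["CH"].map(listed_values),
    df["CH"].map(other_values),
)
# CH values missing from the dicts map to NaN; replace them with 0.
df["new_value"] = df["new_value"].fillna(0)
result = df["new_value"].tolist()
```

Whether unmatched CH values should become 0 or stay NaN depends on how the column is used later; drop the `fillna` line to keep them as NaN.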
A map and a zip
# values used when ID is in ids
mtrue = {20: 417.43, 24: 500.91, 30: 626.34, 40: 834.85}
# values used for every other ID
mfalse = {20: 268.34, 24: 322.02, 30: 492.65, 40: 536.69}
ids = {5105561300, 5105561301, 5105561302, 5105561304}
# one lookup keyed by (ID is in ids, CH)
m = {
    (b, k): v for b, d in zip([True, False], [mtrue, mfalse])
    for k, v in d.items()
}
df['new_value'] = [*map(m.get, zip(df['ID'].isin(ids), df['CH']))]
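In case you can't use [*map(...)] (the unpacking syntax needs Python 3.5 or later), a list comprehension does the same job:
df['new_value'] = [m[t] for t in zip(df['ID'].isin(ids), df['CH'])]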
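Answer 1: (score: 1)
The problem in your code is df['ID'], which is the whole column rather than the ID of the current row. Change the loop so that it iterates over both columns together, which resolves the error message: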
for row, id in zip(df['CH'], df['ID']):
    if row == 20 and id not in (5105561300, 5105561301, 5105561302, 5105561304):
        new_value.append(268.34)
    elif row == 20 and id in (5105561300, 5105561301, 5105561302, 5105561304):
        ...
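To see why the original loop fails: `df['ID'] in (...)` compares the whole Series with each tuple element, and Python then needs a single True/False from the result, which pandas refuses to give. The sketch below reproduces that on a few made-up rows (the sample data and the names `raised` and `flags` are assumptions for illustration) and shows that pairing the columns with `zip` yields plain scalars instead.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({"CH": [20, 20, 24], "ID": [5105561300, 7, 5105561301]})
ids = (5105561300, 5105561301, 5105561302, 5105561304)

# Testing the whole column against the tuple raises the ValueError from the question.
try:
    df["ID"] in ids
    raised = False
except ValueError:
    raised = True

# Pairing each CH with the ID from the same row gives one scalar per test.
flags = [id_ in ids for ch, id_ in zip(df["CH"], df["ID"])]
```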
Since the dataset is not very large, a list comprehension can handle this task:
# a set of ids to check existence
wlist = { 5105561300, 5105561301, 5105561302, 5105561304 }
# the value of each key is a list with the first element using the value
# when id not in wlist and the 2nd element the value when id is in wlist
mapping = {
20: [268.34, 417.43]
, 24: [322.02, 500.91]
, 30: [492.65, 626.34]
, 40: [536.69, 834.85]
}
# new_value will depend on if CH is in mapping and id in wlist
df['new_value'] = [ mapping[ch][int(id in wlist)] if ch in mapping else 0 for ch, id in zip(df.CH, df.ID) ]
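The indexing trick relies on `int(id in wlist)` being 0 or 1, which selects the first or second element of each pair. A small sketch that spells this out with a helper and runs it on made-up rows (the helper name `lookup` and the sample frame are assumptions for illustration):

```python
import pandas as pd

wlist = {5105561300, 5105561301, 5105561302, 5105561304}
mapping = {20: [268.34, 417.43], 24: [322.02, 500.91],
           30: [492.65, 626.34], 40: [536.69, 834.85]}

def lookup(ch, id_):
    """Return the value for one row; 0 when CH has no entry in mapping."""
    if ch not in mapping:
        return 0
    # id_ in wlist is False or True, i.e. index 0 or 1 of the pair.
    return mapping[ch][int(id_ in wlist)]

# Made-up rows, including a CH (99) that is absent from mapping.
df = pd.DataFrame({"CH": [20, 20, 40, 99],
                   "ID": [5105561300, 7, 5105561304, 5105561300]})
df["new_value"] = [lookup(ch, id_) for ch, id_ in zip(df.CH, df.ID)]
result = df["new_value"].tolist()
```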
Answer 2: (score: 0)
It looks like you could consolidate this a lot and avoid the redundancy:
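default = 268.34
ids = (5105561300, 5105561301, 5105561302, 5105561304)
new_values = []
# iterate over CH and ID together so each test uses the current row's ID
for row, row_id in zip(df['CH'], df['ID']):
    if row == 20:
        value = 417.43
    elif row == 24:
        value = 500.91
    elif row == 30:
        value = 626.34
    elif row == 40:
        value = 834.85
    else:
        value = 0
    # IDs outside the list get the default, as in the question's second loop
    new_values.append(value if row_id in ids else default)
df['new_value'] = new_values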
Alternatively, you could map it:
def get_new_value(row):
    # value for a listed ID; 0 when CH is not one of the known keys
    d = {20: 417.43,
         24: 500.91,
         30: 626.34,
         40: 834.85}
    return d.get(row, 0)

default = 268.34
ids = (5105561300, 5105561301, 5105561302, 5105561304)
# build the whole column at once, testing each row's own ID
df['new_value'] = [get_new_value(row) if row_id in ids else default
                   for row, row_id in zip(df['CH'], df['ID'])]