I have a pandas DataFrame like this:
import pandas as pd

# Named `data` rather than `dict` so the built-in type is not shadowed
data = {'plan_id': ["4H", "40", "HA", "H5", '5B'],
        'planproduct': ["4H - MMP", "40 - STAR", "9H - STAR+PLUS", "HA - MMP", 'C4 - STAR+PLUS'],
        'juliandat': ['114', '157', '149', '142', '150']}
df = pd.DataFrame(data, index=[1, 2, 3, 4, 5])
Say I have some lists, for example:
starplus_id = ['47', '9H', 'H5', '5B', 'C4']
mmp_pp = ['4H - MMP', 'HA - MMP', '9K - MMP']
mmp_id = ['4H','HA','9K']
starplus_pp = ['47 - STAR+PLUS', '9H - STAR+PLUS', 'H5 - STAR+PLUS', '5B - STAR+PLUS', 'C4 - STAR+PLUS']
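I want to filter the rows so that if plan_id is one of the 'starplus_id' values, then the planproduct field cannot be one of the 'mmp_id' values, and vice versa. If planproduct is one of the 'starplus_pp' values, then plan_id cannot be one of the 'mmp_id' values, and vice versa. A plan_id that is not in 'starplus_id' at all is also fine. (I put column names in code formatting and list names in italics.)
I don't know how to do this. I tried using the in operator, for example: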
df = final[((df['plan_id'] in starplus_id) & (df['planproduct'] not in mmp_pp)) &
((df['plan_id'] in mmp_id) & (df['planproduct'] not in starplus_pp)) &
((df['planproduct'] in starplus_pp) & (df['plan_id'] not in mmp_id)) &
((df['planproduct'] in mmp_pp) & (df['plan_id'] not in starplus_id)) |
(df['plan_id'] not in starplus_pp)
]
But I get
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
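The message comes from Python's in operator, which needs a single True or False, while comparing a Series against list items produces one boolean per row. A minimal sketch of the difference, assuming an element-wise membership test is what was intended; Series.isin is the pandas method for that, and the standalone plan_id Series below simply repeats the column from the example:

```python
import pandas as pd

plan_id = pd.Series(["4H", "40", "HA", "H5", "5B"], index=[1, 2, 3, 4, 5])
starplus_id = ['47', '9H', 'H5', '5B', 'C4']

# `in` asks for one truth value for the whole Series, which pandas refuses to give.
try:
    plan_id in starplus_id
    raised = False
except ValueError:
    raised = True

# `isin` tests each row separately and returns a boolean Series (a mask).
mask = plan_id.isin(starplus_id)
print(raised)
print(mask)
```

The same applies to not in: negating a mask is done with ~, and masks are combined with & and | rather than and and or.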
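This is more complex boolean indexing than I have attempted in pandas before, and I'm not sure how to do it. The result should look like: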
   plan_id     planproduct  juliandat
1       4H        4H - MMP        114
2       40       40 - STAR        157
5       5B  C4 - STAR+PLUS        150
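Answer 0 (score: 1)
Have a look at my attempt. I modified starplus_pp to get rid of whitespace, + and -, because str.contains treats its pattern as a regular expression by default and those characters get in the way of a literal match (+ in particular is a regex quantifier). This requires creating temporary columns, which are dropped at the end by the iloc accessor.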
# Temporary columns
df['planproducts'] = df['planproduct'].str.replace(r'[-+\s]', '', regex=True)  # strip whitespace, '+' and '-' so values match the modified list
df['planproductsz'] = df['planproduct'].str.split('-').str[0]  # text before the first '-' in planproduct
starplus_id = ['47', '9H', 'H5', '5B', 'C4']
mmp_pp = ['4H - MMP', 'HA - MMP', '9K - MMP']
mmp_id = ['4H', 'HA', '9K']
starplus_pp = ['47STARPLUS', '9HSTARPLUS', 'H5STARPLUS', '5BSTARPLUS', 'C4STARPLUS']  # modified list
# Join each list into a regex alternation for str.contains
sid = '|'.join(starplus_id)
mp = '|'.join(mmp_pp)  # not used below
sp = '|'.join(starplus_pp)
mid = '|'.join(mmp_id)
# Drop rows whose plan_id matches a STAR+PLUS id while the planproduct prefix matches an MMP id
df2 = df[~(df.plan_id.str.contains(sid) & df.planproductsz.str.contains(mid))]
# Drop rows whose planproduct matches a STAR+PLUS product while plan_id matches an MMP id,
# then keep only the first three (original) columns
df2[~(df2.planproducts.str.contains(sp) & df2.plan_id.str.contains(mid))].iloc[:, :3]
   plan_id     planproduct  juliandat
1       4H        4H - MMP        114
2       40       40 - STAR        157
5       5B  C4 - STAR+PLUS        150
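For exact matches, the same rows can also be selected without temporary columns or regular expressions. This is a sketch of an alternative, not the approach above. It assumes the rule is: drop a row when plan_id is in starplus_id while planproduct is in mmp_pp, or when plan_id is in mmp_id while planproduct is in starplus_pp. It uses Series.isin with the original, unmodified lists; the names star_with_mmp, mmp_with_star and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'plan_id': ["4H", "40", "HA", "H5", "5B"],
     'planproduct': ["4H - MMP", "40 - STAR", "9H - STAR+PLUS", "HA - MMP", "C4 - STAR+PLUS"],
     'juliandat': ['114', '157', '149', '142', '150']},
    index=[1, 2, 3, 4, 5])

starplus_id = ['47', '9H', 'H5', '5B', 'C4']
mmp_pp = ['4H - MMP', 'HA - MMP', '9K - MMP']
mmp_id = ['4H', 'HA', '9K']
starplus_pp = ['47 - STAR+PLUS', '9H - STAR+PLUS', 'H5 - STAR+PLUS',
               '5B - STAR+PLUS', 'C4 - STAR+PLUS']

# Assumed rule 1: a STAR+PLUS plan_id paired with an MMP planproduct is a conflict.
star_with_mmp = df['plan_id'].isin(starplus_id) & df['planproduct'].isin(mmp_pp)
# Assumed rule 2: an MMP plan_id paired with a STAR+PLUS planproduct is a conflict.
mmp_with_star = df['plan_id'].isin(mmp_id) & df['planproduct'].isin(starplus_pp)

# Keep every row that is not a conflicting pair.
result = df[~(star_with_mmp | mmp_with_star)]
print(result)
```

Because isin compares whole values, it sidesteps the regex escaping issue, but it will not catch partial matches the way str.contains does.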