Question

I have a pandas DataFrame like this:

import pandas as pd

# Named `data` rather than `dict` to avoid shadowing the built-in
data = {'plan_id': ["4H", "40", "HA", "H5", '5B'],
    'planproduct': ["4H - MMP", "40 - STAR", "9H - STAR+PLUS", "HA - MMP", 'C4 - STAR+PLUS'],
    'juliandat': ['114', '157', '149', '142', '150']}

df = pd.DataFrame(data, index=[1, 2, 3, 4, 5])

Say I have some lists, for example:

starplus_id = ['47', '9H', 'H5', '5B', 'C4']
mmp_pp = ['4H - MMP', 'HA - MMP', '9K - MMP']
mmp_id = ['4H','HA','9K']
starplus_pp = ['47 - STAR+PLUS', '9H - STAR+PLUS', 'H5 - STAR+PLUS', '5B - STAR+PLUS', 'C4 - STAR+PLUS']

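I want to filter the rows so that if the plan_id value is one of the starplus_id values, then the planproduct field cannot be one of the mmp_id values, and vice versa. Likewise, if planproduct is one of the starplus_pp values, then plan_id cannot be one of the mmp_id values, and vice versa. It is also fine if plan_id is not one of the starplus_id values. (I put column names in code formatting and list names in italics.)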

I don't know how to do this. I tried using the in operator, for example:

df = final[((df['plan_id'] in starplus_id) & (df['planproduct'] not in mmp_pp)) & 
       ((df['plan_id'] in mmp_id) & (df['planproduct'] not in starplus_pp)) &
      ((df['planproduct'] in starplus_pp) & (df['plan_id'] not in mmp_id)) &
       ((df['planproduct'] in mmp_pp) & (df['plan_id'] not in starplus_id)) |
       (df['plan_id'] not in starplus_pp)
      ]

But I get

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

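This is more complex boolean indexing than I have attempted in pandas before, and I am not sure how to do it. The result should look like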

    plan_id planproduct      juliandate
1   4H      4H - MMP         114
2   40      40 - STAR        157
5   5B      C4 - STAR+PLUS   150

Answer 1

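Here is my attempt. I modified starplus_pp to get rid of whitespace, + and -, because str.contains has trouble matching those characters: it treats the pattern as a regular expression by default, and + is a quantifier there. This requires creating temporary columns, which are dropped at the end by the iloc accessor.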

Temporary columns

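# Strip spaces, + and - so the values match the modified starplus_pp list
# (regex=True is needed explicitly on recent pandas versions)
df['planproducts'] = df['planproduct'].str.replace(r'[-+\s]', '', regex=True)
# Text before the first '-' in planproduct, i.e. the plan code
df['planproductsz'] = df['planproduct'].str.split('-').str[0]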

Modified lists

starplus_id = ['47', '9H', 'H5', '5B', 'C4']
mmp_pp = ['4H - MMP', 'HA - MMP', '9K - MMP']
mmp_id = ['4H','HA','9K']
starplus_pp = ['47STARPLUS', '9HSTARPLUS', 'H5STARPLUS', '5BSTARPLUS', 'C4STARPLUS']#Modified list

Build the pattern strings with .join

sid='|'.join(starplus_id)
mp='|'.join(mmp_pp)
sp='|'.join(starplus_pp)
mid='|'.join(mmp_id)
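As a side note, here is a minimal sketch of another way to get around the + problem without rewriting starplus_pp: escape each entry with re.escape before joining. The names planproduct, sp_raw and sp_escaped are only for illustration and are not part of the answer above.

```python
import re
import pandas as pd

# Original, unmodified list from the question
starplus_pp = ['47 - STAR+PLUS', '9H - STAR+PLUS', 'H5 - STAR+PLUS',
               '5B - STAR+PLUS', 'C4 - STAR+PLUS']

# Illustrative Series holding the planproduct column from the question
planproduct = pd.Series(
    ["4H - MMP", "40 - STAR", "9H - STAR+PLUS", "HA - MMP", "C4 - STAR+PLUS"],
    index=[1, 2, 3, 4, 5],
)

# Unescaped: "R+" is read as "one or more R", so the literal "+" never matches
sp_raw = '|'.join(starplus_pp)
raw_matches = planproduct.str.contains(sp_raw)

# Escaped: every entry is matched as literal text
sp_escaped = '|'.join(re.escape(s) for s in starplus_pp)
escaped_matches = planproduct.str.contains(sp_escaped)

print(raw_matches.tolist())
print(escaped_matches.tolist())
```

With the escaped pattern the temporary planproducts column would not be needed, although the rest of this answer keeps the modified list.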

Query

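# Step 1: drop rows whose plan_id is a STAR+PLUS id but whose product code is an MMP id
df2 = df[~(df.plan_id.str.contains(sid) & df.planproductsz.str.contains(mid))]
# Step 2: drop rows whose product is a STAR+PLUS product but whose plan_id is an MMP id,
# then keep only the first three (original) columns
df2[~(df2.planproducts.str.contains(sp) & df2.plan_id.str.contains(mid))].iloc[:, :3]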

    plan_id planproduct     juliandat
1   4H      4H - MMP         114
2   40      40 - STAR        157
5   5B      C4 - STAR+PLUS   150
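For comparison, here is a rough sketch that avoids regular expressions entirely by using Series.isin, which is the element-wise counterpart of the in operator the question tried. It assumes the rule means "drop a row when plan_id and planproduct belong to different programs", and that planproduct should be checked against mmp_pp rather than mmp_id. The variable names (id_is_starplus, conflict, result and so on) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "plan_id": ["4H", "40", "HA", "H5", "5B"],
        "planproduct": ["4H - MMP", "40 - STAR", "9H - STAR+PLUS",
                        "HA - MMP", "C4 - STAR+PLUS"],
        "juliandat": ["114", "157", "149", "142", "150"],
    },
    index=[1, 2, 3, 4, 5],
)

starplus_id = ["47", "9H", "H5", "5B", "C4"]
mmp_pp = ["4H - MMP", "HA - MMP", "9K - MMP"]
mmp_id = ["4H", "HA", "9K"]
starplus_pp = ["47 - STAR+PLUS", "9H - STAR+PLUS", "H5 - STAR+PLUS",
               "5B - STAR+PLUS", "C4 - STAR+PLUS"]

# isin tests each element for membership and returns a boolean Series,
# so the masks can be combined with & and | without the ValueError
id_is_starplus = df["plan_id"].isin(starplus_id)
id_is_mmp = df["plan_id"].isin(mmp_id)
pp_is_starplus = df["planproduct"].isin(starplus_pp)
pp_is_mmp = df["planproduct"].isin(mmp_pp)

# Assumed rule: a row conflicts when the id and the product point to different programs
conflict = (id_is_starplus & pp_is_mmp) | (id_is_mmp & pp_is_starplus)

result = df[~conflict]
print(result)
```

On the sample data this keeps the rows with index 1, 2 and 5, the same rows as the str.contains approach, and no temporary columns are needed.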

Boolean indexing in pandas with the "in" operator?
