Multiple string conditions in pandas

Time: 2020-04-20 18:43:33

Tags: python regex pandas dataframe

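I want to check whether a substring exists in the "description" column and, if it does, write something to the "result" column. My code works when there is a single condition: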

df.loc[df.index[df.description.str.contains('ab',flags=re.I, regex=True)],'result']='found ab'

It also works for an OR condition:

df.loc[df.index[df.description.str.contains('d|f',flags=re.I, regex=True)],'result']='found d or f'

But it does not work for an AND condition:

df.loc[df.index[df.description.str.contains('d&f',flags=re.I, regex=True)],'result']='found d and f'

It works if I write it like this, but that is too long:

df.loc[(df.index[df.description.str.contains('d',flags=re.I, regex=True)] & df.index[df.description.str.contains('f',flags=re.I, regex=True)]) ,'result']='found d&f'

Is there a better way to write the AND case?

1 answer:

Answer 0 (score: 0)

To match an AND condition, you can use the following regular expression:

(?:d)\w*f|(?:f)\w*d

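Details:

  • (?:d): non-capturing group that matches the character d literally
  • \w*: zero or more word characters (letters, digits, or underscore)
  • f: matches the character f literally
  • |: OR (covers the case where f comes before d)
  • (?:f): non-capturing group that matches the character f literally
  • \w*: zero or more word characters (letters, digits, or underscore)
  • d: matches the character d literally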
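
As a quick check of the pattern described above, here is a minimal sketch using only Python's built-in re module. The sample strings are made up for illustration. Because \w* matches only word characters, the pattern finds d and f only when nothing but letters, digits, or underscores lies between them, so a string such as "d and f" is not matched.

```python
import re

# The pattern from the answer, written as a raw string so the backslash
# in \w reaches the regex engine unchanged.
pattern = re.compile(r"(?:d)\w*f|(?:f)\w*d", flags=re.I)

# Illustrative sample strings (not from the original question).
samples = ["def", "fxd", "dxx", "d and f"]

# True where the pattern is found anywhere in the string.
matches = {s: bool(pattern.search(s)) for s in samples}
print(matches)
# {'def': True, 'fxd': True, 'dxx': False, 'd and f': False}
```

If the description text can contain spaces or punctuation between the two substrings, this pattern will miss those rows.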
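
Full example:

import pandas as pd
import re

df = pd.DataFrame(
    {"description": ["abc", "def", "hjk", "lmno", "dxx", "fxx", "fxd"]}
)

# (pattern, label) pairs; a later entry overwrites the label on rows that an earlier entry also matched
reg_list = [
    ("ab", "found ab"),
    ("d|f", "found d OR f"),
    (r"(?:d)\w*f|(?:f)\w*d", "found d AND f"),  # raw string avoids an invalid-escape warning for \w
    ("l|m|n|o", "found l|m|n|o"),
]

for pattern, label in reg_list:
    # Boolean mask of rows whose description matches the pattern, ignoring case
    mask = df.description.str.contains(pattern, flags=re.I, regex=True)
    df.loc[mask, "result"] = label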

print(df)

  description         result
0         abc       found ab
1         def  found d AND f
2         hjk            NaN
3        lmno  found l|m|n|o
4         dxx   found d OR f
5         fxx   found d OR f
6         fxd  found d AND f
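
A possible alternative for the AND case, sketched here under a few assumptions, is to build one boolean mask per substring and combine the masks with &. This does not depend on which substring comes first or on what lies between them, and .loc accepts the boolean mask directly, so the df.index[...] wrapper is not needed. The names has_d, has_f, and lookahead are illustrative, and the extra "d and f" row is added only to show a case with spaces. A lookahead pattern is included as a second option that should select the same rows.

```python
import re
import pandas as pd

df = pd.DataFrame(
    {"description": ["abc", "def", "hjk", "lmno", "dxx", "fxx", "fxd", "d and f"]}
)

# One boolean mask per substring; regex=False treats the pattern as a literal string.
has_d = df["description"].str.contains("d", case=False, regex=False)
has_f = df["description"].str.contains("f", case=False, regex=False)

# & combines the masks element-wise: a row qualifies only if both substrings occur.
df.loc[has_d & has_f, "result"] = "found d AND f"

# Lookahead alternative: each (?=.*x) asserts that x occurs somewhere
# at or after the position where the match attempt starts.
lookahead = df["description"].str.contains(r"(?=.*d)(?=.*f)", flags=re.I, regex=True)

print(df)
```

The mask version tends to be easier to extend: adding a third required substring is one more mask joined with &, whereas the alternation-style regex would need every ordering spelled out.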