I have a Python function:
def regex(series, regex):
    series = series.str.extract(regex)
    series1 = series.dropna()
    return (series1)
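It is meant to match a regular expression against the following pattern:
Anything where ' no' is followed by (a set of words), or where those words are followed by ' not', should not match. Here is the regex used with the Python function: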
result = regex(df['col'],r'(^(?!.*\bno\b.*\b(text|sample text )\b)(?!.*\b(text|sample text)\b .*not).+$)')
When I apply the regex through the function I get no results (just an empty dataframe),
but testing the same regex at this link works fine: https://regex101.com/r/Epq0Ns/21
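One thing that may be worth checking here, sketched with the standard `re` module only: the pattern above contains three capture groups, and the two inner ones sit inside negative lookaheads, so they never capture anything when the overall match succeeds. Since `Series.str.extract` returns one column per capture group, those two columns would be all NaN and `dropna()` would then drop every row. The sample sentence and the `non_capturing` variant are made up for illustration and are not from the original post.

```python
import re

# The pattern from the question, unchanged.
pattern = r'(^(?!.*\bno\b.*\b(text|sample text )\b)(?!.*\b(text|sample text)\b .*not).+$)'

# Hypothetical sample input (not from the original post).
sentence = "this line has just text"

match = re.search(pattern, sentence)
group_count = re.compile(pattern).groups   # number of capture groups in the pattern
groups = match.groups()                    # inner groups are None on a successful match

# Hypothetical variant: the inner groups are made non-capturing with (?:...),
# leaving only the outer group to be extracted.
non_capturing = r'(^(?!.*\bno\b.*\b(?:text|sample text )\b)(?!.*\b(?:text|sample text)\b .*not).+$)'
fixed_groups = re.search(non_capturing, sentence).groups()

print(group_count, groups)
print(fixed_groups)
```

If this is the cause, passing a pattern with a single capture group to the function should leave only one column, so `dropna()` would remove only the rows that genuinely fail to match.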
Answer 0: (score: 1)
For simplicity, you can just use lists and list comprehensions to build simple regex patterns.
import re

negations = ["no", "not"]
words = ["text", "sample text", "text book", "notebook"]
sentences = [
    "first sentence with no and sample text",
    "second with a text but also a not",
    "third has a no, a text and a not",
    "fourth alone is what is neeeded with just text",
    "keep putting line here no"
]

# Build each pattern once: escaped alternatives wrapped in word boundaries.
negationsRegex = re.compile(r"\b(?:" + "|".join([re.escape(n) for n in negations]) + r")\b")
wordsRegex = re.compile(r"\b(?:" + "|".join([re.escape(w) for w in words]) + r")\b")

for sentence in sentences:
    # Keep the sentence unless it contains both a negation and one of the words.
    if not (re.search(negationsRegex, sentence) and re.search(wordsRegex, sentence)):
        print(sentence)
The code above outputs:
fourth alone is what is neeeded with just text
keep putting line here no
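The code compiles the joined lists of regex-escaped words, making sure word boundaries are set. The resulting regular expressions (given the lists negations and words) are as follows: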
\b(?:no|not)\b
\b(?:text|sample text|text book|notebook)\b
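The if statement then checks whether both generated patterns (the negations regex and the words regex) match the sentence. If they do not both match (that is, one or both fail to match), the sentence is printed.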
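The same check can be wrapped in a reusable function, shown here as a minimal sketch. The names `build_word_regex` and `keep_sentence` are hypothetical helpers introduced for illustration; the word lists and sentences are the ones from the code above.

```python
import re

def build_word_regex(terms):
    # Escape each literal term, join with "|", and wrap in word boundaries.
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")

NEGATIONS_RE = build_word_regex(["no", "not"])
WORDS_RE = build_word_regex(["text", "sample text", "text book", "notebook"])

def keep_sentence(sentence):
    # True unless the sentence contains both a negation and one of the words.
    return not (NEGATIONS_RE.search(sentence) and WORDS_RE.search(sentence))

sentences = [
    "first sentence with no and sample text",
    "second with a text but also a not",
    "third has a no, a text and a not",
    "fourth alone is what is neeeded with just text",
    "keep putting line here no",
]

kept = [s for s in sentences if keep_sentence(s)]
print(kept)
```

With pandas, a function like this could presumably be passed to `Series.apply` to build a boolean mask (for example `df[df['col'].apply(keep_sentence)]`), though that has not been tested against the asker's data.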
Answer 1: (score: 0)
Try using the same flags you used on regex101. Change the line in your function to:
series = series.str.extract(regex, re.M | re.S)
or
series = series.str.extract(regex, flags=re.M|re.S)
If you post the code that defines your input, I will test it.
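To see what these two flags change, here is a small sketch using the standard `re` module and a made-up two-line string. Whether this accounts for the empty result depends on whether the real column contains newlines, which the question does not say.

```python
import re

# Hypothetical input containing an embedded newline.
text = "line one\nline two"
pattern = r"^.+$"

# No flags: "." stops at the newline and "$" only matches at the end of the string.
no_flags = re.search(pattern, text)

# re.M: "^" and "$" also match at line boundaries, so the first line matches.
multiline = re.search(pattern, text, flags=re.M)

# re.M | re.S: "." additionally matches newlines, so the whole string matches.
both = re.search(pattern, text, flags=re.M | re.S)

print(no_flags)
print(multiline.group())
print(repr(both.group()))
```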