Question

I want to combine two regular expressions into a single line.

 soup1=link.findAll('a', attrs={'href': re.compile('^http://')})
 soup2=link.findAll('a', attrs={'href': re.compile("/news/")})

I tried to use (|) somehow, as in re.compile('^http://' | '/news/'), but it was all in vain. I need both functionalities (links containing 'http' as well as /news/).

Answer 1

You don't need a regex; you can use CSS selectors:

 soup.select('a[href^="http://"],a[href*="/news/"]')

^= finds hrefs that start with the substring; *= finds hrefs that contain the substring.

Answer 2

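To answer the question:

I want to combine two regular expressions into a single line... I need both functionalities (links containing 'http' as well as /news/)

I understand "as well as" as a requirement that both /news/ and http be present in the string. In that case, you can use a simple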

 re.compile(r'^http://.*/news/')

It will match http:// at the start of the string and the /news/ substring somewhere inside it.

Pattern details:

^ - start of string
http:// - a sequence of literal characters
.* - zero or more characters other than a newline
/news/ - the substring /news/
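A minimal sketch of how this pattern behaves, using only the standard re module. The sample hrefs are hypothetical and are not taken from the question:

```python
import re

# Hypothetical sample hrefs, made up for illustration.
hrefs = [
    "http://example.com/news/today",  # starts with http:// and contains /news/
    "http://example.com/sports/",     # starts with http:// only
    "/news/local",                    # contains /news/ only
]

both = re.compile(r'^http://.*/news/')

# search() returns a match object or None; keep only the hrefs that match.
matched_both = [h for h in hrefs if both.search(h)]
print(matched_both)
```

Only the first href should be kept, because it is the only one that satisfies both conditions.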

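An alternative to get results with either http or /news/ inside

The | alternation operator is used inside the regex pattern, not between regex patterns inside re.compile:

 re.compile(r'^http://|/news/')

Here, ^ belongs only to ^http:// (the first branch). Either ^http:// is matched at the start of the string, or the /news/ branch is matched anywhere inside the string. So every value that has http:// at the start, or /news/ anywhere in the string, will be matched.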
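To illustrate the alternation, here is a small sketch with the standard re module. The sample hrefs are hypothetical:

```python
import re

# Hypothetical sample hrefs, made up for illustration.
hrefs = [
    "http://example.com/news/today",  # matches both branches
    "http://example.com/sports/",     # matches the ^http:// branch
    "/news/local",                    # matches the /news/ branch
    "https://example.com/about",      # matches neither branch
    "/archive/http://old",            # http:// is not at the start, so no match
]

either = re.compile(r'^http://|/news/')

# Keep every href for which at least one branch matches.
matched_either = [h for h in hrefs if either.search(h)]
print(matched_either)
```

The last sample shows that ^ anchors only the first branch: http:// in the middle of the string is not enough.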

Answer 3

This worked for me:

 nombre = soup.findAll('a', {'href': re.compile('^http|.' + palabra + '.', flags=re.IGNORECASE)})

Answer 4

Try this:

re.compile(r'(^http://)|(/news/)')

What you tried, re.compile('^http://' | '/news/'), was almost right; just put both alternatives inside a single pair of quotes: re.compile('^http://|/news/').
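A short sketch of why the original attempt fails and why the single-string version works. Python evaluates '^http://' | '/news/' as an operation on two str objects before re.compile is ever called, and str does not support |. The sample hrefs are hypothetical:

```python
import re

# The | here is Python's operator applied to two strings, not regex alternation.
try:
    re.compile('^http://' | '/news/')
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(error_name)

# With | inside one pattern string, the regex engine sees the alternation.
plain = re.compile('^http://|/news/')
grouped = re.compile(r'(^http://)|(/news/)')

# Hypothetical sample hrefs, made up for illustration.
hrefs = ["http://example.com/", "/news/local", "/about"]

# The capture groups in the second pattern do not change which strings match.
same_result = all(
    bool(plain.search(h)) == bool(grouped.search(h)) for h in hrefs
)
print(same_result)
```

The grouped form is useful mainly when you want to know afterwards which branch matched; for filtering links, the ungrouped pattern should behave the same.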

How do I combine two re.compile regular expressions in Python 3?
