Find all occurrences of 'Php' on a page, ignoring case, with BeautifulSoup

Time: 2017-01-25 18:27:43

Tags: python python-3.x beautifulsoup

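I would like to find all occurrences of Php on a page (ignoring case) using BeautifulSoup in Python 3.

Php could appear anywhere on the page, so I am basically just looking for it in the page's string representation, not in a specific div or class.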

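I currently have:

from bs4 import BeautifulSoup  # BeautifulSoup 4; the old "BeautifulSoup" module only runs on Python 2
import requests
school_urls = ['somesite1.com','somesite2.com']
posting_keywords = ['PHP', 'Php', 'php']
for school in school_urls:
    ...

request holds the HTML markup fetched from the school URL, which contains words such as php.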

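How does this look to you? Is there a way to do this in Beautiful Soup, finding all variants of php regardless of case, without having to iterate over posting_keywords?

Thanks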

2 Answers:

Answer 0 (score: 0)

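Would .lower() work for you? It is a str method, so it applies to the page text (or to each keyword) rather than to the posting_keywords list itself.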
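A minimal sketch of the .lower() idea, assuming the page markup is already available as a plain string. The count_keyword helper and the page_text sample are made up for illustration and are not part of the question:

```python
def count_keyword(page_text: str, keyword: str) -> int:
    """Count non-overlapping, case-insensitive occurrences of keyword."""
    # Lowercase both sides so 'PHP', 'Php' and 'php' all compare equal.
    return page_text.lower().count(keyword.lower())


# Hypothetical sample standing in for the downloaded HTML.
page_text = "<p>PHP developer wanted. Php or php experience required.</p>"

print(count_keyword(page_text, "php"))  # 3
```

Note that this searches the raw markup, so a match inside a tag or attribute (for example a link ending in .php) would be counted as well. With this approach a single lowercase keyword is enough and the list of case variants is no longer needed.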

Answer 1 (score: 0)

import re, bs4

# Sample markup containing 'php' in several case variants
text = """
<html><head><title>The Dormouse's story php</title></head>
<body>
<p class="title"><b>The Dormouse's story PHP</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">php</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Php</a> and
<a href="http://example.com/tillie" class="sister" id="link3">php Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
soup = bs4.BeautifulSoup(text, 'lxml')  # requires the lxml package
# Return every text node matching the case-insensitive pattern
# (newer bs4 releases also accept string= in place of text=)
soup.find_all(text=re.compile(r'php', re.IGNORECASE))

Out:

["The Dormouse's story php",
 "The Dormouse's story PHP",
 'php',
 'Php',
 'php Tillie']
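One thing to keep in mind: find_all returns matching text nodes, so a node that contains the word twice is still listed only once. If a count of every occurrence is wanted, one option is to run the same pattern over the extracted text. A rough sketch, assuming a bs4 release that accepts the string= argument and using the built-in html.parser; the sample markup is invented for illustration:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from a real site.
html = """
<html><body>
<p>PHP and php in one paragraph</p>
<a href="/jobs.php">Php jobs</a>
</body></html>
"""

pattern = re.compile(r"php", re.IGNORECASE)
soup = BeautifulSoup(html, "html.parser")  # stdlib parser; 'lxml' also works if installed

# Text nodes containing at least one match (each node appears once).
matching_nodes = soup.find_all(string=pattern)

# Every individual match in the page text; attribute values such as
# href="/jobs.php" are not part of get_text() and are not counted.
occurrences = pattern.findall(soup.get_text())

print(len(matching_nodes))  # 2
print(occurrences)          # ['PHP', 'php', 'Php']
```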
