The HTML I am parsing contains these elements:
<td class="line"> GARBAGE </td>
<td class="line text"> I WANT THAT </td>
<td class="line heading"> I WANT THAT </td>
<td class="line"> GARBAGE </td>
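How can I write a CSS selector that matches elements whose class attribute contains line plus some other class (heading, text, or anything else), but not elements whose class is only line?
I have tried: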
td[class=line.*]
td.line.*
td[class^=line.]
Edit
I am using Python and BeautifulSoup:
import bs4
import requests
url = 'http://www.somewebsite'
res = requests.get(url)
res.raise_for_status()
DicoSoup = bs4.BeautifulSoup(res.text, "lxml")
elems = DicoSoup.select('body div#someid tr td.line')
I am thinking of changing the last part, td.line, to something like td.line.whateverotherclass (but not td.line on its own, otherwise my current selector would already be enough).
Answer 0 (score: 3)
What @BoltClock suggested is generally the right way to approach this problem with CSS selectors. The only catch is that BeautifulSoup supports a limited number of CSS selectors. For example, the :not() selector is not supported at the moment.
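You can work around it with a "starts-with" attribute selector that checks whether the class value starts with line followed by a space (it is quite fragile, but it handles your sample data):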
for td in soup.select("td[class^='line ']"):
    print(td.get_text(strip=True))
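To make the "fragile" part concrete, here is a small sketch. The markup is a made-up variation of the sample (not taken from the question) in which one cell lists line as its second class. The starts-with selector is compared against the attribute value as a string, so that cell is missed:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the last cell has "line" as its second class.
html = """
<table><tr>
<td class="line"> GARBAGE </td>
<td class="line text"> FIRST </td>
<td class="heading line"> SECOND </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Only class values that literally begin with "line " match, so the
# "heading line" cell is skipped even though it has both classes.
matched = [td.get_text(strip=True) for td in soup.select("td[class^='line ']")]
print(matched)
```

If the class order in your real pages is not guaranteed, the function-based approach is the safer choice.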
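Alternatively, you can solve it with find_all(), passing a searching function that checks that the class attribute contains line together with at least one other class: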
from bs4 import BeautifulSoup
data = """
<table>
<tr>
<td class="line"> GARBAGE </td>
<td class="line text"> I WANT THAT </td>
<td class="line heading"> I WANT THAT </td>
<td class="line"> GARBAGE </td>
</tr>
</table>"""
soup = BeautifulSoup(data, 'html.parser')
for td in soup.find_all(lambda tag: tag and tag.name == "td" and
                        "class" in tag.attrs and "line" in tag["class"] and
                        len(tag["class"]) > 1):
    print(td.get_text(strip=True))
Prints:
I WANT THAT
I WANT THAT
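A note for newer installations: the :not() limitation applies to older BeautifulSoup releases. As far as I know, since version 4.7.0 select() is backed by the soupsieve package, which does understand :not(). Assuming such a version is installed, a minimal sketch of a pure-selector solution could look like this (the selector is my own illustration, not necessarily the exact one suggested elsewhere in the thread):

```python
from bs4 import BeautifulSoup

data = """
<table>
<tr>
<td class="line"> GARBAGE </td>
<td class="line text"> I WANT THAT </td>
<td class="line heading"> I WANT THAT </td>
<td class="line"> GARBAGE </td>
</tr>
</table>"""

soup = BeautifulSoup(data, "html.parser")

# td.line matches every cell carrying the "line" class;
# :not([class="line"]) then drops cells whose class value is exactly "line".
# Assumes bs4 >= 4.7.0, where select() understands :not().
wanted = [td.get_text(strip=True)
          for td in soup.select('td.line:not([class="line"])')]
print(wanted)
```

On an older version this selector will most likely raise an error, in which case the find_all() approach above still applies.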
Answer 1 (score: 0)
You can chain class names in a class selector.
.line {
  color: green;
}
.line.text {
  color: red;
}
.line.heading {
  color: blue;
}
<p class="line">GARBAGE</p>
<p class="line text">I WANT THAT</p>
<p class="line heading">I WANT THAT</p>
<p class="line">GARBAGE</p>
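Applied to the BeautifulSoup case from the question, chained classes could be used roughly as follows. This is only a sketch: it assumes the extra class names (text and heading here) are known in advance, which the question does not guarantee, and the cell contents are placeholders chosen to tell the two matches apart.

```python
from bs4 import BeautifulSoup

# Placeholder markup modelled on the question's sample.
html = """
<table><tr>
<td class="line"> GARBAGE </td>
<td class="line text"> TEXT CELL </td>
<td class="line heading"> HEADING CELL </td>
<td class="line"> GARBAGE </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Each chained selector requires both classes on the same element;
# the comma combines the two alternatives into one query.
selector = "td.line.text, td.line.heading"
chained = [td.get_text(strip=True) for td in soup.select(selector)]
print(chained)
```

Cells with only the line class are left out, but so is any cell whose second class is not listed in the selector.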