How do I scrape this tag?

Time: 2020-07-21 04:42:26

Tags: python python-3.x web-crawler

  <div id="hide-editing-34536258">1/2 and 2/1 are reciprocals.</div>

This is the tag I want to scrape. I want to print "1/2 and 2/1 are reciprocals."

I plan to print the text with get_text(), but I don't know how to select the tag.

I can do this:

find_all({"class":"hide-editing-3453658"})

But there are more tags to scrape, and each one has a different number after "hide-editing-".

I can't find any pattern in the numbers.

Can anyone help me?

2 answers:

Answer 0 (score: 1)

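The attribute is id, not class, and you should pass the tag name you are looking for to find_all. You can use a regex to find all elements whose id follows a specific pattern.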

In [60]: from bs4 import BeautifulSoup

In [61]: import re
In [62]: a = """  <div id="hide-editing-34536258">1/2 and 2/1 are reciprocals.</div>
    ...:    <div id="hide-editing-345258">1/4 and 2/1 are reciprocals.</div>
    ...:   <div id="hide-editing-346258">1/5 and 2/1 are reciprocals.</div>
    ...: """

In [63]: soup = BeautifulSoup(a, "html.parser")

In [64]: all_divs = soup.find_all("div", {"id": re.compile(r"^hide-editing-")})

In [65]: all_divs
Out[65]:
[<div id="hide-editing-34536258">1/2 and 2/1 are reciprocals.</div>,
 <div id="hide-editing-345258">1/4 and 2/1 are reciprocals.</div>,
 <div id="hide-editing-346258">1/5 and 2/1 are reciprocals.</div>]

In [66]: [i.text.strip() for i in all_divs]
Out[66]:
['1/2 and 2/1 are reciprocals.',
 '1/4 and 2/1 are reciprocals.',
 '1/5 and 2/1 are reciprocals.']
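
As an alternative to the regex, a CSS attribute-prefix selector may be enough here. The following is only a sketch: the `html` string and the extra `other-1` div are made up for illustration, and it assumes a BeautifulSoup version whose `select()` supports the `[attr^="..."]` selector.

```python
from bs4 import BeautifulSoup

# Sample markup (hypothetical; "other-1" is added only to show it is skipped)
html = """
<div id="hide-editing-34536258">1/2 and 2/1 are reciprocals.</div>
<div id="hide-editing-345258">1/4 and 2/1 are reciprocals.</div>
<div id="other-1">not wanted</div>
"""

soup = BeautifulSoup(html, "html.parser")

# [id^="hide-editing-"] selects divs whose id starts with that prefix,
# whatever number follows it
texts = [div.get_text(strip=True) for div in soup.select('div[id^="hide-editing-"]')]
print(texts)
```

This avoids importing `re` when all you need is a fixed prefix match.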

Answer 1 (score: 0)

Maybe you could try using a regular expression?
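
One possible way to do that, assuming the ids always consist of "hide-editing-" followed only by digits (an assumption; the question does not confirm it), is to anchor the pattern on both ends. The `html` string and the `hide-editing-toolbar` div below are hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Sample markup (hypothetical); the last div shares the prefix but has no number
html = """
<div id="hide-editing-34536258">1/2 and 2/1 are reciprocals.</div>
<div id="hide-editing-346258">1/5 and 2/1 are reciprocals.</div>
<div id="hide-editing-toolbar">not wanted</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ^ and $ anchor the pattern, so the id must be the prefix plus digits only
pattern = re.compile(r"^hide-editing-\d+$")
texts = [div.get_text(strip=True) for div in soup.find_all("div", id=pattern)]
print(texts)
```

The stricter pattern skips elements that merely share the prefix, which a bare prefix match would also return.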
