Question

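I am scraping a newspaper site to get the body of an article, which sits under the "<p>" tags. I can get all the "<p>" tags, but I need to exclude the "<p> <div class="L video">" tag, because the HTML under that tag contains information I don't need. One option is to use a regular expression, so I tested the exclusion group (<p>)[^(<p><div)] at https://regexr.com/ and it apparently works fine, but I can't get it to work in my code with Beautiful Soup.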

import requests
from bs4 import BeautifulSoup

url = 'https://www.lanacion.com.ar/2141182-mundial-rusia-2018-gritos-el-temor-de-messi-y-una-marcha-atras-imparable-las-razones-detras-de-otro-dia-de-furia-en-la-seleccion'
resp = requests.get(url)
excl = "<p> <div"  # pattern I want to exclude (not used yet)
soup = BeautifulSoup(resp.text, 'html.parser')
# Every <article> element with the class "floatFix"
articles = soup.find_all('article', class_='floatFix')
for article in articles:
    paragraphs = article.find_all("p")
    for p in paragraphs:
        print(p.text)
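
One possible way to get the exclusion described in the question, without a regular expression, is to filter on the parsed tree: keep a "<p>" only if it has no nested video "<div>". Note that [^(<p><div)] in a regex is a negated character class (it matches one character that is none of those listed), not an excluded group, which may be why it does not behave as hoped. The sketch below is only an illustration under assumptions: the sample HTML is made up, the class name "video" is a guess at the real markup, and body_paragraphs is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up sample standing in for the real page (assumption, not the site's actual HTML)
SAMPLE_HTML = """
<article class="floatFix">
  <p>First paragraph of the story.</p>
  <p><div class="video">Video widget text</div></p>
  <p>Second paragraph of the story.</p>
</article>
"""

def body_paragraphs(html, excluded_class="video"):
    """Return the text of each <p> that does not contain a <div> with excluded_class."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for article in soup.find_all("article", class_="floatFix"):
        for p in article.find_all("p"):
            # Skip paragraphs that wrap the unwanted <div>
            if p.find("div", class_=excluded_class) is not None:
                continue
            texts.append(p.get_text(strip=True))
    return texts

paragraphs = body_paragraphs(SAMPLE_HTML)
print(paragraphs)
```

With the real page, resp.text from the code above would be passed in place of SAMPLE_HTML. Because class_ matches any single CSS class on the element, class_="video" would also match a "<div>" whose class attribute is "L video".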

Beautiful Soup find_all with an exclusion group

0 answers: