I'm trying to make my code keep only what is between the <p> tags, and I haven't found a way to do it.
I tried using a simple loop: you are supposed to enter a URL into the program, and it should display the plain text when run.
I also tried using BeautifulSoup, but I couldn't even get it to import.
Answer 0: (score: 1)
Welcome to SO and to programming. You can't parse [X]HTML with regex. It's time to use a library: Beautiful Soup and requests are your best friends here.
In your bash / cmd / terminal, type:
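To illustrate one way a regex approach can go wrong, here is a minimal sketch using only the standard library. The HTML string and the pattern are made up for this example; they are not taken from the question's code.

```python
import re

# Hypothetical input: the first <p> has an attribute and spans two lines.
html = '<p class="lead">First\nparagraph</p><p>Second</p>'

# A naive pattern: a literal "<p>", then anything up to the next "</p>".
naive = re.compile(r"<p>(.*?)</p>")

# The first paragraph is silently skipped because its opening tag
# is '<p class="lead">', not the literal '<p>'.
matches = naive.findall(html)
print(matches)
```

The pattern can be patched for this one case, but real pages bring more variations (nested tags, comments, unusual casing), which is why a parser is the safer choice.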
pip install requests
pip install beautifulsoup4
Then use:
import requests
from bs4 import BeautifulSoup

# Download the page
r = requests.get("https://en.wikipedia.org/wiki/Somalia")
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.text, "html.parser")
# Print the text of every <p> element
for p in soup.find_all('p'):
    print(p.text)
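If you want the paragraphs back as a list instead of printing them, the same idea can be wrapped in a small function. This is only a sketch: the name `paragraph_texts` and the sample HTML are hypothetical, and it assumes `beautifulsoup4` is installed as shown above.

```python
from bs4 import BeautifulSoup


def paragraph_texts(html):
    """Return the text of every <p> element in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() also includes text inside nested tags such as <b>
    return [p.get_text() for p in soup.find_all("p")]


# Small offline sample so the function can be tried without a network call
sample = "<p>Hello <b>world</b></p><div>skip me</div><p>Second</p>"
print(paragraph_texts(sample))
```

For a live page, you could pass `requests.get(url).text` as the argument, with `url` being whatever address the user typed in.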