Question

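I'm new to web scraping. Is there a better way to do it than finding the required tag and then tracing back through each of its parent tags? Is there somewhere I can see all the tags from body down to the tag I need?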

Answer 1

If I understand your question correctly, BeautifulSoup (in Python) is the best way to do this.

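import requests
from bs4 import BeautifulSoup

# BeautifulSoup parses HTML markup, not a URL, so fetch the page first.
html = requests.get("html link").text  # replace "html link" with the page URL

# Parse the HTML; append .get_text() to get the text without tags.
doc = BeautifulSoup(html, features="lxml")

# Loop over the top-level nodes of the document (tags included).
for d in doc:
    print(d)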

You can then replace the print statement with d.find("tag") (or call doc.find("tag") directly) to locate the tag and get its information.
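For the original question of getting every parent at once, BeautifulSoup tags expose a `.parents` generator that walks upward from a tag to the document root. The following is a minimal sketch under stated assumptions: the sample markup and the helper name `ancestor_path` are made up for illustration, and the built-in `html.parser` is used so that nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <div id="main"><ul><li><a href="/x">target</a></li></ul></div>
</body></html>
"""

def ancestor_path(tag):
    """Return tag names from the outermost ancestor down to the tag itself."""
    # .parents yields the innermost parent first and ends with the
    # BeautifulSoup object itself, which is skipped here.
    names = [p.name for p in tag.parents if not isinstance(p, BeautifulSoup)]
    names.reverse()
    return names + [tag.name]

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
target = soup.find("a")
path = ancestor_path(target)
print(" > ".join(path))
```

Calling `soup.find_all("a")` and mapping `ancestor_path` over the results would give the same path for every matching tag.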

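I think a better way is to simply use Selenium and locate the element by XPath (find_element_by_xpath in older releases; in Selenium 4 this is written find_element(By.XPATH, ...)).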

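from selenium import webdriver
from selenium.webdriver.common.by import By

# Start a Chrome session (requires a local Chrome install).
driver = webdriver.Chrome()

# Load the page; replace "link" with the page URL.
driver.get("link")

# find_element_by_xpath was removed in Selenium 4; use find_element with By.XPATH.
element = driver.find_element(By.XPATH, "your xpath")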
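If the goal is every ancestor of an element rather than the element alone, XPath has an `ancestor` axis that selects them in one query. The sketch below swaps in lxml for a live browser so that it can run on its own; the sample markup is hypothetical. With Selenium, the same expression could be passed to `element.find_elements(By.XPATH, "ancestor::*")`.

```python
from lxml import html

# Hypothetical sample markup, used only for illustration.
SAMPLE_HTML = (
    "<html><body><div id='main'><ul><li>"
    "<a href='/x'>target</a>"
    "</li></ul></div></body></html>"
)

tree = html.fromstring(SAMPLE_HTML)
target = tree.xpath("//a")[0]

# The ancestor axis selects every element that encloses the target.
names = [el.tag for el in target.xpath("ancestor::*")]
print(names)

# getpath() builds an absolute XPath from the root down to the element.
xpath_to_target = tree.getroottree().getpath(target)
print(xpath_to_target)
```

The absolute path printed last can be pasted back into an XPath lookup to find the same element again, as long as the page structure does not change.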

Hope this helps, and welcome to web automation. It's a fun world!

Is it possible to find all the parent tags of an HTML tag at once for web scraping?
