Question

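I want to pick out specific data from a website, such as prices and company information. Fortunately, the site's designer has put in lots of tags, such as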

<!-- Begin Services Table -->
' desired data
<!-- End Services Table -->

What kind of code do I need to have BS4 return the strings between the given tags?

import requests
from bs4 import BeautifulSoup

url = "http://www.100ll.com/searchresults.php?clear_previous=true&searchfor="+'KPLN'+"&submit.x=0&submit.y=0"

response = requests.get(url)
soup = BeautifulSoup(response.content, "lxml")

text_list = soup.find(id="framediv").find_all(text=True)
start_index = text_list.index(' Begin Fuel Information Table ') + 1
end_index = text_list.index(' End Fuel Information Table ')
for item in text_list[start_index:end_index]:
    print(item)

Here is the website in question:

http://www.100ll.com/showfbo.php?HashID=cf5f18404c062da6fa11e3af41358873

Answer 1

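If you want to select the table element that follows one of these specific comments, you can select all the comment nodes, filter them by the desired text, and then select the next sibling table element:

import requests
from bs4 import BeautifulSoup
from bs4 import Comment

# url is the search URL built in the question
response = requests.get(url)
soup = BeautifulSoup(response.content, "lxml")

# Collect every comment node in the document
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

for comment in comments:
    # Comment text keeps its surrounding spaces, so strip before comparing
    if comment.strip() == 'Begin Services Table':
        table = comment.find_next_sibling('table')
        print(table)

Alternatively, if you want all of the data between the two comments, you can find the first comment and then iterate over its next siblings until you reach the closing comment.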
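
To collect everything between the two comments, a minimal sketch of walking the siblings from the opening comment to the closing one might look like the following. The sample markup, the helper name strings_between_comments, and the use of the built-in "html.parser" (so the example runs without a network request or lxml) are assumptions made for illustration; they are not taken from the real page.

```python
from bs4 import BeautifulSoup, Comment, Tag

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<div id="framediv">
<p>before</p>
<!-- Begin Services Table -->
<table><tr><td>Fuel</td><td>5.25</td></tr></table>
<p>Hangar space available</p>
<!-- End Services Table -->
<p>after</p>
</div>
"""


def strings_between_comments(soup, begin, end):
    """Return the non-empty text of each sibling between two comment markers."""
    # Locate the opening comment by its stripped text.
    start = soup.find(
        string=lambda s: isinstance(s, Comment) and s.strip() == begin
    )
    if start is None:
        return []

    collected = []
    # Walk the nodes that follow the opening comment at the same level.
    for sibling in start.next_siblings:
        if isinstance(sibling, Comment):
            if sibling.strip() == end:
                break  # reached the closing marker
            continue  # ignore any unrelated comments
        if isinstance(sibling, Tag):
            # Join the text of nested elements with single spaces.
            text = sibling.get_text(" ", strip=True)
        else:
            # A bare text node between the markers.
            text = str(sibling).strip()
        if text:
            collected.append(text)
    return collected


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    for line in strings_between_comments(
        soup, "Begin Services Table", "End Services Table"
    ):
        print(line)
```

Note that next_siblings only visits nodes that share a parent with the opening comment, so this sketch assumes both comments sit at the same level of the tree. If they do not, iterating over next_elements instead may be a better fit.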

Python - Extracting data between specific comment nodes with BeautifulSoup 4
