Question

I need to parse a bunch of HTML pages (description lists only) into a table. How can I do this?

This is what I am trying:

from bs4 import BeautifulSoup
import requests
r = requests.get('http://images.webofknowledge.com/WOK46/help/WOS/Y_abrvjt.html')
soup = BeautifulSoup(r.content, "html.parser")
dl_data = soup.find_all("dd")
# print(dl_data)
for dlitem in dl_data:
    print(dlitem.string)

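This gives me the "dd" entries as expected but, mysteriously, soup.find_all("dt") is yielding None. Most importantly, though, I need to parse the two together and put them in a table.

Please help.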

This is the SO discussion I have followed

Update: did you mean:

for dlitem in dl_data:
    print(dlitem)
    print("\n\n\n")

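But this did not change anything (except that the dd tags are now included in the output). Maybe I was not clear enough, but what I want is this: given, for example, the following item in the HTML: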

 </B><DT>YALE JOURNAL OF BIOLOGY AND MEDICINE
<B><DD> YALE J BIOL MED

I want to get it into two columns, like:

# ======== COL 1  =========             ====COL 2 =====
YALE JOURNAL OF BIOLOGY AND MEDICINE    YALE J BIOL MED
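
To make the desired pairing concrete, here is a minimal sketch of one possible approach. It swaps BeautifulSoup for the standard library's html.parser so that it runs without extra packages, and it assumes the page never closes its DT/DD tags, as in the snippet above. The class name DescriptionListParser and the second sample row are invented for illustration; only the YALE row comes from the actual page.

```python
from html.parser import HTMLParser


class DescriptionListParser(HTMLParser):
    """Hypothetical helper: collect (dt_text, dd_text) pairs from a description list."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished (term, description) pairs
        self._current = None  # which tag the text belongs to: "dt", "dd", or None
        self._term = []
        self._desc = []

    def _flush(self):
        # Join the collected text chunks and normalise whitespace.
        term = " ".join("".join(self._term).split())
        desc = " ".join("".join(self._desc).split())
        if term or desc:
            self.rows.append((term, desc))
        self._term, self._desc = [], []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <DT> arrives here as "dt".
        if tag == "dt":
            self._flush()          # a new term starts, so the previous pair is complete
            self._current = "dt"
        elif tag == "dd":
            self._current = "dd"

    def handle_endtag(self, tag):
        if tag == "dl":
            self._flush()
            self._current = None

    def handle_data(self, data):
        if self._current == "dt":
            self._term.append(data)
        elif self._current == "dd":
            self._desc.append(data)

    def close(self):
        super().close()
        self._flush()              # emit a trailing pair if </dl> is missing


# Sample input: the first row is from the question, the second is made up.
sample = """<DL>
</B><DT>YALE JOURNAL OF BIOLOGY AND MEDICINE
<B><DD> YALE J BIOL MED
</B><DT>EXAMPLE JOURNAL TITLE
<B><DD> EXAMPLE J TITLE
</DL>"""

parser = DescriptionListParser()
parser.feed(sample)
parser.close()

# Print the pairs as two aligned columns.
for term, desc in parser.rows:
    print(f"{term:<40}{desc}")
```

For the real page, the same parser could presumably be fed r.text from the requests call above instead of the inline sample. I have not verified this against the full page.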

Parsing HTML description lists (dt / dd) with Python
