I am trying to scrape data from this Wikipedia page.
Below is the code I am currently using.
Code:
from bs4 import BeautifulSoup
import urllib.request

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

soup = make_soup("https://en.wikipedia.org/wiki/2015_in_hip_hop_music")
albumdatasaved = ""
for record in soup.findAll('tr'):
    albumdata = ""
    for data in record.findAll('td'):
        albumdata = albumdata + "," + data.text
    albumdatasaved = albumdatasaved + "\n" + albumdata[1:]
print(albumdatasaved)
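I only need the first row of data from each table, as shown in the image below. How can I do that?
Answer 0 (score: 0)
Here is complete code that solves your problem. Using the API would be the better approach, but I understand you need a quick solution for this...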
from bs4 import BeautifulSoup
import urllib.request

def make_soup(url):
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata

soup = make_soup("https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains")
albumdatasaved = ""
for record in soup.findAll('tr'):
    for data in record.findAll('td'):
        if data.text.strip() and data.text[0] == ".":
            albumdatasaved += data.text.strip() + "\n"
            break
print(albumdatasaved)
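
For the original question (only the first row of data from each table), one possible approach is to loop over the tables rather than over every row, and stop at the first row in each table that contains td cells. The following is only a rough sketch under a few assumptions: first_data_rows and sample_html are hypothetical names invented for this illustration, the sample markup merely stands in for the real page, and "first row" is taken to mean the first row that has td cells (header rows that contain only th cells are skipped). Nested tables are not handled specially.

```python
from bs4 import BeautifulSoup


def first_data_rows(html):
    """Return the first row containing <td> cells from each <table> in html."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for table in soup.find_all("table"):
        for record in table.find_all("tr"):
            cells = record.find_all("td")
            if cells:  # header rows hold only <th> cells, so they are skipped
                rows.append([cell.get_text(strip=True) for cell in cells])
                break  # stop after the first data row of this table
    return rows


# Hypothetical sample markup standing in for the downloaded page.
sample_html = """
<table>
  <tr><th>Artist</th><th>Album</th></tr>
  <tr><td>Artist A</td><td>Album A</td></tr>
  <tr><td>Artist B</td><td>Album B</td></tr>
</table>
<table>
  <tr><th>Artist</th><th>Album</th></tr>
  <tr><td>Artist C</td><td>Album C</td></tr>
</table>
"""

for row in first_data_rows(sample_html):
    print(",".join(row))
```

To try this on a live page, the bytes returned by urllib.request.urlopen(url).read() could presumably be passed in place of sample_html. Real Wikipedia tables may use row spans or layout tables, so the result would likely need checking against the page.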