Scraping a Wiki page with BeautifulSoup and Python using "tr" and "td"

Date: 2019-02-14 23:53:28

Tags: python beautifulsoup wiki

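Python 3 beginner here. I can't seem to print out just the names of the colleges. There is no class attribute anywhere near the college names, and I can't narrow find_all down to only what I need. I would also like to write the names to a new csv file. Any ideas?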

import requests
from bs4 import BeautifulSoup
import csv


res= requests.get("https://en.wikipedia.org/wiki/Ivy_League")
soup = BeautifulSoup(res.text, "html.parser")
colleges = soup.find_all("table", class_ = "wikitable sortable")

for college in colleges:
    first_level = college.find_all("tr")
    print(first_level)

2 answers:

Answer 0: (score: 4)

You can use soup.select() with a CSS selector, which lets you be more precise:

import requests
from bs4 import BeautifulSoup

res= requests.get("https://en.wikipedia.org/wiki/Ivy_League")
soup = BeautifulSoup(res.text, "html.parser")

l = soup.select(".mw-parser-output > table:nth-of-type(2) > tbody > tr > td:nth-of-type(1) a")
for each in l:
    print(each.text)

Output:

Brown University
Columbia University
Cornell University
Dartmouth College
Harvard University
University of Pennsylvania
Princeton University
Yale University
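
To see how each part of that selector narrows the match, here is a small offline sketch. The markup in SAMPLE_HTML is a simplified stand-in that I made up, not the real Wikipedia page, and first_column_links is a hypothetical helper name; only the selector string is taken from the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real Wikipedia page.
SAMPLE_HTML = """
<div class="mw-parser-output">
  <table><tbody><tr><td><a href="/a">Not a college</a></td></tr></tbody></table>
  <table>
    <tbody>
      <tr><th>Institution</th><th>Location</th></tr>
      <tr><td><a href="/b">Brown University</a></td><td><a href="/p">Providence</a></td></tr>
      <tr><td><a href="/y">Yale University</a></td><td><a href="/n">New Haven</a></td></tr>
    </tbody>
  </table>
</div>
"""


def first_column_links(html):
    """Return the link text found in the first <td> of each row of the second table."""
    soup = BeautifulSoup(html, "html.parser")
    # ">" matches direct children only; nth-of-type(2) picks the second <table>
    # sibling; td:nth-of-type(1) keeps only the first cell of every row.
    selector = (
        ".mw-parser-output > table:nth-of-type(2) > tbody > tr "
        "> td:nth-of-type(1) a"
    )
    return [a.get_text(strip=True) for a in soup.select(selector)]


names = first_column_links(SAMPLE_HTML)
print(names)
```

The header row contains only th cells, so it never matches, and the links in the second column are skipped because they do not sit in the first td of their row. A positional selector like this depends on the page layout, so it may need adjusting if the page changes.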

To write that single column to a csv file:

import pandas as pd
pd.DataFrame([e.text for e in l]).to_csv("your_csv.csv") # This will include index
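
Since the question already imports csv, the same thing can probably be done without pandas. This is a minimal sketch using only the standard library; write_names_csv, the "college" header and the colleges.csv file name are all my own assumptions, and the two names passed in are just sample data.

```python
import csv


def write_names_csv(names, path):
    """Write one name per row, under a single header column."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["college"])  # hypothetical header name
        for name in names:
            writer.writerow([name])  # each row must be a sequence of cells


write_names_csv(["Brown University", "Yale University"], "colleges.csv")
```

Unlike the pandas one-liner, this writes no index column. With the scraped results you would pass something like [e.text for e in l] as names.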

Answer 1: (score: 1)

With:

colleges = soup.find_all("table", class_ = "wikitable sortable")

you get every table that has this class (there are five of them), not every college in the table. So you can do the following instead:

import requests
from bs4 import BeautifulSoup

res= requests.get("https://en.wikipedia.org/wiki/Ivy_League")
soup = BeautifulSoup(res.text, "html.parser")

college_table = soup.find("table", class_ = "wikitable sortable")
colleges = college_table.find_all("tr")

for college in colleges:
    # find() returns the first <a> in the row, or None if the row has no link
    college_link = college.find('a')
    if college_link is not None:
        college_name = college_link.text
        print(college_name)

Edit: I added an if to discard the first row, which contains the table header.
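
The difference between find_all and find can be checked offline with a small sketch. The markup in SAMPLE_HTML is invented for illustration and is not the real page. Note one deliberate variation from the code above: the loop looks for the link inside the first td, so a header row made of th cells is skipped even if it happens to contain a link.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tables sharing the same class attribute.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Institution</th></tr>
  <tr><td><a href="/b">Brown University</a></td></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Other</th></tr>
  <tr><td><a href="/x">Something else</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns every matching table; find returns only the first one.
all_tables = soup.find_all("table", class_="wikitable sortable")
first_table = soup.find("table", class_="wikitable sortable")

names = []
for row in first_table.find_all("tr"):
    cell = row.find("td")  # None for the header row, which only has <th>
    if cell is None:
        continue
    link = cell.find("a")
    if link is not None:
        names.append(link.get_text(strip=True))

print(len(all_tables), names)
```

Only the rows of the first table are visited, which is why "Something else" from the second table never appears in names.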