Getting a table via web scraping in Python

Time: 2020-09-16 16:50:44

Tags: python web-scraping beautifulsoup python-requests urllib

import requests
from bs4 import BeautifulSoup


url = 'https://www.universitego.com/bilgisayar-muhendisligi-2021-taban-puanlari-ve-basari-siralamalari/'
soup = BeautifulSoup(requests.get(url).content.decode('utf-8', 'ignore'), 'html.parser')

for span in soup.select('tr > td:nth-child(1)'):
    print(span.get_text(strip=True, separator=' '))
    print('-' * 80)

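I am using the code above to get the departments, and the table of data about them, from the websites below. However, when I run it I get an empty list. What should I do? Thanks.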

Websites: https://www.universitego.com/4-yillik-bolumlerin-2015-2016-taban-puanlari-ve-basari-siralamalari/ https://www.universitego.com/bilgisayar-muhendisligi-2021-taban-puanlari-ve-basari-siralamalari/

1 answer:

Answer 0 (score: 0)

Here you go:

from bs4 import BeautifulSoup
from requests import get

# Download the page and parse it (the 'lxml' parser needs the lxml package installed).
r = get('https://www.universitego.com/bilgisayar-muhendisligi-2021-taban-puanlari-ve-basari-siralamalari/')
soup = BeautifulSoup(r.content, features='lxml')

# All rows of the first table; the first row holds the column names.
rows = soup.find('table').find('tbody').find_all('tr')
keys = rows[0].text.split('\n')

# Pair each data row's newline-separated cell texts with the column names.
resulting_list_of_dicts = []
for values in [row.text for row in rows[1:]]:
    resulting_list_of_dicts.append(dict(zip(keys, values.split('\n'))))

resulting_list_of_dicts[0]
{'Üniversite Adı': 'BOĞAZİÇİ ÜNİVERSİTESİ (İSTANBUL) (Devlet Üniversitesi)',
 'Bölüm': 'Bilgisayar Mühendisliği (İngilizce)',
 'Puan Türü': 'SAY',
 'Kont.': '85',
 'Taban Puanı': '546,34716',
 'Başarı Sırası': '643'}
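
Splitting a row's text on '\n' depends on how the page's markup happens to be laid out. Reading each cell separately may be more robust. Below is a minimal sketch of that variation, run against a small inline table so it works offline. SAMPLE_HTML and table_to_dicts are hypothetical names introduced here for illustration, and the sketch assumes (as the code above does) that the first row of the table holds the column names.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; assumes the first row is the header.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td>Üniversite Adı</td><td>Bölüm</td><td>Taban Puanı</td></tr>
    <tr><td>BOĞAZİÇİ ÜNİVERSİTESİ</td><td>Bilgisayar Mühendisliği (İngilizce)</td><td>546,34716</td></tr>
  </tbody>
</table>
"""


def table_to_dicts(html):
    """Return one dict per data row, keyed by the cell texts of the first row."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')
    if table is None:
        # No table in the markup: return an empty list instead of raising.
        return []
    rows = table.find_all('tr')
    if not rows:
        return []
    # Read header and data cells one by one rather than splitting row text.
    keys = [cell.get_text(' ', strip=True) for cell in rows[0].find_all(['th', 'td'])]
    records = []
    for row in rows[1:]:
        values = [cell.get_text(' ', strip=True) for cell in row.find_all(['th', 'td'])]
        records.append(dict(zip(keys, values)))
    return records


records = table_to_dicts(SAMPLE_HTML)
print(records[0])
```

To try this on the live page, pass the downloaded markup (for example r.text from the request above) to table_to_dicts. An empty list then means no table element was found in the HTML that was actually returned.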