Printing values scraped into a dictionary from a list of search terms

Time: 2020-03-06 00:04:45

Tags: python web-scraping

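I'd like to know how to put certain search terms into a list, look each one up on a website that shows values, and wrap this in a function so the values are easy to handle and display. I haven't built the function at the bottom of the code (the searchText part). I can't figure out how to store the values and display them in the command window. I've used '' as a placeholder for each value.

Please let me know if anything needs clarifying. Thanks.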

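import requests
import getpass

# Test for credentials
cred = input('Please enter: ')
username = input('')
password = getpass.getpass()  # getpass is a module; call its getpass() function

# URLs (placeholders)
url = '' + cred
secondUrl = ''

# Form data for the login POST
load = {'user': username, 'pass': password}

# Log in, then fetch the page with the same session so the login cookies are reused
print('Please wait..')
with requests.Session() as session:
    post = session.post(secondUrl, data=load)
    s = session.get(url)

# Search terms ('' placeholders)
x = ['', '', '', '']
Dict = {}

a = s.text

# str.split() takes a single string separator, not a list, so loop over the terms
for term in x:
    search = a.split(term)[1]
    result = search.split('>')[2]
    result = result.split('<')[0]
    Dict[term] = result
print(Dict)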
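The function the question asks for could look roughly like the sketch below. It is a minimal illustration, not the asker's code: `extract_values` and `print_values` are hypothetical names, the sample HTML is made up to stand in for `s.text`, and the parsing rule (text after the second `>` following the term, up to the next `<`) is simply the question's own split logic. Empty or missing terms map to `None` instead of raising an exception.

```python
def extract_values(html, terms):
    """Hypothetical helper: return a {term: value} dict using the question's
    split-based parsing. For each term, take the text after it, then keep
    what follows the second '>' up to the next '<'. Empty or missing terms
    map to None instead of raising an exception."""
    values = {}
    for term in terms:
        if not term or term not in html:
            values[term] = None
            continue
        after = html.split(term)[1]
        values[term] = after.split('>')[2].split('<')[0]
    return values


def print_values(values):
    """Hypothetical helper: print one 'term: value' line per entry to the console."""
    for term, value in values.items():
        print(f"{term}: {value}")


# Made-up HTML standing in for s.text from the question.
sample = '<td class="label">Price</td><td class="value">42</td>'
print_values(extract_values(sample, ["Price", "Volume"]))
```

This prints `Price: 42` and `Volume: None`. Splitting raw HTML on strings is fragile (any change in markup shifts the `'>'` index), which is why the answer below switches to an HTML parser.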

1 answer:

Answer 0 (score: 0)

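The example below, which uses Python 3.7 and the BeautifulSoup4 library, should give you some guidance and ideas. It scrapes the following web page: the Sky Sports Premier League (football) table.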


It extracts each team's name and points total and stores that data in a dictionary.

Code:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.skysports.com/premier-league-table'
page = requests.get(URL)
page.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(page.content, 'html.parser')

# navigate down to the table and then the table body:
table = soup.find("table", class_='standing-table__table')
body = table.find("tbody")

data = {}

for row in body.find_all("tr"):
    # the team name is taken from the first <a> element in the row:
    team = row.find("a").get_text()
    # the 10th <td> element (index 9) contains the points total:
    points = row.find_all("td")[9].get_text()
    data[team] = points

print(data)

The output looks like this (formatted for readability):

{
    'Liverpool': '82',
    'Manchester City': '57',
    'Leicester City': '53',
    'Chelsea': '48',
    'Manchester United': '45',
    'Wolverhampton Wanderers': '43',
    'Sheffield United': '43',
    'Tottenham Hotspur': '41',
    'Arsenal': '40',
    'Burnley': '39',
    'Crystal Palace': '39',
    'Everton': '37',
    'Newcastle United': '35',
    'Southampton': '34',
    'Brighton and Hove Albion': '29',
    'West Ham United': '27',
    'Watford': '27',
    'Bournemouth': '27',
    'Aston Villa': '25',
    'Norwich City': '21'
}

The main point, though, is that once the data is in a Python dictionary (or whatever structure you need), it is straightforward to work with.
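For instance, a few lines are enough to rank teams or compute a gap. This is a small sketch using four entries copied from the output above; note that the scraped points are strings, so they are converted to `int` before any numeric comparison.

```python
# A few entries copied from the output above; scraped values are strings.
data = {
    'Liverpool': '82',
    'Manchester City': '57',
    'Leicester City': '53',
    'Chelsea': '48',
}

# Convert the points from str to int so they compare and add numerically.
points = {team: int(p) for team, p in data.items()}

# Sort teams by points, highest first.
ranking = sorted(points.items(), key=lambda item: item[1], reverse=True)

for position, (team, pts) in enumerate(ranking, start=1):
    print(f"{position}. {team}: {pts}")

# Simple lookups and arithmetic.
gap = points['Liverpool'] - points['Manchester City']
print(gap)  # 25
```

Without the `int` conversion, sorting would compare strings character by character, which only happens to work here because all values have two digits.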

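The main challenge here is understanding the HTML structure of the site you are scraping, so that you can navigate its tags effectively. Using "View Page Source" in your browser is a good starting point.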
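Once you know the structure, it helps to test the parsing logic offline against a small mock of it before hitting the live site. The sketch below assumes the layout the answer's code relies on (a `<table class="standing-table__table">` whose `<tbody>` rows hold the team name in an `<a>` and the points in the 10th `<td>`); the HTML is made up, with `-` filling the unused columns, and `make_row` and `parse_table` are hypothetical names. It needs `beautifulsoup4` installed.

```python
from bs4 import BeautifulSoup


def make_row(pos, team, pts):
    """Hypothetical helper: build one mock table row with 10 <td> cells:
    position, team link, seven filler cells, then points (index 9)."""
    filler = "<td>-</td>" * 7
    return f'<tr><td>{pos}</td><td><a href="#">{team}</a></td>{filler}<td>{pts}</td></tr>'


# Made-up HTML mimicking the structure the answer's code assumes.
html = (
    '<table class="standing-table__table"><tbody>'
    + make_row(1, "Liverpool", 82)
    + make_row(2, "Manchester City", 57)
    + "</tbody></table>"
)


def parse_table(html):
    """Hypothetical helper: same navigation as the answer, with a clear
    error if the table is missing (e.g. because the page layout changed)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="standing-table__table")
    if table is None:
        raise ValueError("standings table not found; the page structure may have changed")
    data = {}
    for row in table.find("tbody").find_all("tr"):
        team = row.find("a").get_text(strip=True)
        data[team] = row.find_all("td")[9].get_text(strip=True)
    return data


print(parse_table(html))  # {'Liverpool': '82', 'Manchester City': '57'}
```

When exploring a real page, `print(soup.find("tr").prettify())` is a quick way to see how one row is laid out before deciding which tags and indexes to use.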