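I'd like to know how to look up certain string terms on a website and collect them in a list together with their values, and how to make this convenient by putting the logic in a function that displays the values. I haven't built the function for the searchText part at the bottom of the code, and I can't figure out how to store the results and display them in the command window. I used '' as a placeholder wherever a real value would go.
Please let me know if anything needs clarifying. Thanks.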
import getpass

import requests

# Test for credentials
cred = input('Please enter: ')
username = input('')
password = getpass.getpass()

# URLs ('' stands in for the real addresses)
url = '' + cred
secondUrl = ''

# Data load
load = {'user': username, 'pass': password}

# Search terms ('' stands in for each real term)
x = ['', '', '', '']
Dict = {}

# Grabbing from url source
print('Please wait..')
with requests.Session() as session:
    post = session.post(secondUrl, data=load)
    s = session.get(url)
    a = s.text
    # str.split() takes a single string, so handle one term at a time
    for term in x:
        search = a.split(term)[1]
        result = search.split('>')[2]
        result = result.split('<')[0]
        Dict[term] = result

print(Dict)
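Answer 0 (score: 0)
The example below, which uses Python 3.7 and the BeautifulSoup4 library, should give you some guidance and ideas. It scrapes the following web page: Sky Sports Premier League (football).
It extracts the team names and their points, and stores that data in a dictionary.
Code: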
import requests
from bs4 import BeautifulSoup

URL = 'https://www.skysports.com/premier-league-table'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

# navigate down to the table and then the table body:
table = soup.find("table", class_='standing-table__table')
body = table.find("tbody")

data = {}
for row in body.find_all("tr"):
    # team name is grabbed from the first <a> value:
    team = row.find("a").get_text()
    # the 10th <td> element (index 9) contains the points total:
    points = row.find_all("td")[9].get_text()
    data[team] = points
    #print(team)
    #print(points)

print(data)
The output looks like this (formatted for clarity):
{
'Liverpool': '82',
'Manchester City': '57',
'Leicester City': '53',
'Chelsea': '48',
'Manchester United': '45',
'Wolverhampton Wanderers': '43',
'Sheffield United': '43',
'Tottenham Hotspur': '41',
'Arsenal': '40',
'Burnley': '39',
'Crystal Palace': '39',
'Everton': '37',
'Newcastle United': '35',
'Southampton': '34',
'Brighton and Hove Albion': '29',
'West Ham United': '27',
'Watford': '27',
'Bournemouth': '27',
'Aston Villa': '25',
'Norwich City': '21'
}
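If you want the dictionary printed in a multi-line layout like the one above, one option is the standard-library json module. This is only a small sketch: it uses a hand-copied three-team subset of the output instead of a live request, and note that json prints double quotes rather than single quotes.

```python
import json

# Hand-copied subset of the scraped output, used here instead of a live request.
data = {'Liverpool': '82', 'Manchester City': '57', 'Leicester City': '53'}

# indent=4 puts each key/value pair on its own indented line.
formatted = json.dumps(data, indent=4)
print(formatted)
```

pprint.pprint(data) is another standard-library choice if you prefer Python-style quoting.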
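The main point, though, is that once the data is in a Python dictionary (or whatever structure you need), it is straightforward to work with.
The main challenge here is understanding the HTML structure of the site you want to scrape, so that you can navigate its tags effectively. Using "View Page Source" in your browser is a good place to start.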
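As a rough illustration of what that can look like, the sketch below works on a hand-copied subset of the output. The scraped values are strings, so it converts them to integers first. The 40-point threshold and the variable names are arbitrary choices made for the example.

```python
# Hand-copied subset of the scraped output; the values are strings, as scraped.
data = {'Liverpool': '82', 'Chelsea': '48', 'Arsenal': '40', 'Watford': '27'}

# Convert the points to integers so they compare numerically.
points = {team: int(value) for team, value in data.items()}

# Teams with at least 40 points (the threshold is arbitrary).
at_least_40 = [team for team, pts in points.items() if pts >= 40]

# Team names ordered from fewest to most points.
ascending = sorted(points, key=points.get)

print(at_least_40)
print(ascending)
```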
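Coming back to the list of search terms in the question, the per-term lookup can be wrapped in a function that returns a dictionary. The sketch below keeps the question's string-splitting approach and assumes each value is the text of the element immediately after the term. The name search_text, the sample markup and the labels are all made up for illustration; in the real script you would pass session.get(url).text instead.

```python
def search_text(html, terms):
    """Return {term: value} for each term found in html.

    Hypothetical helper: assumes the value is the text of the element
    that immediately follows the element containing the term.
    """
    found = {}
    for term in terms:
        if not term or term not in html:
            continue  # skip blank placeholders and terms that are absent
        after = html.split(term, 1)[1]
        parts = after.split('>')
        if len(parts) > 2:
            found[term] = parts[2].split('<')[0].strip()
    return found


# Made-up markup standing in for session.get(url).text
sample = (
    '<table>'
    '<tr><td>Status</td><td>Active</td></tr>'
    '<tr><td>Balance</td><td>42.50</td></tr>'
    '</table>'
)

results = search_text(sample, ['Status', 'Balance', 'Missing'])
for term, value in results.items():
    print(f'{term}: {value}')
```

Splitting on '>' and '<' breaks as soon as the markup changes, so for anything beyond a quick script a parser such as BeautifulSoup is usually the sturdier choice.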