I am trying to parse the elements with class fixture_date and class play_team from the following website:
http://www.espncricinfo.com/ci/content/series/1128817.html?template=fixtures
Code:
import re
import pytz
import requests
import datetime
from bs4 import BeautifulSoup
from espncricinfo.exceptions import MatchNotFoundError, NoScorecardError
from espncricinfo.match import Match
bigbash_article_link = "http://www.espncricinfo.com/ci/content/series/1128817.html?template=fixtures"
r = requests.get(bigbash_article_link)
bigbash_article_html = r.text
soup = BeautifulSoup(bigbash_article_html, "html.parser")
bigbash1_items = soup.find_all("span",{"class": "fixture_date"})
#print(bigbash1_items)
bigbash_items = soup.find_all("span",{"class": "play_team"})
date = {}
team = {}
for div in bigbash_items:
    team = [div.find('a').string.strip("\n\r")]
    print(team)
for div in bigbash1_items:
    date = [div.string.strip("\xa0local\n\r\t")]
    print(date)
Output:
['1st Match - Peshawar Zalmi v Multan Sultans']
['2nd Match - Karachi Kings v Quetta Gladiators']
['3rd Match - Multan Sultans v Lahore Qalandars']
['4th Match - Islamabad United v Peshawar Zalmi']
['5th Match - Quetta Gladiators v Lahore Qalandars']
['6th Match - Multan Sultans v Islamabad United']
['7th Match - Karachi Kings v Peshawar Zalmi']
['8th Match - Karachi Kings v Lahore Qalandars']
['9th Match - Islamabad United v Quetta Gladiators']
['10th Match - Quetta Gladiators v Peshawar Zalmi']
['11th Match - Multan Sultans v Karachi Kings']
['12th Match - Lahore Qalandars v Islamabad United']
['13th Match - Multan Sultans v Quetta Gladiators']
['14th Match - Peshawar Zalmi v Lahore Qalandars']
['15th Match - Islamabad United v Karachi Kings']
['16th Match - Peshawar Zalmi v Multan Sultans']
['17th Match - Multan Sultans v Quetta Gladiators']
['18th Match - Islamabad United v Lahore Qalandars']
['19th Match - Karachi Kings v Quetta Gladiators']
['20th Match - Multan Sultans v Lahore Qalandars']
['21st Match - Peshawar Zalmi v Islamabad United']
['22nd Match - Multan Sultans v Karachi Kings']
['23rd Match - Peshawar Zalmi v Quetta Gladiators']
['24th Match - Karachi Kings v Lahore Qalandars']
['25th Match - Multan Sultans v Islamabad United']
['26th Match - Quetta Gladiators v Lahore Qalandars']
['27th Match - Peshawar Zalmi v Karachi Kings']
['28th Match - Quetta Gladiators v Islamabad United']
['29th Match - Peshawar Zalmi v Lahore Qalandars']
['30th Match - Islamabad United v Karachi Kings']
['Qualifier - TBC v TBC']
['Eliminator 1 - TBC v TBC']
['Eliminator 2 - TBC v TBC']
['Final - TBC v TBC']
['Thu Feb 22']
['21:00']
['Fri Feb 23']
['15:30']
['Fri Feb 23']
['20:00']
['Sat Feb 24']
['15:30']
['Sat Feb 24']
['20:00']
['Sun Feb 25']
['15:30']
['Sun Feb 25']
['20:00']
['Mon Feb 26']
['20:00']
['Wed Feb 28']
['20:00']
['Thu Mar 1']
['20:00']
['Fri Mar 2']
['15:30']
['Fri Mar 2']
['20:00']
['Sat Mar 3']
['15:30']
['Sat Mar 3']
['20:00']
['Sun Mar 4']
['20:00']
['Tue Mar 6']
['20:00']
['Wed Mar 7']
['20:00']
['Thu Mar 8']
['15:30']
['Thu Mar 8']
['20:00']
['Fri Mar 9']
['15:30']
['Fri Mar 9']
['20:00']
['Sat Mar 10']
['15:30']
['Sat Mar 10']
['20:00']
['Sun Mar 11']
['20:00']
['Tue Mar 13']
['20:00']
['Wed Mar 14']
['20:00']
['Thu Mar 15']
['15:30']
['Thu Mar 15']
['20:00']
['Fri Mar 16']
['15:30']
['Fri Mar 16']
['20:00']
['Sun Mar 18']
['20:00']
['Tue Mar 20']
['Wed Mar 21']
['Sun Mar 25']
I want to store these values in a list of dictionaries, like this.
Expected output:
[{'team':'1st Match - Peshawar Zalmi v Multan Sultans','date':'Thu Feb 22', 'time':'21:00'}
{'team':'2nd Match - Karachi Kings v Quetta Gladiators','date':'Thu Feb 23', 'time':'15:30'}
{'team':'3rd Match - Multan Sultans v Lahore Qalandars','date':'Thu Feb 24', 'time':'20:00'}
.....{'team':'Eliminator 1 - TBC v TBC','date':'Wed Mar 21', 'time':''}{'team':'Final - TBC v TBC','date':'Sun Mar 25', 'time':''}]
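The problem is that date ends up holding the date and the time values as separate lists. How can I combine them into one record per match?
Answer 0 (score: 0)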
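This code parses the "fixtures" file that you can download from the top of the URL you provided. I know this may not look like the preferred approach, but the information in it appears to be more current. For example, the website still shows matches that seem to have been played already (from February), whereas the .ics file starts with the matches to be played tomorrow (March 2).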
>>> import re
... from datetime import datetime
...
... REGEX = re.compile(r'''\
... SUMMARY:(?P<team>.+)\n
... DTSTART:(?P<start>.+)\n
... DTEND:(?P<end>.+)\n
... LOCATION:(?P<location>.+)\n''', re.VERBOSE)
...
...
... def to_datetime(s):
...     return datetime.strptime(s, '%Y%m%dT%H%M00Z')
...
...
... result = []
... with open('Pakistan_Super_League.ics', 'r') as f:
...     for m in REGEX.finditer(f.read()):
...         current = m.groupdict()
...         start = to_datetime(current['start'])
...         result.append({
...             'team': current['team'],
...             'date': start.strftime('%a %b %d'),
...             'time': start.strftime('%H:%M')
...         })
...
>>> for event in result:
...     print(event)
...
{'team': '11th Match Multan Sultans v Karachi Kings', 'date': 'Fri Mar 02', 'time': '11:30'}
{'team': '12th Match Lahore Qalandars v Islamabad United', 'date': 'Fri Mar 02', 'time': '16:00'}
{'team': '13th Match Multan Sultans v Quetta Gladiators', 'date': 'Sat Mar 03', 'time': '11:30'}
{'team': '14th Match Peshawar Zalmi v Lahore Qalandars', 'date': 'Sat Mar 03', 'time': '16:00'}
{'team': '15th Match Islamabad United v Karachi Kings', 'date': 'Sun Mar 04', 'time': '16:00'}
{'team': '16th Match Peshawar Zalmi v Multan Sultans', 'date': 'Tue Mar 06', 'time': '16:00'}
{'team': '17th Match Multan Sultans v Quetta Gladiators', 'date': 'Wed Mar 07', 'time': '16:00'}
{'team': '18th Match Islamabad United v Lahore Qalandars', 'date': 'Thu Mar 08', 'time': '11:30'}
{'team': '19th Match Karachi Kings v Quetta Gladiators', 'date': 'Thu Mar 08', 'time': '16:00'}
{'team': '20th Match Multan Sultans v Lahore Qalandars', 'date': 'Fri Mar 09', 'time': '11:30'}
{'team': '21st Match Peshawar Zalmi v Islamabad United', 'date': 'Fri Mar 09', 'time': '16:00'}
{'team': '22nd Match Multan Sultans v Karachi Kings', 'date': 'Sat Mar 10', 'time': '11:30'}
{'team': '23rd Match Peshawar Zalmi v Quetta Gladiators', 'date': 'Sat Mar 10', 'time': '16:00'}
{'team': '24th Match Karachi Kings v Lahore Qalandars', 'date': 'Sun Mar 11', 'time': '16:00'}
{'team': '25th Match Multan Sultans v Islamabad United', 'date': 'Tue Mar 13', 'time': '16:00'}
{'team': '26th Match Quetta Gladiators v Lahore Qalandars', 'date': 'Wed Mar 14', 'time': '16:00'}
{'team': '27th Match Peshawar Zalmi v Karachi Kings', 'date': 'Thu Mar 15', 'time': '11:30'}
{'team': '28th Match Quetta Gladiators v Islamabad United', 'date': 'Thu Mar 15', 'time': '16:00'}
{'team': '29th Match Peshawar Zalmi v Lahore Qalandars', 'date': 'Fri Mar 16', 'time': '11:30'}
{'team': '30th Match Islamabad United v Karachi Kings', 'date': 'Fri Mar 16', 'time': '16:00'}
{'team': 'Qualifier TBD v TBD', 'date': 'Sun Mar 18', 'time': '16:00'}
{'team': 'Eliminator 1 TBD v TBD', 'date': 'Tue Mar 20', 'time': '00:00'}
{'team': 'Eliminator 2 TBD v TBD', 'date': 'Wed Mar 21', 'time': '00:00'}
{'team': 'Final TBD v TBD', 'date': 'Sun Mar 25', 'time': '00:00'}
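The times above come straight from the DTSTART values, which end in Z and are therefore UTC. That is why they read 11:30 and 16:00 where the website lists 15:30 and 20:00 local. Below is a minimal sketch of shifting them, assuming the venue runs at a fixed UTC+4 offset (inferred only from that four-hour gap, not confirmed anywhere in the file). LOCAL_TZ and to_local are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+4 offset, inferred from the gap between
# the .ics times and the times shown on the website.
LOCAL_TZ = timezone(timedelta(hours=4))


def to_local(s, tz=LOCAL_TZ):
    """Parse an iCalendar UTC timestamp like '20180302T113000Z' and shift it to tz."""
    utc = datetime.strptime(s, '%Y%m%dT%H%M%SZ').replace(tzinfo=timezone.utc)
    return utc.astimezone(tz)


start = to_local('20180302T113000Z')
print(start.strftime('%a %b %d'), start.strftime('%H:%M'))  # Fri Mar 02 15:30
```

Entries with a 00:00 start (the eliminators and the final) presumably have no time set yet, so shifting them would produce a misleading 04:00. It is probably better to leave their time empty.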
Answer 1 (score: 0)
If you take a quick look at the page with the element inspector, each row (each fixture) sits inside the following tag:
<li class="large-20 medium-20 columns" team1="xxxx" team2="xxxx" venue="xxxx">
So you can iterate over these rows and pick up the team, date, and time in each pass of the loop.
import requests
from bs4 import BeautifulSoup
r = requests.get('http://www.espncricinfo.com/ci/content/series/1128817.html?template=fixtures')
soup = BeautifulSoup(r.text, 'lxml')
fixtures = []
for row in soup.find_all('li', class_='large-20 medium-20 columns'):
    team = row.find('span', class_='play_team').a.text.strip('\n\r')
    date_and_time = row.find_all('span', class_='fixture_date')
    date = date_and_time[0].text.strip()
    try:
        time = date_and_time[1].text.strip('\xa0local\n\r\t')
    except IndexError:
        # Fixtures without a scheduled time have only one fixture_date span
        time = ''
    fixtures.append({'team': team, 'date': date, 'time': time})
for f in fixtures:
    print(f)
Output:
{'team': '1st Match - Peshawar Zalmi v Multan Sultans', 'date': 'Thu Feb 22', 'time': '21:00'}
{'team': '2nd Match - Karachi Kings v Quetta Gladiators', 'date': 'Fri Feb 23', 'time': '15:30'}
{'team': '3rd Match - Multan Sultans v Lahore Qalandars', 'date': 'Fri Feb 23', 'time': '20:00'}
{'team': '4th Match - Islamabad United v Peshawar Zalmi', 'date': 'Sat Feb 24', 'time': '15:30'}
{'team': '5th Match - Quetta Gladiators v Lahore Qalandars', 'date': 'Sat Feb 24', 'time': '20:00'}
{'team': '6th Match - Multan Sultans v Islamabad United', 'date': 'Sun Feb 25', 'time': '15:30'}
{'team': '7th Match - Karachi Kings v Peshawar Zalmi', 'date': 'Sun Feb 25', 'time': '20:00'}
{'team': '8th Match - Karachi Kings v Lahore Qalandars', 'date': 'Mon Feb 26', 'time': '20:00'}
{'team': '9th Match - Islamabad United v Quetta Gladiators', 'date': 'Wed Feb 28', 'time': '20:00'}
{'team': '10th Match - Quetta Gladiators v Peshawar Zalmi', 'date': 'Thu Mar 1', 'time': '20:00'}
{'team': '11th Match - Multan Sultans v Karachi Kings', 'date': 'Fri Mar 2', 'time': '15:30'}
{'team': '12th Match - Lahore Qalandars v Islamabad United', 'date': 'Fri Mar 2', 'time': '20:00'}
{'team': '13th Match - Multan Sultans v Quetta Gladiators', 'date': 'Sat Mar 3', 'time': '15:30'}
{'team': '14th Match - Peshawar Zalmi v Lahore Qalandars', 'date': 'Sat Mar 3', 'time': '20:00'}
{'team': '15th Match - Islamabad United v Karachi Kings', 'date': 'Sun Mar 4', 'time': '20:00'}
{'team': '16th Match - Peshawar Zalmi v Multan Sultans', 'date': 'Tue Mar 6', 'time': '20:00'}
{'team': '17th Match - Multan Sultans v Quetta Gladiators', 'date': 'Wed Mar 7', 'time': '20:00'}
{'team': '18th Match - Islamabad United v Lahore Qalandars', 'date': 'Thu Mar 8', 'time': '15:30'}
{'team': '19th Match - Karachi Kings v Quetta Gladiators', 'date': 'Thu Mar 8', 'time': '20:00'}
{'team': '20th Match - Multan Sultans v Lahore Qalandars', 'date': 'Fri Mar 9', 'time': '15:30'}
{'team': '21st Match - Peshawar Zalmi v Islamabad United', 'date': 'Fri Mar 9', 'time': '20:00'}
{'team': '22nd Match - Multan Sultans v Karachi Kings', 'date': 'Sat Mar 10', 'time': '15:30'}
{'team': '23rd Match - Peshawar Zalmi v Quetta Gladiators', 'date': 'Sat Mar 10', 'time': '20:00'}
{'team': '24th Match - Karachi Kings v Lahore Qalandars', 'date': 'Sun Mar 11', 'time': '20:00'}
{'team': '25th Match - Multan Sultans v Islamabad United', 'date': 'Tue Mar 13', 'time': '20:00'}
{'team': '26th Match - Quetta Gladiators v Lahore Qalandars', 'date': 'Wed Mar 14', 'time': '20:00'}
{'team': '27th Match - Peshawar Zalmi v Karachi Kings', 'date': 'Thu Mar 15', 'time': '15:30'}
{'team': '28th Match - Quetta Gladiators v Islamabad United', 'date': 'Thu Mar 15', 'time': '20:00'}
{'team': '29th Match - Peshawar Zalmi v Lahore Qalandars', 'date': 'Fri Mar 16', 'time': '15:30'}
{'team': '30th Match - Islamabad United v Karachi Kings', 'date': 'Fri Mar 16', 'time': '20:00'}
{'team': 'Qualifier - TBC v TBC', 'date': 'Sun Mar 18', 'time': '20:00'}
{'team': 'Eliminator 1 - TBC v TBC', 'date': 'Tue Mar 20', 'time': ''}
{'team': 'Eliminator 2 - TBC v TBC', 'date': 'Wed Mar 21', 'time': ''}
{'team': 'Final - TBC v TBC', 'date': 'Sun Mar 25', 'time': ''}
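The row-based loop is the more robust route, since it never has to guess which time belongs to which date. For comparison, here is a sketch of the alternative that works on the two flat lists from the question. It assumes every time looks like HH:MM and always directly follows its date. pair_dates_and_times and build_fixtures are hypothetical helper names.

```python
import re

# Assumption: a time entry is exactly H:MM or HH:MM; anything else is a date.
TIME_RE = re.compile(r'^\d{1,2}:\d{2}$')


def pair_dates_and_times(values):
    """Group a flat ['Thu Feb 22', '21:00', 'Fri Feb 23', ...] list into dicts.

    A value that looks like a time is attached to the most recent date;
    a date with no time after it keeps time ''.
    """
    records = []
    for value in values:
        if TIME_RE.match(value) and records and not records[-1]['time']:
            records[-1]['time'] = value
        else:
            records.append({'date': value, 'time': ''})
    return records


def build_fixtures(teams, flat_dates):
    """Combine team names with the grouped date/time records, in order."""
    return [dict(team=team, **when)
            for team, when in zip(teams, pair_dates_and_times(flat_dates))]


teams = ['Qualifier - TBC v TBC', 'Eliminator 1 - TBC v TBC']
flat = ['Sun Mar 18', '20:00', 'Tue Mar 20']
for fixture in build_fixtures(teams, flat):
    print(fixture)
```

This relies on the teams and the dates appearing in the same order on the page, which the flat lists cannot verify on their own.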