我正在尝试解析页面中存在的所有查询字符串,因此使用该查询字符串我可以导航到特定页面。我尝试这样做的代码如下所示
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import datetime
import dateutil.parser
import time
import pytz
"""python espncricinfo library module https://github.com/dwillis/python-espncricinfo """
from espncricinfo.match import Match
from espncricinfo.exceptions import MatchNotFoundError, NoScorecardError
"""----time-zone-calculation----"""
time_zone = pytz.timezone("Asia/Kolkata")
datetime_today = datetime.datetime.now(time_zone)
datestring_today = datetime_today.strftime("%Y-%m-%d")
"""------URL of page to parse-------with a date of today-----"""
url = "http://www.espncricinfo.com/ci/engine/match/index.html?date=datestring_today"
"""eg. url = http://www.espncricinfo.com/ci/engine/match/index.html?date=2018-02-12"""
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
""""------parsing for matchno------"""
match_no = [x['href'].split('/',4)[4].split('.')[0] for x in
soup.findAll('a', href=True, text='Scorecard')]
for p in match_no:
""" where p is a match no, e.g p = '1122282'"""
m = Match(p)
m.latest_batting
print(m.latest_batting)
当我打印match_no时,我得到输出:
['8890/scorecard/1118760/andhra-vs-tamil-nadu-group-c-vijay-hazare-trophy-2017-18/', '8890/scorecard/1118743/assam-vs-odisha-group-a-vijay-hazare-trophy-2017-18/', '8890/scorecard/1118745/bengal-vs-delhi-group-b-vijay-hazare-trophy-2017-18/', '8890/scorecard/1118763/chhattisgarh-vs-vidarbha-group-d-vijay-hazare-trophy-2017-18/']
这个页面(http://www.espncricinfo.com/ci/engine/match/index.html?date=datestring_today“)包含当天发生的所有比赛的match_no,我想修剪这个以得到match_no这是7位数[1118743,1118743.1118745 ....],我该怎么办?这样做?所以使用match_no我可以将它传递给Match(),这样我就可以获得当天发生的特定匹配的详细信息。 PS如果新的一天没有匹配,那么match_no将返回无。
答案 0 :(得分:0)
首先,您的代码很难阅读。你需要让你的代码呼吸并使其吸引其他人阅读它。
其次,导致问题的原因可能是这一行:
match_no = [x['href'].split('/',4)[4].split('.')[0] for x in soup.findAll('a', href=True, text='Scorecard')]
也难以阅读。从URL中解析匹配id有更好,更可读的方法。
这是应该工作的例子。我确实把比赛暂定日期:
import re
import pytz
import requests
import datetime
from bs4 import BeautifulSoup
from espncricinfo.exceptions import MatchNotFoundError, NoScorecardError
from espncricinfo.match import Match
"""python espncricinfo library module https://github.com/dwillis/python-espncricinfo """
# from espncricinfo.match import Match
def get_match_id(link):
match_id = re.search(r'([0-9]{7})', link)
if match_id is None:
return None
return match_id.group()
# ----time-zone-calculation----
time_zone = pytz.timezone("Asia/Kolkata")
datetime_today = datetime.datetime.now(time_zone)
datestring_today = datetime_today.strftime("%Y-%m-%d")
# ------URL of page to parse-------with a date of today-----
url = "http://www.espncricinfo.com/ci/engine/match/index.html?date=datestring_today"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
spans = soup.findAll('span', {"class": "match-no"})
matches_ids = []
for s in spans:
for a in s.findAll('a', href=lambda href: 'scorecard' in href):
match_id = get_match_id(a['href'])
if match_id is None:
continue
matches_ids.append(match_id)
# ------parsing for matchno------
for p in matches_ids:
# where p is a match no, e.g p = '1122282'
m = Match(p)
m.latest_batting
print(m.latest_batting)
现在,我没有你在这里使用的每个lib,但这应该让你知道如何做到这一点。
我的建议再一次,空行是你的朋友。他们肯定是读者的朋友。让你的代码'呼吸'。