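I have started scraping a web page, but the text I extract has the number 0.01 appended to it.
For example, I want the name "Doe,John0.01" to appear as "Doe,John".
Here is my code so far: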
from urllib.request import urlopen
from lxml import html
response = urlopen("https://www.baseball-reference.com/leagues/MLB/2018-standard-pitching.shtml")
content = response.read()
tree = html.fromstring(content)
# The table is wrapped in an HTML comment, so select the comment node that contains it
comment_html = tree.xpath('//comment()[contains(., "players_standard_pitching")]')[0]
comment_html = str(comment_html).replace("-->", "")
comment_html = comment_html.replace("<!--", "")
tree = html.fromstring(comment_html)
for pitcher_row in tree.xpath('//table[@id="players_standard_pitching"]/tbody/tr[contains(@class, "full_table")]'):
    csk = pitcher_row.xpath('./td[@data-stat="player"]/@csk')[0]
    print(csk)
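
One possible way to get the result I want is to strip the trailing number with a regular expression. This is only a sketch: it assumes the unwanted part is always a decimal number at the very end of the `csk` value (as in "Doe,John0.01"), and `strip_numeric_suffix` is a hypothetical helper name, not something from lxml.

```python
import re

# Hypothetical helper: remove a trailing number such as "0.01" from the csk value.
# Assumption: the suffix is always digits, optionally followed by ".digits", at the end.
def strip_numeric_suffix(csk):
    return re.sub(r'\d+(?:\.\d+)?$', '', csk)

print(strip_numeric_suffix("Doe,John0.01"))  # prints "Doe,John"
```

Inside the loop this would replace `print(csk)` with `print(strip_numeric_suffix(csk))`. A value that legitimately ends in a digit would also be trimmed, so the pattern may need tightening for such cases.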