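Python newbie here. I'm trying to format college football scores scraped from Massey Ratings so I can import them into Excel. I need to add the headers ["Date", "Winner", "Score", "Loser", "Score"] and put some space between the columns for readability. From what I've gathered, a Pandas DataFrame is the way to do this. Any help would be greatly appreciated.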
Here is my code so far:
import pandas as pd
from bs4 import BeautifulSoup
import urllib.request
address = 'https://www.masseyratings.com/scores.php?s=308075&sub=11604&dt=20191119'
response = urllib.request.urlopen(address)
html = response.read()
soup = BeautifulSoup(html,"html.parser")
table = soup.find("pre").get_text(strip=True)
print(table)
The output I get:
2019-11-16Southern Miss36 @UT San Antonio17
2019-11-16 @Washington St49Stanford22
2019-11-16TCU33 @Texas Tech31
2019-11-16 @Temple29Tulane21
2019-11-16Troy63 @Texas St27
2019-11-16 @UAB37UTEP10
2019-11-16 @Utah49UCLA3
2019-11-16 @Utah St26Wyoming21
2019-11-16 @Clemson52Wake Forest3
2019-11-16 @Florida St49Alabama St12
2019-11-16Virginia Tech45 @Georgia Tech0
2019-11-16Ohio St56 @Rutgers21
2019-11-16 @Iowa St23Texas21
2019-11-16 @BYU42Idaho St10
2019-11-19Ohio0 @Bowling Green0 Sch
2019-11-19E Michigan0 @N Illinois0 Sch
Answer (score: 0)
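String splitting might work, but on this particular page you can also use regular expressions to pull out the team and score columns (the date is matched with a separate search):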
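import re, csv, requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.masseyratings.com/scores.php?s=308075&sub=11604&dt=20191119')
soup = bs(r.content, 'lxml')  # needs the lxml package; 'html.parser' also works

# Team names: runs of non-digit, non-hyphen characters followed by 3+ whitespace characters
p = re.compile(r'([^0-9-]+)\s{3,}')
# Scores: digits with whitespace on both sides
p2 = re.compile(r'\s(\d+)\s')

# utf-8-sig writes a BOM so Excel detects the encoding when opening the CSV
with open("data.csv", "w", encoding="utf-8-sig", newline='') as csv_file:
    w = csv.writer(csv_file, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    w.writerow(['Date', 'Winner', 'Score1', 'Loser', 'Score2'])
    # [:-4] drops the last four lines of the <pre> text (page-specific non-game lines)
    for line in soup.select_one('pre').text.split('\n')[:-4]:
        date = re.search(r'(\d{4}-\d{2}-\d{2})', line)
        matches1 = p.findall(line)
        matches2 = p2.findall(line)
        # Skip anything that is not a complete game line (e.g. blank lines)
        if not date or len(matches1) < 2 or len(matches2) < 2:
            continue
        row = [date.group(0), matches1[0].strip(), matches2[0], matches1[1].strip(), matches2[1]]
        w.writerow(row)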
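If you would rather keep the question's get_text(strip=True) call, the squashed lines it prints can also be split with a single named-group regex. This is a minimal sketch, not a tested scraper: it assumes (based only on the sample output in the question) that team names contain no digits and that "@" appears only as the home-team prefix. The names LINE_RE, parse_scores and format_table are hypothetical. Note that for scheduled games marked "Sch" the first team listed is not actually a winner; the trailing note is matched but not kept.

```python
import re

# Headers from the question; the two "Score" columns are renamed so each is unique
HEADERS = ["Date", "Winner", "Score1", "Loser", "Score2"]

# One pattern for a whole squashed line, e.g. "2019-11-16Troy63 @Texas St27".
# Assumption: team names contain no digits, and "@" only marks the home team.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s*"
    r"(?P<winner>@?[^\d@]+?)\s*(?P<score1>\d+)\s*"
    r"(?P<loser>@?[^\d@]+?)\s*(?P<score2>\d+)"
    r"(?:\s+(?P<note>\S.*))?$"  # optional trailing note such as "Sch" (not kept)
)


def parse_scores(text):
    """Return [date, winner, score1, loser, score2] rows; skip non-matching lines."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # blank lines, footers, anything that is not a game line
        rows.append([
            m["date"],
            m["winner"].strip(),
            int(m["score1"]),
            m["loser"].strip(),
            int(m["score2"]),
        ])
    return rows


def format_table(headers, rows, gap=3):
    """Left-align each column to its widest cell plus `gap` spaces."""
    cells = [headers] + [[str(c) for c in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]
    return "\n".join(
        "".join(c.ljust(w + gap) for c, w in zip(r, widths)).rstrip()
        for r in cells
    )


sample = """2019-11-16Southern Miss36 @UT San Antonio17
2019-11-16 @Washington St49Stanford22
2019-11-19Ohio0 @Bowling Green0 Sch"""

rows = parse_scores(sample)
print(format_table(HEADERS, rows))
```

With the question's code, parse_scores(table) returns rows that can go straight into pandas as pd.DataFrame(rows, columns=HEADERS), and df.to_excel("scores.xlsx", index=False) then writes a spreadsheet (this requires the openpyxl package). Excel sizes its own columns, so format_table is only useful for reading the results in a terminal.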