Question

I scraped the following data:

for row in stat_table.find_all("tr"):
    for cell in row.find_all('td'):
        print(cell.text)

The output looks like this: 1 2019-10-24 31-206 MIL @ OU W (+6) 0 16:35 1 3 .333 0 2

etc.

I created a columns variable:

columns = ['G','Date', 'Age','Team',"at","Opp",'Score','Starter','MP','FG','FGA','FG%','3P','3PA',"3P%",
           'FT','FTA','FT%','ORB','DRB','TRB','AST','STL','BLK','TOV','PF','PTS',"GmSC","+/-"]

I want to read in this output and create a new pandas DataFrame with these columns. Any idea how I can read it in?

Answer 1

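The way I would do this is to turn each row's cells into a list inside the for loop and append that list to a list of lists (body):

import pandas as pd

header = columns  # the column names defined in the question
body = []  # list of lists, one inner list per table row

for row in stat_table.find_all("tr"):
    # Collect the text of every <td> in this row as one record.
    cells = [cell.text.strip() for cell in row.find_all("td")]
    if cells:  # skip rows with no <td> cells, e.g. header rows built from <th>
        body.append(cells)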

Then make sure that header and every list in body have the same length, and:

df = pd.DataFrame(data=body, columns=header)
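
If the lengths do not match, pd.DataFrame raises an error, so it can help to look for the offending rows first. Below is a minimal sketch of such a check. The helper name find_length_mismatches and the three-column sample data are made up for illustration and are not part of the scraped table.

```python
def find_length_mismatches(header, body):
    """Return (row_index, row_length) for every row whose length differs from the header's."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(body) if len(row) != expected]


# Made-up sample data, shortened to three columns for illustration.
header = ["G", "Date", "Team"]
body = [
    ["1", "2019-10-24", "MIL"],
    ["2", "2019-10-26"],  # one cell short
]

mismatches = find_length_mismatches(header, body)
print(mismatches)  # [(1, 2)]

# Keep only the rows that line up with the header before building the DataFrame.
clean_body = [row for row in body if len(row) == len(header)]
```

Whether to drop the short rows, as done here, or to pad them is a judgment call that depends on why the table has uneven rows in the first place.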

Populating a new pandas DataFrame from a continuous scrape, with known column names
