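TL;DR: I need to convert a BS4 result set list (a single column) into an NxN array, but how? And how do I get the headers, which are also a BS4 result set list? Code is below. Thank you!
I am trying to scrape sports data from the web, but I can't convert the result set into an NxN array. I am also trying to include the headers, which are scraped the same way. Here is my code so far: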
from __future__ import print_function  # must come before every other import

import requests
from bs4 import BeautifulSoup
import numpy as np

url = input("Paste player link and specific year ")
r = requests.get(url)
html_content = r.text
soup = BeautifulSoup(html_content, "lxml")
body = soup.body
table = body.table
tbody = table.tbody
headers = table.find_all("th")
statistics = tbody.find_all("td")

def string_stats():
    # Prints the text of each td cell, one per line; returns None
    for stat in statistics:
        print(stat.string)

def string_headers():
    # Prints the text of each th cell, one per line; returns None
    for head in headers:
        print(head.string)

string_stats_list = string_stats()
string_stats_list
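This gives only a vertical list of the td tag elements as strings (or at least that is the goal).
So my question is: how do I convert this single-column list into an NxN array/matrix? And how do I attach the headers?
Thanks for reading and/or helping!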
Answer 0 (score: 1)
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'http://www.footballdb.com/players/mike-evans-evansmi03/gamelogs'
r = requests.get(url)
html_content = r.content
soup = BeautifulSoup(html_content, "lxml")
body = soup.body
table = body.table

# Column names come from the th cells of the first table
headers = table.find_all("th")
headers_list = [i.text for i in headers]

# Build one list per table row, skipping the first (header) row
string_stats_list = []
row = []
for i in table.select('tr')[1:]:
    for j in i.select('td'):
        row.append(j.text)
    string_stats_list.append(row)
    row = []

# Each inner list becomes one DataFrame row, labelled with the headers
df = pd.DataFrame(data=string_stats_list, columns=headers_list)
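
If you already have the cells as one flat list, as in the question, another option is to split that list into rows using the number of headers as the row width. The following is only a minimal sketch using plain Python lists. The helper name chunk_rows and the sample values are made up for illustration and stand in for [th.text for th in headers] and [td.text for td in statistics]. It assumes every row has exactly one td per header, which may not hold for every table on the site.

```python
def chunk_rows(cells, n_cols):
    """Split a flat list of cell values into rows of n_cols items each."""
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")
    if len(cells) % n_cols != 0:
        raise ValueError("cell count is not a multiple of the column count")
    return [cells[i:i + n_cols] for i in range(0, len(cells), n_cols)]


# Hypothetical sample data, not real scraped values
headers_list = ["Date", "Opp", "Rec", "Yds"]
flat_stats = ["09/11", "ATL", "5", "99", "09/18", "ARI", "6", "70"]

# One inner list per table row
rows = chunk_rows(flat_stats, len(headers_list))

# Put the header row on top to get a single 2-D list
table_with_headers = [headers_list] + rows

for line in table_with_headers:
    print(line)
```

The resulting rows list can be passed to pd.DataFrame(data=rows, columns=headers_list) in the same way as string_stats_list above. The length check is there so that a table with merged or missing cells raises an error instead of silently shifting values into the wrong columns.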