Converting a BS4 ResultSet to an NxN array relative to headers (a separate BS4 ResultSet)

Time: 2017-07-02 23:52:21

Tags: python arrays beautifulsoup resultset

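TL;DR: I need to convert a BS4 ResultSet list (a single column) into an NxN array, but how? And how do I attach the headers, which are also a BS4 ResultSet list? Code below. Thank you!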

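I'm trying to scrape sports data from the web, but I can't work out how to convert the result set into an NxN array. I'm also trying to include the headers, which are scraped the same way. Here is my code so far: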

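from __future__ import print_function  # must come before any other import

import requests
from bs4 import BeautifulSoup
import numpy as np

url = input("Paste player link and specific year ")
r = requests.get(url)
html_content = r.text
soup = BeautifulSoup(html_content, "lxml")

# Walk down to the first table on the page and its body.
body = soup.body
table = body.table
tbody = table.tbody

headers = table.find_all("th")
statistics = tbody.find_all("td")

def string_stats():
    # Print each cell's text, one per line, and return the strings as a list.
    for stat in statistics:
        print(stat.string)
    return [stat.string for stat in statistics]

def string_headers():
    # Same as above, for the header cells.
    for head in headers:
        print(head.string)
    return [head.string for head in headers]

string_stats_list = string_stats()
string_stats_list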

This results in only a vertical list of the td tag elements as strings (or at least that is the goal).

So my question is: how do I convert this single-column list into an NxN array/matrix? And how do I attach the headers?
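To make the target shape concrete, here is a minimal sketch of the kind of reshaping being asked about. The header names and cell values are made-up placeholders, not scraped data, and the sketch assumes the number of cells is an exact multiple of the number of headers.

```python
import numpy as np

# Hypothetical placeholder data standing in for the scraped strings.
headers_list = ["Date", "Opp", "Rec", "Yds"]
flat_stats = ["09/11", "ATL", "5", "99",
              "09/18", "ARI", "6", "70"]

n_cols = len(headers_list)
# The reshape below only works when the cells divide evenly into rows.
assert len(flat_stats) % n_cols == 0

# -1 lets NumPy infer the row count from the column count.
stats_array = np.array(flat_stats).reshape(-1, n_cols)

# Stack the header row on top of the data rows.
table_array = np.vstack([headers_list, stats_array])
print(table_array)
```

Everything ends up as strings here because a NumPy array holds one dtype; numeric columns would need converting separately.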

Thanks for reading and/or helping!

1 Answer:

Answer 0: (score: 1)

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'http://www.footballdb.com/players/mike-evans-evansmi03/gamelogs'
r = requests.get(url)
html_content = r.content
soup = BeautifulSoup(html_content, "lxml")

# Take the first table on the page.
body = soup.body
table = body.table

# Column names come from the <th> cells.
headers = table.find_all("th")
headers_list = [i.text for i in headers]

# Build one list of cell strings per <tr>, skipping the header row.
string_stats_list = []
for i in table.select('tr')[1:]:
    row = [j.text for j in i.select('td')]
    string_stats_list.append(row)

# Each inner list becomes one DataFrame row under the scraped headers.
df = pd.DataFrame(data=string_stats_list, columns=headers_list)
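The same row-by-row idea can be tried offline. The sketch below runs against a small inline HTML string, which is a hypothetical stand-in for the downloaded page, and uses the built-in "html.parser" so lxml is not required. It assumes a single header row whose cell count matches each data row. Instead of slicing off the first row, it skips any row that has no td cells, which is a small deviation from the code above.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page.
html = """
<table>
  <tr><th>Date</th><th>Opp</th><th>Rec</th></tr>
  <tr><td>09/11</td><td>ATL</td><td>5</td></tr>
  <tr><td>09/18</td><td>ARI</td><td>6</td></tr>
</table>
"""

# html.parser ships with Python, so no extra parser is needed here.
soup = BeautifulSoup(html, "html.parser")
table = soup.table

headers_list = [th.text for th in table.find_all("th")]

# Collect one list of cell strings per row, keeping only rows with <td> cells.
rows = []
for tr in table.find_all("tr"):
    cells = [td.text for td in tr.find_all("td")]
    if cells:
        rows.append(cells)

df = pd.DataFrame(rows, columns=headers_list)
print(df)

# If a plain 2-D NumPy array is wanted instead of a DataFrame:
stats_array = df.to_numpy()
```

Grouping cells by their parent tr avoids having to know the column count in advance, which is why this tends to be more forgiving than reshaping one flat list of td elements.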