I'm using BS4 to scrape text. My current text output has 7 different fields, which I'd like to put into 7 separate lists. My code is as follows:
from bs4 import BeautifulSoup
import requests

urlYears = ['2012']
for year in urlYears:
    # Build the URL from the loop variable instead of a hard-coded "2012"
    url = "https://en.wikipedia.org/wiki/" + year + "_NFL_Draft"
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    table = soup.select_one("table.wikitable.sortable")
    for row in table.select("tr + tr"):
        tds = row.text
        print(tds)
The printed output looks like this:
7^
252
St. Louis Rams
Richardson, DarylDaryl Richardson
RB
Abilene Christian
Lone Star
7^
253
Indianapolis Colts
Harnish, ChandlerChandler Harnish
QB
NIU
MAC
How can I build lists from this output? The end goal is to export it as CSV.
Answer (score: 0)
A simple approach is to split() each row's text on newlines:
import os
from bs4 import BeautifulSoup
import requests

soup = BeautifulSoup(requests.get("https://en.wikipedia.org/wiki/2012_NFL_Draft").content, "html.parser")
table = soup.select_one("table.wikitable.sortable")
for row in table.select("tr + tr"):
    # Split each row's text into a list of fields, one per line
    tds = row.text.split(os.linesep)
    print(tds)
Output (Python 2):
[u'', u'', u'1', u'1', u'Indianapolis Colts', u'Luck, AndrewAndrew Luck\xa0\u2020', u'QB', u'Stanford', u'Pac-12', u'', u'']
[u'', u'', u'1', u'2', u'Washington Redskins', u'Griffin III, RobertRobert Griffin III\xa0\u2020', u'QB', u'Baylor', u'Big 12', u'from St. Louis\xa0[R1 - 1];', u'2011 Heisman Trophy winner\xa0[N 2]', u'']
[u'', u'', u'1', u'3', u'Cleveland Browns', u'Richardson, TrentTrent Richardson\xa0', u'RB', u'Alabama', u'SEC', u'from Minnesota\xa0[R1 - 2]', u'']
[u'', u'', u'1', u'4', u'Minnesota Vikings', u'Kalil, MattMatt Kalil\xa0\u2020', u'OT', u'USC', u'Pac-12', u'from Cleveland\xa0[R1 - 3]', u'']
[u'', u'', u'1', u'5', u'Jacksonville Jaguars', u'Blackmon, JustinJustin Blackmon\xa0', u'WR', u'Oklahoma State', u'Big 12', u'from Tampa Bay\xa0[R1 - 4]', u'']
...
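To get the seven separate lists the question asks for, the split rows still need their blank entries removed. A minimal sketch, assuming every row has at least seven non-empty fields in the order round, pick, team, player, position, college, conference; the sample strings are hypothetical stand-ins for row.text, and the names row_fields and NUM_FIELDS are illustrative only. It splits with str.splitlines(), drops blank fields, keeps the first seven (discarding trailing notes such as "from St. Louis"), and transposes the rows with zip():

```python
# Hypothetical samples standing in for row.text from the scraped table.
sample_rows = [
    "\n\n1\n1\nIndianapolis Colts\nLuck, AndrewAndrew Luck\xa0\u2020\nQB\nStanford\nPac-12\n\n",
    "\n\n1\n2\nWashington Redskins\nGriffin III, RobertRobert Griffin III\xa0\u2020\nQB\nBaylor\nBig 12\n"
    "from St. Louis\xa0[R1 - 1];\n2011 Heisman Trophy winner\xa0[N 2]\n",
    "\n\n7^\n253\nIndianapolis Colts\nHarnish, ChandlerChandler Harnish\nQB\nNIU\nMAC\n\n",
]

# Assumed field order: round, pick, team, player, position, college, conference.
NUM_FIELDS = 7


def row_fields(text, n=NUM_FIELDS):
    """Split one row's text on line breaks, drop blank entries, keep the first n fields."""
    fields = [f.strip() for f in text.splitlines() if f.strip()]
    return fields[:n]


rows = [row_fields(t) for t in sample_rows]

# zip(*rows) yields one tuple per column; map(list, ...) turns each tuple into a list.
rounds, picks, teams, players, positions, colleges, conferences = map(list, zip(*rows))
print(teams)
```

Because zip() stops at the shortest row, a row with fewer than seven fields would produce fewer than seven columns and the unpacking would raise ValueError, so rows that do not fit the assumed layout should be filtered out first.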
HTH, DTK
EDIT: You actually only need .splitlines(), which lets Python handle the line breaks correctly. That also saves the os import.
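Since the end goal is a CSV file, the standard library's csv module can write the cleaned rows directly. A minimal sketch, assuming each row has already been reduced to a list of seven field strings; the HEADER names, the write_draft_csv helper, and the draft_2012.csv file name are hypothetical. csv.writer quotes any field that contains a comma, such as the "Last, First" player names, so they stay in one column.

```python
import csv
import io

# Hypothetical column names for the seven fields.
HEADER = ["Round", "Pick", "Team", "Player", "Position", "College", "Conference"]


def write_draft_csv(rows, fileobj):
    """Write a header line, then one CSV line per row (each row is a list of strings)."""
    writer = csv.writer(fileobj)
    writer.writerow(HEADER)
    writer.writerows(rows)


# Hypothetical rows, already split with splitlines() and stripped of blank entries.
rows = [
    ["1", "1", "Indianapolis Colts", "Luck, AndrewAndrew Luck", "QB", "Stanford", "Pac-12"],
    ["7^", "253", "Indianapolis Colts", "Harnish, ChandlerChandler Harnish", "QB", "NIU", "MAC"],
]

# Write to an in-memory buffer so the result can be inspected.
buf = io.StringIO()
write_draft_csv(rows, buf)
print(buf.getvalue())

# To write a real file instead, open it with newline="" as the csv docs recommend:
# with open("draft_2012.csv", "w", newline="", encoding="utf-8") as f:
#     write_draft_csv(rows, f)
```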