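I'm working in Python 3. My goal is to extract the different values from a table and put them in separate lists.
The problem is that I can't get the value of the "img alt" attribute inside the td.
Here is my code: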
from bs4 import BeautifulSoup
import urllib.request
redditFile = urllib.request.urlopen("http://www.mtggoldfish.com/movers/online/all")
redditHtml = redditFile.read()
redditFile.close()
soup = BeautifulSoup(redditHtml)
all_tables = soup.find_all('table')
right_table = soup.find('table', class_='table table-bordered table-striped table-condensed movers-table')
#create a list
A=[]
B=[]
C=[]
D=[]
for row in right_table.findAll("tr"):
    cells = row.findAll('td')
    increment = row.findAll('span')
    colection = row.findAll('img')
    link = row.findAll('a')
    if len(cells) == 6:
        A.append(cells[0].find(text=True))
        B.append(increment[0].find(text=True))
        C.append(colection[0])
        D.append(link[0].find(text=True))
print(A)
print(B)
print(C)
print(D)
This code gives me this result:
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
['+8.40', '+2.47', '+1.35', '+1.28', '+1.14', '+0.99', '+0.94', '+0.91', '+0.90', '+0.75']
[<img alt="ORI" class="sprite-set_symbols_ORI" src="//assets1.mtggoldfish.com/assets/s-407aaa9c9786d606684c6967c47739c5.gif"/>, <img alt="PRM" class="sprite-set_symbols_PRM" src="//assets1.mtggoldfish.com/assets/s-407aaa9c9786d606684c6967c47739c5.gif"/>, <img alt="8ED" class="sprite-set_symbols_8ED" src="//assets1.mtggoldfish.com/assets/s-407aaa9c9786d606684c6967c47739c5.gif"/>, <img alt="EX" class="sprite-set_symbols_EX" src="//assets1.mtggoldfish.com/assets/s-407aaa9c9786d606684c6967c47739c5.gif"/>, <img alt="TSB" class="sprite-set_symbols_TSB" src="//assets1.mtggoldfish.com/assets/s-407aaa9c9786d606684c6967c47739c5.gif"/>, <img alt="WL" class="sprite-set_symbols_WL" src="//assets1.mtggoldfish.com/assets/s-407aaa9c9786d606684c6967c47739c5.gif"/>, ...]
["Jace, Vryn's Prodigy", "Gaea's Cradle", 'Ensnaring Bridge', 'City of Traitors', 'Pendelhaven', 'Firestorm', 'Kor Spiritdancer', 'Scalding Tarn', 'Daybreak Coronet', 'Grove of the Burnwillows']
But what I need in the colection variable is the img alt value (for example, the first img alt value is "ORI").
I don't know what else to try. Can you help me with this?
Thanks a lot
Answer 0 (score: 3)
Once you have an <img> node instance, you can get its alt value with:
alt_tag = img.attrs['alt']
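Since you are getting a collection of img elements, you can iterate over it and retrieve the alt attribute of each one:
tags = []
collection = soup.findAll("img")
for img in collection:
    # skip images that have no alt attribute
    if 'alt' in img.attrs:
        tags.append(img.attrs['alt'])
# do whatever you need to do with your list of alt attributes
print(tags)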
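As a small self-contained illustration of the same idea, here is a sketch that runs against a made-up HTML snippet instead of the live page. The markup and the names sample_html, alt_values and present are assumptions for the example only. It uses Tag.get, which returns None when the attribute is missing, as an alternative to the membership check.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; the second image has no alt attribute.
sample_html = """
<table>
  <tr><td><img alt="ORI" src="a.gif"/></td></tr>
  <tr><td><img src="b.gif"/></td></tr>
  <tr><td><img alt="PRM" src="c.gif"/></td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Tag.get returns None for a missing attribute instead of raising KeyError.
alt_values = [img.get("alt") for img in soup.find_all("img")]

# Keep only the images that actually carry an alt attribute.
present = [alt for alt in alt_values if alt is not None]

print(alt_values)
print(present)
```

Whether to filter out the None entries depends on whether the positions in the list need to stay aligned with the rows of the table.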
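Answer 1 (score: 1)
If you only want the alt values from the img tags, you just need to select the img tags from the table and extract the alt attribute: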
right_table = soup.find('table', class_='table table-bordered table-striped table-condensed movers-table')
print([img["alt"] for img in right_table.select("img[alt]")])
['ORI', 'PRM', '8ED', 'EX', 'TSB', 'WL', 'ROE', 'ZEN', 'FUT', 'FUT']
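In your own loop you use findAll where you only seem to want a single element. If you only want the first match, use find, e.g. row.find('span') and so on. row.find('img')["alt"] will give you the alt value for each row; looking at the page, there is only one img per row, so you definitely don't need findAll.
If you want to recreate the table locally, I would put the data in a dictionary: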
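To make that concrete, here is a minimal sketch of the question's loop rewritten with find, run against a made-up two-row table rather than the live page. The markup, the shortened class name and the list names (ranks, increments, sets, names) are assumptions for the example; the real page has more columns.

```python
from bs4 import BeautifulSoup

# Made-up table that only imitates the structure described in the question.
sample_html = """
<table class="movers-table">
  <tr><th>Rank</th><th>Change</th><th>Set</th><th>Card</th></tr>
  <tr>
    <td>1</td>
    <td><span class="increase">+8.40</span></td>
    <td><img alt="ORI" src="s.gif"/></td>
    <td><a href="#">Jace, Vryn's Prodigy</a></td>
  </tr>
  <tr>
    <td>2</td>
    <td><span class="increase">+2.47</span></td>
    <td><img alt="PRM" src="s.gif"/></td>
    <td><a href="#">Gaea's Cradle</a></td>
  </tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("table", class_="movers-table")

ranks, increments, sets, names = [], [], [], []
for row in table.find_all("tr"):
    img = row.find("img")
    if img is None:
        continue  # header row: no img, so nothing to collect
    ranks.append(row.find("td").get_text(strip=True))
    increments.append(row.find("span").get_text(strip=True))
    sets.append(img["alt"])  # the alt attribute, not the whole tag
    names.append(row.find("a").get_text(strip=True))

print(ranks, increments, sets, names)
```

Skipping rows without an img replaces the len(cells) == 6 check from the question; either guard works as long as it filters out the header row.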
right_table = soup.find('table', class_='table table-bordered table-striped table-condensed movers-table')
table_dict = {}
for row in right_table.select("tr"):
    # spans with the "increase" class hold the increment values
    increments = [s.text for s in row.select('span.increase')]
    # make sure we have some data in the tr
    if increments:
        # rank/place is the first text in td, could also use find("td",{"class":"first-right"})
        place = int(row.td.text)
        # the card name is the text of the a tag
        title = row.find("a").text
        increments.append(title)
        # get the alt attribute from the img tag
        increments.append(row.find("img")["alt"])
        table_dict[place] = increments
from pprint import pprint as pp
pp(table_dict)
Output:
{1: [u'+8.78', u'68.03', u'+15.00%', u"Jace, Vryn's Prodigy", 'ORI'],
2: [u'+2.47', u'47.96', u'+5.00%', u"Gaea's Cradle", 'PRM'],
3: [u'+1.95', u'20.37', u'+11.00%', u'Firestorm', 'WL'],
4: [u'+1.73', u'23.91', u'+8.00%', u'Force of Will', 'VMA'],
5: [u'+1.35', u'40.88', u'+3.00%', u'Ensnaring Bridge', '8ED'],
6: [u'+1.28', u'44.02', u'+3.00%', u'City of Traitors', 'EX'],
7: [u'+1.15', u'41.98', u'+3.00%', u'Time Walk', 'VMA'],
8: [u'+1.01', u'28.68', u'+4.00%', u'Daze', 'NE'],
9: [u'+1.01', u'19.96', u'+5.00%', u"Goryo's Vengeance", 'BOK'],
10: [u'+1.00', u'3.99', u'+33.00%', u'Unearth', 'UL']}
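The result matches the current table data exactly. If you want all the winners, just change the URL to http://www.mtggoldfish.com/movers-details/online/all/winners/dod
Or, if you want to break out the fields, just pull the first increment: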
for row in right_table.select("tr"):
    increment = row.find('span',{"class":'increase'})
    if increment:
        increment = increment.text
        place = int(row.td.text)
        title = row.select("a[data-full-image]")[0].text
        alt = (row.find("img")["alt"])
        table_dict[place] = {"title":title,"alt":alt, "inc":increment}
from pprint import pprint as pp
pp(table_dict)
Output:
{1: {'alt': 'ORI', 'inc': u'+8.78', 'title': u"Jace, Vryn's Prodigy"},
2: {'alt': 'PRM', 'inc': u'+2.47', 'title': u"Gaea's Cradle"},
3: {'alt': 'WL', 'inc': u'+1.95', 'title': u'Firestorm'},
4: {'alt': 'VMA', 'inc': u'+1.73', 'title': u'Force of Will'},
5: {'alt': '8ED', 'inc': u'+1.35', 'title': u'Ensnaring Bridge'},
6: {'alt': 'EX', 'inc': u'+1.28', 'title': u'City of Traitors'},
7: {'alt': 'VMA', 'inc': u'+1.15', 'title': u'Time Walk'},
8: {'alt': 'NE', 'inc': u'+1.01', 'title': u'Daze'},
9: {'alt': 'BOK', 'inc': u'+1.01', 'title': u"Goryo's Vengeance"},
10: {'alt': 'UL', 'inc': u'+1.00', 'title': u'Unearth'}}
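Since the question asks for separate lists, a dictionary shaped like the one above can be unpacked again. The sketch below assumes that shape and uses three entries copied from the output; the list names (places, alts, incs, titles) are made up for the example.

```python
# Sample data in the same shape as the table_dict output above,
# deliberately inserted out of order to show that sorting restores rank order.
table_dict = {
    2: {"alt": "PRM", "inc": "+2.47", "title": "Gaea's Cradle"},
    1: {"alt": "ORI", "inc": "+8.78", "title": "Jace, Vryn's Prodigy"},
    3: {"alt": "WL", "inc": "+1.95", "title": "Firestorm"},
}

# Sort the integer keys so every list follows the table's ranking.
places = sorted(table_dict)

# Build one list per field, all aligned by position.
alts = [table_dict[p]["alt"] for p in places]
incs = [table_dict[p]["inc"] for p in places]
titles = [table_dict[p]["title"] for p in places]

print(places)
print(alts)
print(incs)
print(titles)
```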