I am trying to scrape a list of all location names using BeautifulSoup. I used to use the following:
locs = LOOPED.findAll("td", {"class": "max use"})
The HTML used to be:
<td class="max use" style="">London</td>
and this returned London. However, the HTML has since changed, and the call no longer returns London.
The new HTML for each cell looks like this:
<td class="max use" style="">
<div class="notranslate">
<span><a data-title="View Location" href="/location/uk/gb/london/">London</a></span> <span class="extra hidden">(DEFAULT)</span>
</div>
</td>
Edit: if I print locs, I get a list:
[<td class="max use" style="">\n<div class="notranslate">\n<span><a data-title="View Location" href="/location/uk/gb/london/">London</a></span> <span class="extra hidden">(DEFAULT)</span>\n</div>\n</td>, <td class="max use" style="">\n<div class="notranslate">\n<span><a data-title="View Location" href="/location/uk/gb/manchester/">Manchester</a></span> <span class="extra hidden">(DEFAULT)</span>\n</div>\n</td>, <td class="max use" style="">\n<div class="notranslate">\n<span><a data-title="View Location" href="/location/uk/gb/liverpool/">Liverpool</a></span> <span class="extra hidden">(NA)</span>\n</div>\n</td>]
You can see there are 3 different locations in it. From the above, I would hope to see a list of:
[London, Manchester, Liverpool]
I think I should be using something like:
locs = LOOPED.findAll("td", {"class": "max use"})
locs = locs.findAll('a')[1]
print locs.text
But this just returns:
AttributeError: 'ResultSet' object has no attribute 'findAll'
I can't figure out how to search the results again for the hyperlink text...
Answer 0 (score: 2)
Try this:
# LOOPED is the BeautifulSoup object from the question
tag = LOOPED.findAll('td')  # all "td" tags, as a list
tag_a = tag[0].find('a')    # first link inside the first cell only
print(tag_a.text)
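The snippet above reads only the first cell. Here is a minimal sketch of the same idea applied to every matching cell. It assumes the bs4 package, where find_all is the current name for findAll. The sample markup is reconstructed by hand from the question, and names such as SAMPLE_HTML and names are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup reconstructed from the question (illustrative only).
SAMPLE_HTML = """
<table><tr>
<td class="max use" style="">
<div class="notranslate">
<span><a data-title="View Location" href="/location/uk/gb/london/">London</a></span> <span class="extra hidden">(DEFAULT)</span>
</div>
</td>
<td class="max use" style="">
<div class="notranslate">
<span><a data-title="View Location" href="/location/uk/gb/manchester/">Manchester</a></span> <span class="extra hidden">(DEFAULT)</span>
</div>
</td>
<td class="max use" style="">
<div class="notranslate">
<span><a data-title="View Location" href="/location/uk/gb/liverpool/">Liverpool</a></span> <span class="extra hidden">(NA)</span>
</div>
</td>
</tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

names = []
for td in soup.find_all("td", {"class": "max use"}):
    # Search inside each individual cell; the ResultSet itself has no find/find_all.
    link = td.find("a")
    if link is not None:  # skip any cell that has no link
        names.append(link.get_text(strip=True))

print(names)  # ['London', 'Manchester', 'Liverpool']
```

The AttributeError in the question comes from calling findAll on the whole ResultSet. Looping over it and searching each td separately avoids that.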
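Answer 1 (score: 1)
An approach that is more robust to future changes in the HTML structure is to get all the text inside each td
element, as described in this answer: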
locs = LOOPED.findAll("td", {"class": "max use"})
for loc in locs:
    # join every text node inside the cell, whatever tags wrap it
    print(''.join(loc.findAll(text=True)))
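As a hedged variation on the loop above, assuming the bs4 package: get_text with a separator and strip=True gathers the same text nodes and trims the surrounding newlines. The result still includes the "(DEFAULT)" and "(NA)" labels, because they are text inside the td. The helper name cell_texts and the sample markup are illustrative, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Sample markup reconstructed from the question (illustrative only).
SAMPLE_HTML = """
<table><tr>
<td class="max use" style="">
<div class="notranslate">
<span><a href="/location/uk/gb/london/">London</a></span> <span class="extra hidden">(DEFAULT)</span>
</div>
</td>
<td class="max use" style="">
<div class="notranslate">
<span><a href="/location/uk/gb/liverpool/">Liverpool</a></span> <span class="extra hidden">(NA)</span>
</div>
</td>
</tr></table>
"""


def cell_texts(html):
    """Return the whitespace-trimmed text of every td with class "max use"."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.find_all("td", {"class": "max use"})
    # The " " separator joins the text fragments; strip=True drops the
    # whitespace-only fragments and trims the rest.
    return [cell.get_text(" ", strip=True) for cell in cells]


texts = cell_texts(SAMPLE_HTML)
print(texts)  # ['London (DEFAULT)', 'Liverpool (NA)']
```

If only the location name is wanted, restricting the search to the a tag inside each cell leaves the extra label out.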