I am trying to get the text inside an href tag. Basically, I am trying to scrape the Android bugs at https://code.google.com/p/android/issues/list
<td class="vt col_4" width="100%" onclick="if (!cancelBubble) _goIssue(0)">
<a onclick="cancelBubble=true" href="../../android/issues/detail id=58866&colspec=ID Type Status Owner Summary Stars">
compass not showing right direktion
</a>
</td>
Here is my code:
class MySpider(BaseSpider):
    name = "craig"
    start_urls = ["https://code.google.com/p/android/issues/list"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select("//td[@class='vt col_4']")
        items = []
        for titles in titles:
            item = CraiglistSampleItem()
            item["id"] = titles.select("a/text()").extract()
            item["type"] = titles.select("a/@href").extract()
            items.append(item)
        return items
I tested it on other hrefs and it works fine. Does anyone know why it does not work for the href that displays the bug summary shown above? Thanks!
Answer 0 (score: 1)
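Your iteration variable has the same name as the variable you are iterating over, which is not a good idea. Also, you have to select every other one; the `[2]` predicate below keeps the second `td[@class='vt col_4']` in each row: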
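# Imports for the legacy Scrapy API used here (assumed; adjust to your Scrapy version).
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
# CraiglistSampleItem is the Item class defined in the asker's own project.

class MySpider(BaseSpider):
    name = "craig"
    start_urls = ["https://code.google.com/p/android/issues/list"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Restrict the search to the results table.
        table = hxs.select("//table[@id='resultstable']")
        # Keep only the second matching cell of each row.
        for title in table.select("tr/td[@class='vt col_4'][2]"):
            item = CraiglistSampleItem()
            item["id"] = title.select("a/text()").extract()
            item["type"] = title.select("a/@href").extract()
            yield item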
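To see the per-row selection without running a spider, here is a minimal sketch using only the standard library. The `SAMPLE` markup and the `second_cells` helper are hypothetical stand-ins, not the real page or any Scrapy API, and the sketch assumes each row holds two cells with class `vt col_4`, with the summary link in the second one. It takes the second match by index, which plays the role of the `[2]` predicate in the XPath above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the issue list markup (not the real page).
SAMPLE = """
<table id="resultstable">
  <tr>
    <td class="vt col_4"><a href="detail?id=1">1</a></td>
    <td class="vt col_4"><a href="detail?id=1">first summary</a></td>
  </tr>
  <tr>
    <td class="vt col_4"><a href="detail?id=2">2</a></td>
    <td class="vt col_4"><a href="detail?id=2">second summary</a></td>
  </tr>
</table>
"""


def second_cells(table_xml):
    """Return link text and href from the second 'vt col_4' cell of each row."""
    table = ET.fromstring(table_xml)
    results = []
    for row in table.findall("tr"):
        # All cells in this row carrying the class of interest.
        cells = row.findall("td[@class='vt col_4']")
        if len(cells) < 2:
            continue  # row has no second matching cell
        link = cells[1].find("a")
        if link is None:
            continue  # cell has no link
        results.append({
            "text": (link.text or "").strip(),
            "href": link.get("href"),
        })
    return results


rows = second_cells(SAMPLE)
for entry in rows:
    print(entry["text"], entry["href"])
```

Note that the loop variable (`row`) is named differently from the collection it iterates over, in line with the first point of the answer.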