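I'm writing a script that imports a list of URLs and then checks each page's source for certain things. I need help with importing a .csv and processing it. In case someone can help, here is part of the code: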
    from lxml import html
    import csv

    def main():
        with open('urls.csv', 'r') as csvfile:
            urls = [row[0] for row in csv.reader(csvfile)]
        for url in urls:
            doc = html.parse(url)
            linkziel = 'http://dandydiary.de/de'
            if doc.xpath('//a[@href=$url]', url=linkziel):
                for anchor_node in doc.xpath('//a[@href=$url]', url=linkziel):
                    if anchor_node.xpath('./ancestor::div[contains(@class, "sidebar")]'):
                        print 'Sidebar'
                    elif anchor_node.xpath('./parent::div[contains(@class, "widget")]'):
                        print 'Sidebar'
                    elif anchor_node.xpath('./ancestor::div[contains(@class, "comment")]'):
                        print 'Kommentar'
                    elif anchor_node.xpath('./ancestor::div[contains(@id, "comment")]'):
                        print 'Kommentar'
                    elif anchor_node.xpath('./ancestor::div[contains(@class, "foot")]'):
                        print "Footer"
                    elif anchor_node.xpath('./ancestor::div[contains(@id, "foot")]'):
                        print "Footer"
                    elif anchor_node.xpath('./ancestor::div[contains(@class, "post")]'):
                        print "Contextual"
                    else:
                        print 'Unidentified Link'
            else:
                print 'Link is Dead'

    if __name__ == '__main__':
        main()
Instead of specifying just one URL, I want the script to run through a CSV file of URLs (I'm using Python 2).
Answer 0 (score: 0)
Python provides a csv module that you can use to import the list.
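A minimal sketch of what that could look like, assuming the file has one URL per row in the first column. The helper name `read_urls` and the skipping of blank rows are my own additions for illustration, not something from the question:

```python
import csv

def read_urls(path):
    """Return the first column of every non-empty row in the CSV file."""
    with open(path, 'r') as csvfile:
        # csv.reader yields an empty list for a blank line, so `row` is
        # checked before indexing to avoid an IndexError.
        return [row[0].strip()
                for row in csv.reader(csvfile)
                if row and row[0].strip()]
```

Calling `read_urls('urls.csv')` would then give a plain list of strings that the existing `for url in urls:` loop can iterate over.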
Answer 1 (score: 0)
Suppose you have an input.csv file with one URL on each line:
http://de.wikipedia.org
http://spiegel.de
http://www.vickysmodeblog.com/
You can then read it into a list with the csv module and iterate over it:
    import csv
    from lxml import html

    with open('input.csv', 'r') as csvfile:
        urls = [row[0] for row in csv.reader(csvfile)]

    for url in urls:
        print url
        doc = html.parse(url)
        linkziel = 'http://dandydiary.de/de'
        if doc.xpath('//a[@href=$url]', url=linkziel):
            for anchor_node in doc.xpath('//a[@href=$url]', url=linkziel):
                if anchor_node.xpath('./ancestor::div[contains(@class, "sidebar")]'):
                    print 'Sidebar'
                elif anchor_node.xpath('./parent::div[contains(@class, "widget")]'):
                    print 'Sidebar'
                elif anchor_node.xpath('./ancestor::div[contains(@class, "comment")]'):
                    print 'Kommentar'
                elif anchor_node.xpath('./ancestor::div[contains(@id, "comment")]'):
                    print 'Kommentar'
                elif anchor_node.xpath('./ancestor::div[contains(@class, "foot")]'):
                    print "Footer"
                elif anchor_node.xpath('./ancestor::div[contains(@id, "foot")]'):
                    print "Footer"
                elif anchor_node.xpath('./ancestor::div[contains(@class, "post")]'):
                    print "Contextual"
                else:
                    print 'Unidentified Link'
        else:
            print 'Link is Dead'
Its output is:
http://de.wikipedia.org
Link is Dead
http://spiegel.de
Link is Dead
http://www.vickysmodeblog.com/
Contextual
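As an aside, the long if/elif chain could also be written as an ordered table of rules, which makes it easier to add or reorder checks. This is only a sketch: the names `RULES` and `classify_links` and the inline sample HTML are made up for illustration, and it parses a string with `html.fromstring` instead of fetching a URL so that it runs without network access. The XPath expressions and labels are copied from the code above.

```python
from lxml import html

# (XPath relative to the anchor, label) pairs, checked in order; first match wins.
RULES = [
    ('./ancestor::div[contains(@class, "sidebar")]', 'Sidebar'),
    ('./parent::div[contains(@class, "widget")]', 'Sidebar'),
    ('./ancestor::div[contains(@class, "comment")]', 'Kommentar'),
    ('./ancestor::div[contains(@id, "comment")]', 'Kommentar'),
    ('./ancestor::div[contains(@class, "foot")]', 'Footer'),
    ('./ancestor::div[contains(@id, "foot")]', 'Footer'),
    ('./ancestor::div[contains(@class, "post")]', 'Contextual'),
]

def classify_links(doc, target):
    """Return one label per <a> whose href equals `target` (empty list if none)."""
    labels = []
    for anchor in doc.xpath('//a[@href=$url]', url=target):
        for xpath, label in RULES:
            if anchor.xpath(xpath):
                labels.append(label)
                break
        else:
            # No rule matched this anchor.
            labels.append('Unidentified Link')
    return labels

# Hypothetical sample page, used only to exercise the function.
sample = html.fromstring(
    '<html><body>'
    '<div class="post"><a href="http://dandydiary.de/de">x</a></div>'
    '<div id="footer"><a href="http://dandydiary.de/de">y</a></div>'
    '</body></html>'
)
print(classify_links(sample, 'http://dandydiary.de/de') or 'Link is Dead')
```

An empty result corresponds to the 'Link is Dead' branch above, which really just means that no link to the target URL was found on that page.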