The script takes links from a CSV file and scrapes some information from the web pages. Some of the links don't work, so those rows get lost. I've included a try/except, but this messes up my output, because I need exactly the same number of output rows as in the original file.
for row in reader:
    try:
        url = row[4]
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    except:
        continue
Is there a way to delete the rows with broken links from the CSV file? Something like:
for row in reader:
    try:
        url = row[4]
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    except:
        continue
        DELETE_THE_ROW
Answer 0 (score: 1)
The best way is to create a new CSV file and write out only those rows whose links work.
f = open('another_csv.csv','w+')
for row in reader:
    try:
        url = row[4]
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
        print >>f, ','.join(row)
    except:
        # can log the faulty links in another file
        continue
f.close()
You can rename the new CSV to the original name, or keep both.
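A Python 3 sketch of the same idea: using `csv.writer` instead of `','.join(row)` keeps quoting correct for fields that themselves contain commas. The `fetch_ok` helper is a hypothetical stand-in for the urllib2/lxml fetch, and the in-memory `StringIO` streams stand in for the real files.

```python
import csv
import io

def filter_rows(reader, writer, fetch_ok):
    # Copy only the rows whose link (column index 4) can be fetched.
    kept = 0
    for row in reader:
        try:
            if not fetch_ok(row[4]):
                raise IOError(row[4])
        except Exception:
            # Faulty link: skip the row (could log it to another file).
            continue
        writer.writerow(row)
        kept += 1
    return kept

# Demo with in-memory CSV data and a stubbed-out link check.
src = io.StringIO('a,b,c,d,http://good\ne,f,g,h,http://bad\n')
dst = io.StringIO()
n = filter_rows(csv.reader(src), csv.writer(dst),
                lambda url: url == 'http://good')
print(n)               # 1
print(dst.getvalue())  # only the row whose link worked
```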
Answer 1 (score: 0)
Why not write the good rows to another file when everything goes well?
writer = csv.writer(out_file_handle)
for row in reader:
    try:
        url = row[4]
        req = urllib2.Request(url)
        tree = lxml.html.fromstring(urllib2.urlopen(req).read())
    except:
        continue
    else:
        writer.writerow(row)
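The `else` branch of a `try` statement runs only when no exception was raised, so `writer.writerow(row)` is reached exclusively for rows whose fetch succeeded. A minimal Python 3 sketch of that control flow, with a hypothetical `might_fail` in place of the real urllib2 call:

```python
def keep_good(rows, might_fail):
    # try/except/else: the else body runs only if no exception was raised.
    kept = []
    for row in rows:
        try:
            might_fail(row)
        except ValueError:
            continue          # skip the faulty row
        else:
            kept.append(row)  # reached only on success
    return kept

def might_fail(row):
    # Hypothetical stand-in for the real network fetch.
    if 'bad' in row[0]:
        raise ValueError(row[0])

print(keep_good([['good1'], ['bad'], ['good2']], might_fail))
# [['good1'], ['good2']]
```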