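I am reading a TSV file, calling a web service to retrieve some information, and then writing the results out to a CSV file.
My code ran perfectly for 4,610 rows, but it seems to have failed on row 4,611. I don't want to run it again from the beginning, so how can I resume from row 4,611?
My code is as follows: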
import csv
import GetAlexRanking #External Method exposed here
import subprocess
import pandas as p
import tai
import numpy as np
loadData = lambda f: np.genfromtxt(open(f,'r'), delimiter=' ')
with open('train.tsv','rb') as tsvin, open('PageRanks.csv', 'wb') as csvout:
    tsvin = list(np.array(p.read_table('train.tsv'))[:,0])
    csvout = csv.writer(csvout)
    csvout.writerow(["URL","AlexaRank","GoogleRank"]) #writing
    for row in tsvin: #start in row 4,611
        count = 0
        sep = '|'
        row = row.split(sep, 1)[0]
        cmd = subprocess.Popen("python GetAlexRanking.py " + row ,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               shell=True)
        (output, err) = cmd.communicate()
        exit_code = cmd.wait()
        outlist = output.split('\r\n')
        try:
            outrank1 = outlist[1][outlist[1].index(':')+1:]
        except ValueError:
            outrank1 = "?"
        try:
            outrank2 = outlist[2][outlist[2].index(':')+1:]
        except ValueError:
            outrank2 = "?"
        csvout.writerow([str(outlist[0]), str(outrank1), str(outrank2)]) #is there a way to append here rather than write anew?
        count+=1
Any help is much appreciated.
Thanks!
Answer 0 (score: 2)
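Skip the rows you have already processed:
i = 0
for row in tsvin:
    if i < 4611:
        i += 1  # advance the counter before skipping, otherwise every row is skipped
        continue
    ... the rest of your code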
Or even better, as @Joran suggested, let enumerate keep the counter for you:
for i,row in enumerate(tsvin):
And open the output file in append mode, so the existing results are kept:
open('PageRanks.csv', 'a')
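Putting the two pieces together, a minimal sketch might look like the following. It is written for Python 3, whereas the question's code is Python 2. The names `resume_rows`, `start_row` and `process` are hypothetical, and `process` stands in for the per-row subprocess call and parsing from the question. Here `start_row` is counted from 1, and the header is written only when the output file is missing or empty, so a resumed run does not repeat it.

```python
import csv
import os


def resume_rows(rows, out_path, start_row, process):
    """Process rows from the 1-based start_row onward, appending to out_path.

    `process` is a hypothetical callable that turns one input row
    into a list of output fields.
    """
    # Only write the header if the file does not exist yet or is empty.
    need_header = not os.path.exists(out_path) or os.path.getsize(out_path) == 0

    # "a" appends to the existing file instead of truncating it.
    with open(out_path, "a", newline="") as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(["URL", "AlexaRank", "GoogleRank"])
        for i, row in enumerate(rows, start=1):
            if i < start_row:
                continue  # handled by the earlier run
            writer.writerow(process(row))
```

With something like this, the rerun would be called with `start_row=4611` so that the failing row is retried and everything before it is left alone.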
Answer 1 (score: 0)
from itertools import islice
START_AT = 4611
for i, row in enumerate(islice(tsvin, START_AT, None), START_AT):
    # ... your code here
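To see what this does, here is a small illustration on a made-up list (the `rows` values are placeholders, not data from the question). `islice` drops the first `START_AT` items without running the loop body for them, and the second argument to `enumerate` keeps the index aligned with the original position.

```python
from itertools import islice

rows = ["r0", "r1", "r2", "r3", "r4"]  # stand-in for tsvin
START_AT = 3  # number of leading rows to skip (0-based index of the first row kept)

resumed = list(enumerate(islice(rows, START_AT, None), START_AT))
print(resumed)  # [(3, 'r3'), (4, 'r4')]
```

Because the count is zero-based, `START_AT` is the number of rows skipped. If 4,610 rows succeeded and the 4,611th needs to be retried, `START_AT = 4610` may be the value wanted; which one is right depends on how the rows were counted.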