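I am trying to append a value in the adjacent column of a row in sitemap_bp.csv whenever that row contains a string from mobilesitemap-browse.csv. I can't loop over the rows of mobilesitemap-browse.csv; it gets stuck on the first row. How can I fix this?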
    import csv

    with open('sitemap_bp.csv','r') as csvinput:
        with open('mobilesitemap-browse.csv','r') as csvinput2:
            with open('output.csv', 'w') as csvoutput:
                writer = csv.writer(csvoutput, lineterminator='\n')
                sitemap = csv.reader(csvinput)
                mobilesitemap = csv.reader(csvinput2)
                all = []
                row = next(sitemap)
                row.append('mobile')
                all.append(row)
                for mobilerow in mobilesitemap:
                    for row in sitemap:
                        #print row[0]
                        if mobilerow[1] in row[0]:
                            #print row, mobilerow[1]
                            all.append((row[0], mobilerow[1]))
                        else:
                            all.append(row)
                writer.writerows(all)
Answer 0 (score: 1)
Personally, I would first parse the data from sitemap_bp.csv into a dictionary, then use that dictionary to populate the new file.
    import csv
    import re

    with open('sitemap_bp.csv','r') as csvinput, \
         open('mobilesitemap-browse.csv','r') as csvinput2, \
         open('output.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput, lineterminator='\n')
        sitemap = csvinput  # no reason to pipe this through csv.reader
        mobilesitemap = csv.reader(csvinput2)

        # NOTE: `_{7}` matches seven literal underscores; adjust this pattern
        # to whatever your item numbers actually look like
        item_number = re.compile(r"\d{5}_\d{7}_{7}")
        item_number_mapping = {item_number.search(line).group(): line.strip()
                               for line in sitemap if item_number.search(line)}
        # makes a dictionary {item_number: full_url, ...} for each item in sitemap

        # alternate to the above, consider:
        # # item_number_mapping = {}
        # # for line in sitemap:
        # #     line = line.strip()
        # #     match = item_number.search(line)
        # #     if match:
        # #         item_number_mapping[match.group()] = match.string

        # raises KeyError if a mobile row's item number is not in the mapping
        all = [row + [item_number_mapping[row[1]]] for row in mobilesitemap]
        writer.writerows(all)
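To try the dictionary approach without touching real files, here is a minimal sketch that runs on in-memory data. The sample URLs, the `\d{5}_\d{7}` item-number pattern, and the helper names `build_mapping` and `join_rows` are assumptions made for illustration, not taken from the question's files. Unlike the list comprehension above, it skips mobile rows that have no match instead of raising `KeyError`.

```python
import csv
import io
import re

# Hypothetical sample data; the real files and item-number format may differ.
SITEMAP_TEXT = """url
http://example.com/browse/12345_1234567
http://example.com/browse/54321_7654321
http://example.com/about
"""
MOBILE_TEXT = """m.example.com/a,12345_1234567
m.example.com/b,54321_7654321
m.example.com/c,99999_0000000
"""

ITEM_NUMBER = re.compile(r"\d{5}_\d{7}")  # assumed format, adjust as needed


def build_mapping(lines):
    """Return {item_number: full_url} for every line containing an item number."""
    mapping = {}
    for line in lines:
        line = line.strip()
        match = ITEM_NUMBER.search(line)
        if match:
            mapping[match.group()] = line
    return mapping


def join_rows(mobile_rows, mapping):
    """Append the matching sitemap URL to each mobile row that has one."""
    return [row + [mapping[row[1]]] for row in mobile_rows if row[1] in mapping]


# The sitemap is read exactly once, then every lookup is a dictionary access.
mapping = build_mapping(io.StringIO(SITEMAP_TEXT))
joined = join_rows(csv.reader(io.StringIO(MOBILE_TEXT)), mapping)

output = io.StringIO()
csv.writer(output, lineterminator="\n").writerows(joined)
print(output.getvalue())
```

Note that the dictionary lookup requires the second column of the mobile file to equal the extracted item number exactly, whereas the question's `in` test was a substring match.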
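My guess is that after the first pass through the outer for loop, it tries to iterate over sitemap again, but can't because the file has already been exhausted. The minimal change is: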
    for mobilerow in mobilesitemap:
        csvinput.seek(0)  # seek to the start of the file object
        next(sitemap)     # skip the header row
        for row in sitemap:
            #print row[0]
            if mobilerow[1] in row[0]:
                #print row, mobilerow[1]
                all.append((row[0], mobilerow[1]))
            else:
                all.append(row)
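However, the obvious reason not to do it this way is that it re-reads all of sitemap_bp.csv once for every row in mobilesitemap-browse.csv, instead of reading it just once as my code does.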
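The exhaustion itself is easy to reproduce. The sketch below uses an in-memory `io.StringIO` buffer in place of the real file, and the helper name `rows_per_pass` is hypothetical. It counts how many rows a single `csv.reader` yields on consecutive passes, with and without rewinding the underlying buffer.

```python
import csv
import io


def rows_per_pass(text, passes=2, rewind=False):
    """Iterate one csv.reader several times and count the rows seen per pass."""
    buffer = io.StringIO(text)
    reader = csv.reader(buffer)
    counts = []
    for _ in range(passes):
        if rewind:
            buffer.seek(0)  # same idea as csvinput.seek(0)
        counts.append(sum(1 for _ in reader))
    return counts


sample = "url\na\nb\n"
without_rewind = rows_per_pass(sample)
with_rewind = rows_per_pass(sample, rewind=True)
print(without_rewind)  # [3, 0]: the second pass finds nothing
print(with_rewind)     # [3, 3]: rewinding makes every pass re-read the data
```

The second result also shows the cost: with a rewind, every pass reads the whole input again.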
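If you need a list of the URLs in sitemap_bp.csv that have no match in mobilesitemap-browse.csv, you would probably be best served by building a set of all the items as you see them, then using set operations to get the unseen ones. This takes a little tinkering, but...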
    # instead of all = [row + [item number ...
    seen = set()
    all = []
    for row in mobilesitemap:
        item_no = row[1]
        if item_no in item_number_mapping:
            all.append(row + [item_number_mapping[item_no]])
            seen.add(item_no)
    # after this for loop, `all` is identical to the list comp version
    unmatched_items = [item_number_mapping[item_num] for item_num in
                       set(item_number_mapping.keys()) - seen]
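As a self-contained illustration of the set-difference step, the sketch below wraps the loop in a function and runs it on made-up data. The mapping contents, the mobile rows, and the name `split_matches` are assumptions for the example only. The unmatched URLs are sorted because set iteration order is not guaranteed.

```python
def split_matches(mobile_rows, mapping):
    """Return (matched_rows, unmatched_urls) for a {item_number: url} mapping."""
    seen = set()
    matched = []
    for row in mobile_rows:
        item_no = row[1]
        if item_no in mapping:
            matched.append(row + [mapping[item_no]])
            seen.add(item_no)
    # item numbers in the sitemap that no mobile row referred to
    unmatched = sorted(mapping[item_no] for item_no in set(mapping) - seen)
    return matched, unmatched


# Hypothetical data standing in for the two CSV files.
mapping = {
    "12345_1234567": "http://example.com/browse/12345_1234567",
    "54321_7654321": "http://example.com/browse/54321_7654321",
}
mobile_rows = [
    ["m.example.com/a", "12345_1234567"],
    ["m.example.com/x", "00000_0000000"],
]

matched, unmatched = split_matches(mobile_rows, mapping)
print(matched)
print(unmatched)
```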