I need help reading these text files. Somehow, when I run a nested loop over the two files, the other loop always resets to the first line.
import sys
import codecs  # the lines contain more UTF-8 characters than shown, so I needed codecs
reload(sys)
sys.setdefaultencoding('utf-8')
reader = codecs.open("txtfile1", 'r', 'utf-8')
reader2 = codecs.open("txtfile2", 'r', 'utf-8')
for row in reader:
    print row[0:11]  # here the outer loop is advancing as expected
    for row2 in reader2:
        print row[0:11]  # here the outer loop gets reset
        if row[0:11] == row2[0:11]:
            print row[12:] + row2[12:]
The text files look like this:
txtfile1
95032302317 foo
95032302318 bar
95032302319 bron
95032302320 cow
95032302321 how
95032302322 now
95032303001 lala
95032303002 lili
txtfile2
95032103318 bar (in another utf8 language)
95032103319 bron (in another utf8 language)
95032103320 cow (in another utf8 language)
95032103321 how (in another utf8 language)
95032103322 now (in another utf8 language)
95032103323 didi
95032103324 dada
95032103325 kaka
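Answer 0 (score: 1)
I can't tell you why, but simply replacing `for row in reader:` with `for row in reader.readlines():` solves the problem. If the files are too large to read in all at once, you will probably need to handle the iteration manually.
I just realized that what I actually did was slightly different: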
outer = codecs.open(<outer loop file>).readlines()
inner = codecs.open(<inner loop file>).readlines()
for o in outer:
    for i in inner:
        print o
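A likely explanation for why reading into lists helps, offered as a sketch rather than a confirmed diagnosis of the `codecs` reader: a file object is a one-pass iterator, so the inner `for` loop uses up `reader2` during the first outer pass and has nothing left on later passes. The example below is written in Python 3 and uses `io.StringIO` as a stand-in for the two files (both are assumptions; the question uses Python 2 and `codecs.open`), and `count_pairs` is a hypothetical helper introduced only for this illustration.

```python
import io

def count_pairs(outer, inner):
    """Count how many (outer, inner) line pairs a nested loop visits."""
    visited = 0
    for _ in outer:
        for _ in inner:
            visited += 1
    return visited

OUTER_TEXT = "a\nb\nc\n"
INNER_TEXT = "x\ny\n"

# File-like objects are one-pass iterators: the inner one is used up
# during the first outer iteration, so only 2 pairs are visited.
exhausted = count_pairs(io.StringIO(OUTER_TEXT), io.StringIO(INNER_TEXT))

# Lists can be iterated repeatedly, so all 3 * 2 pairs are visited.
full = count_pairs(io.StringIO(OUTER_TEXT).readlines(),
                   io.StringIO(INNER_TEXT).readlines())

print(exhausted, full)
```

If the inner file is too large to hold in memory, rewinding it with `seek(0)` before each inner pass is another option worth trying.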
Answer 1 (score: 1)
I would just do this:
row2 = reader2.readlines()
for row in reader.readlines():
    print row
    if row in row2:
        print 'yeah'
Edit: new solution:
row2 = [line[:11] for line in reader2.readlines()]
for row in reader.readlines():
    print row
    if row[:11] in row2:
        print 'yeah'
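One possible refinement, as a sketch only: `row[:11] in row2` scans a list for every row, whereas a set gives average constant-time lookups. The example is in Python 3, `matching_rows` and its `key_len` parameter are hypothetical names, and the sample rows are made up rather than taken from the question's files.

```python
def matching_rows(lines1, lines2, key_len=11):
    """Return the lines of lines1 whose first key_len characters
    also start some line in lines2."""
    keys2 = {line[:key_len] for line in lines2}  # set: fast membership test
    return [line for line in lines1 if line[:key_len] in keys2]

# Made-up sample data (not the files from the question).
sample1 = ["95032302317 foo\n", "95032302318 bar\n"]
sample2 = ["95032302318 baz\n", "95032302999 qux\n"]
matches = matching_rows(sample1, sample2)
print(matches)
```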
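Answer 2 (score: 1)
This code slurps both files into memory, but if your files are smaller than a few hundred megabytes it will probably work for you.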
#!/usr/bin/python2 -S
# -*- coding: utf-8 -*-
# vim:ts=4:sw=4:softtabstop=4:smarttab:expandtab
# -S skips the automatic "import site", so sys.setdefaultencoding is
# still available here; site is then imported by hand afterwards.
import sys
sys.setdefaultencoding("utf-8")
import site
import codecs
t1 = {}
t2 = {}
# Map each leading number to the rest of its line, one dict per file.
with codecs.open("txtfile1", 'r', 'utf-8') as reader:
    for row in reader:
        number, text = row.split(" ", 1)
        t1[number] = text
with codecs.open("txtfile2", 'r', 'utf-8') as reader:
    for row in reader:
        number, text = row.split(" ", 1)
        t2[number] = text
# Numbers that appear in both files.
common = set(t1.keys()) & set(t2.keys())
while common:
    key = common.pop()
    print t1[key], t2[key]
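For readers on Python 3, where `sys.setdefaultencoding` no longer exists and the built-in `open()` accepts an `encoding` argument, the same dictionary join might look roughly like the sketch below. `load_table` and `join_tables` are hypothetical helper names, and the demo writes two throwaway files with made-up rows instead of using the files from the question.

```python
import os
import tempfile

def load_table(path):
    """Map each leading number to the rest of its line (UTF-8 text)."""
    table = {}
    with open(path, encoding="utf-8") as reader:
        for row in reader:
            parts = row.rstrip("\n").split(" ", 1)
            if len(parts) == 2:  # skip blank or malformed lines
                number, text = parts
                table[number] = text
    return table

def join_tables(t1, t2):
    """Return (number, text1, text2) for numbers in both tables, sorted."""
    return [(key, t1[key], t2[key]) for key in sorted(t1.keys() & t2.keys())]

# Demo with two throwaway files holding made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    path1 = os.path.join(tmp, "txtfile1")
    path2 = os.path.join(tmp, "txtfile2")
    with open(path1, "w", encoding="utf-8") as f:
        f.write("95032302318 bar\n95032302319 bron\n")
    with open(path2, "w", encoding="utf-8") as f:
        f.write("95032302319 брон\n95032302320 корова\n")
    joined = join_tables(load_table(path1), load_table(path2))

print(joined)
```

Sorting the common keys is a design choice made here so the output order is predictable; the original pops keys from a set in arbitrary order.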