... break, I can no longer print out the file. Please guide me; I'm a Python beginner.
import os
path = "D:\test"
in_files = os.listdir(path)
desc = open("desc.txt", "w")
print >> desc, "Mol_ID, Text1, Text2"
moldesc = ['Text1', 'Text2']
for f in in_files:
    file = os.path.join(path, f)
    text = open(file, "r")
    hit_count = 0
    hit_count1 = 0
    for line in text:
        if moldesc[0] in line:
            Text1 = line.split()[-1]
        if moldesc[1] in line:
            Text2 = line.split()[-1]
            print >> desc, f + "," + Text1 + "," + Text2
        text.close()
print "Text extraction done !!!"
Answer 0 (score: 2)
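Your code has a few problems:
text.close() should be at the same level as the for line in text loop, not inside it.
The print >> desc statement is misplaced: it should print only once both Text1 and Text2 have been defined. You can set them to None before the for line in text loop and test whether both are no longer None. (Alternatively, set hit_count = 1 inside the if moldesc[0] test and hit_count1 = 1 inside the if moldesc[1] test, then test hit_count and hit_count1.) When that test passes, print the output and use break to escape the loop. (So, in simple code:)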
for f in in_files:
    file = os.path.join(path, f)
    with open(file, "r") as text:
        hit_count = 0
        hit_count1 = 0
        for line in text:
            if moldesc[0] in line:
                Text1 = line.split()[-1]
                hit_count = 1
            if moldesc[1] in line:
                Text2 = line.split()[-1]
                hit_count1 = 1
            if hit_count and hit_count1:
                print >> desc, f + "," + Text1 + "," + Text2
                break
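The answer's first alternative (None sentinels instead of hit counters) is described but not coded. Here is a minimal Python 3 sketch of it. The function name extract_pair and the sample lines are hypothetical, and it keeps the question's assumption that the value is the last whitespace-separated token on the matching line. Returning from the function plays the role of break.

```python
import io


def extract_pair(lines, markers=("Text1", "Text2")):
    """Return (value1, value2) taken from lines containing each marker,
    or None if either marker never appears.

    Hypothetical helper; assumes the value is the last whitespace-separated
    token on the matching line, as in the question's code.
    """
    text1 = text2 = None  # sentinels: nothing found yet
    for line in lines:
        if markers[0] in line:
            text1 = line.split()[-1]
        if markers[1] in line:
            text2 = line.split()[-1]
        if text1 is not None and text2 is not None:
            return text1, text2  # both found: stop reading, like `break`
    return None  # at least one marker is missing


sample = io.StringIO("header\nText1 = 4.2\nnoise\nText2 = 17\n")
print(extract_pair(sample))  # ('4.2', '17')
```

Inside the per-file loop it would be called on the open file object, e.g. pair = extract_pair(text) within the with open(file, "r") as text: block, writing a row only when pair is not None.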
There is a third problem:
you mentioned wanting the text before Text1? Then you probably want Text1 = line[:line.index(moldesc[0])] instead of Text1 = line.split()[-1]...
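To see what the suggested slice returns compared with split()[-1], here is a small Python 3 illustration on a hypothetical line format (a name, then the marker, then a number); the real files may look different. str.partition is shown as an alternative that does not raise when the marker is missing.

```python
# Hypothetical line format: a name, the marker, then a number.
line = "caffeine Text1 0.53\n"
marker = "Text1"

# The question's code: last whitespace-separated token on the line.
last_token = line.split()[-1]

# The answer's suggestion: everything before the marker.
# str.index raises ValueError when the marker is absent, so only use it
# on lines that already passed the `marker in line` test.
before = line[:line.index(marker)]

# str.partition never raises: if the marker is missing, head is the whole line.
head, sep, tail = line.partition(marker)
before_clean = head.strip()

print(repr(last_token), repr(before), repr(before_clean))
# '0.53' 'caffeine ' 'caffeine'
```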
Answer 1 (score: 0)
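I'd go with mmap and possibly use CSV for the result file, something like the following (untested, with rough edges: it needs better error handling, you may want mm.find() instead of a regex, some code is copied verbatim from the OP, and so on, and my laptop's battery is about to die...)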
import os
import re
import csv
import mmap
from collections import defaultdict

PATH = r"D:\test" # 'r' prefix: otherwise '\t' in "D:\test" is read as a tab

in_files = os.listdir(PATH)
fout = open('desc.txt', 'w')
csvout = csv.writer(fout)
csvout.writerow( ['Mol_ID', 'Text1', 'Text2'] )

dd = defaultdict(list)
for filename in in_files:
    fin = open(os.path.join(PATH, filename), 'rb')
    mm = mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ)
    # Find stuff (bytes pattern, since the mmap holds bytes)
    matches = re.findall(br'(.*?)(Text[12])', mm) # maybe use finditer depending on exact needs
    for text, matched in matches:
        dd[matched].append(text)
    # do something with dd - write output using csvout.writerow()...
    mm.close()
    fin.close()

fout.close() # csv.writer has no close(); close the underlying file
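Since the answer above is untested, here is a minimal Python 3 sketch of the same mmap + regex idea, under stated assumptions. It deliberately changes two things: the pattern is a bytes pattern (in Python 3 an mmap exposes bytes, so a str pattern raises TypeError), and it captures the token after each marker, following the question's split()[-1] convention, rather than the text before it. The file layout "Text1 <value>" and the names scan_file and write_table are hypothetical.

```python
import csv
import io
import mmap
import os
import re
import tempfile

# Hypothetical file layout: each marker is followed by whitespace and a value token.
# Bytes pattern, because an mmap exposes bytes in Python 3.
PATTERN = re.compile(rb"(Text[12])\s+(\S+)")


def scan_file(path):
    """Return {b'Text1': value, b'Text2': value} for the first hit of each marker."""
    found = {}
    with open(path, "rb") as fin:
        if os.fstat(fin.fileno()).st_size == 0:
            return found  # mmap cannot map an empty file, so skip it
        with mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for m in PATTERN.finditer(mm):
                found.setdefault(m.group(1), m.group(2))  # keep the first value seen
                if len(found) == 2:
                    break  # both markers found; stop scanning
    return found


def write_table(paths, out):
    """Write one CSV row per file that contains both markers."""
    writer = csv.writer(out)
    writer.writerow(["Mol_ID", "Text1", "Text2"])
    for path in paths:
        found = scan_file(path)
        if len(found) == 2:
            writer.writerow([os.path.basename(path),
                             found[b"Text1"].decode(),
                             found[b"Text2"].decode()])


# Demo with a temporary directory instead of D:\test
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "mol1.txt")
    with open(sample, "w") as f:
        f.write("header\nText1 4.2\nText2 17\n")
    empty = os.path.join(tmp, "empty.txt")
    open(empty, "w").close()
    buf = io.StringIO()
    write_table([sample, empty], buf)

print(buf.getvalue())
```

For a plain presence check, mm.find(b"Text1") returns the lowest offset of the marker or -1, which is the simpler alternative the answer mentions. When writing the CSV to a real file in Python 3, the csv module documentation recommends opening it with newline=''.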