I'm using this script to remove duplicate lines and lines containing values greater than 10000:
import fileinput
import os, sys
import re

def get_immediate_subdirectories(a_dir):
    return [name for name in os.listdir(a_dir) if os.path.isdir(os.path.join(a_dir,name))]

for i in get_immediate_subdirectories(os.getcwd()+'\population_vs_time\\'):
    for j in get_immediate_subdirectories(os.getcwd()+'\population_vs_time\\'+i):
        for file in os.listdir(os.getcwd()+'\population_vs_time\\'+i+'\\'+j):
            seen=set()
            my_dir=os.getcwd()+'\population_vs_time\\'+str(i)+'\\'+str(j)+'\\'+file
            matches=re.match("time",str(file))
            if not matches:
                print (my_dir)
                f = fileinput.input(files=my_dir)
                for line in f:
                    if line in seen: continue # skip duplicate
                    flag=0
                    words = line.split()
                    for word in words:
                        # try:
                        i=float(word)
                        if i>10000:
                            flag=1
                            break
                        # except ValueError:
                        #     flag=1
                    if flag==1: continue
                    seen.add(line)
                    print line, # standard output is now redirected to the file
                f.close()
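For reference, the per-file filtering step can be isolated into a small function. This is a sketch, not the original script: `filter_lines` is a hypothetical name, and it assumes fields are separated by commas and/or whitespace (the files are `.csv`, while `line.split()` only splits on whitespace) and that a non-numeric token such as a header word is ignored rather than rejecting the line. The parsed number gets its own name, `value`, so no loop variable is overwritten.

```python
import re

def filter_lines(lines, limit=10000.0):
    """Drop repeated lines and lines that contain a number greater than `limit`.

    Assumptions: tokens are separated by commas and/or whitespace, and a
    token that cannot be parsed as a float is skipped, not treated as an error.
    """
    seen = set()
    kept = []
    for line in lines:
        if line in seen:
            continue  # skip duplicate
        too_big = False
        for token in re.split(r"[,\s]+", line.strip()):
            try:
                value = float(token)  # separate name: no outer loop variable is touched
            except ValueError:
                continue  # non-numeric token (e.g. a header): ignore it
            if value > limit:
                too_big = True
                break
        if too_big:
            continue
        seen.add(line)
        kept.append(line)
    return kept

print(filter_lines(["1,2\n", "1,2\n", "3,20000\n"]))  # → ['1,2\n']
```

Returning the kept lines instead of printing them keeps the filter testable on its own; writing the result back to the file is left to the caller.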
my_dir is a string variable. Printing it with print gives the correct path for the first three files:
C:\Program Files (x86)\Guimoo\bin\population_vs_time\mocmaes\frontsize_gen=220\mocmaes_gen_220_100_.csv
C:\Program Files (x86)\Guimoo\bin\population_vs_time\mocmaes\frontsize_gen=220\mocmaes_gen_220_120_.csv
C:\Program Files (x86)\Guimoo\bin\population_vs_time\mocmaes\frontsize_gen=220\mocmaes_gen_220_140_.csv
But when the next file is read, the path contains a seemingly random number (1.70512 instead of mocmaes):
C:\Program Files (x86)\Guimoo\bin\population_vs_time\1.70512\frontsize_gen=220\mocmaes_gen_220_160_.csv
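The number appears exactly where the first-level subdirectory name `i` goes into the path, and the innermost loop of the script assigns `i=float(word)`. The file list for the current `j` was computed before that assignment, so the loop keeps running, but `my_dir` is rebuilt from `str(i)` on every iteration. A minimal reproduction of that pattern, with hypothetical function names and toy data, compares the reused name against a separate one:

```python
import os

def paths_with_shadowing(root, subdir, filenames, numbers):
    """Mimics the script's loops: `i` is reused inside the innermost loop."""
    paths = []
    for i in [subdir]:
        for name in filenames:
            paths.append(os.path.join(root, str(i), name))
            for word in numbers:
                i = float(word)  # rebinds `i`: the directory name is lost
    return paths

def paths_fixed(root, subdir, filenames, numbers):
    """Same loops, but the parsed number gets its own name."""
    paths = []
    for i in [subdir]:
        for name in filenames:
            paths.append(os.path.join(root, str(i), name))
            for word in numbers:
                value = float(word)  # `i` still holds the subdirectory name
                if value > 10000:
                    break
    return paths

print(paths_with_shadowing("population_vs_time", "mocmaes", ["a.csv", "b.csv"], ["1.70512"]))
print(paths_fixed("population_vs_time", "mocmaes", ["a.csv", "b.csv"], ["1.70512"]))
```

In the first list the second path contains `1.70512` in place of `mocmaes`; in the second list both paths keep `mocmaes`.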
I suspect I'm missing some basics about Python's \ escape characters. Is that the case?
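On the escape question itself: Python translates only recognised escape sequences. `\p` is not one, so `'\population_vs_time'` happens to keep its backslash (newer Python versions emit a warning for it), but a folder whose name starts with a recognised letter, such as `t`, would silently turn into a control character. A sketch of the difference, plus a hypothetical helper `file_path` that lets `os.path.join` insert the separators instead of hand-written backslashes:

```python
import os

# A recognised escape is translated: "\t" is a single TAB character,
# so a folder named "time..." written as "\time" would break the path.
assert len("\t") == 1
assert "\time" != "\\time"

# An unrecognised escape (backslash followed by "p") is left as-is, which is why
# the script's '\population_vs_time' still works. A raw string or a doubled
# backslash states the intent explicitly.
assert r"\population_vs_time" == "\\population_vs_time"

def file_path(root, method, gen_dir, filename):
    """Hypothetical helper: build the path without hand-written separators."""
    return os.path.join(root, "population_vs_time", method, gen_dir, filename)

print(file_path(os.getcwd(), "mocmaes", "frontsize_gen=220", "mocmaes_gen_220_160_.csv"))
```

`os.path.join` uses the separator of the running platform, so the same code also works outside Windows.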