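I have a text file that I open in my Python code. I want to search the file and extract the text that follows a particular symbol. For example, my text file, named File.txt, contains:
Hello, this is just a dummy file that has information with no substance at all and I want to pull the information between the dollar sign symbols. So all of this $ in between here should be pulled out so I can do what ever I want to with it $ and the rest of this will be a second group.
Here is a sample of my code: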
class FileExtract(object):
    def __init__(self):
        pass
    def extractFile(self):
        file = open("File.txt")
        wholeFile = file.read()
        file.close()
        symCount = wholeFile.count("$")
        count = 0  # counts each $ as it is found
        begin = False  # True once a $ has been found and copying has started
        myWant = []  # collects the portion I want
        for word in wholeFile.split():
            while count != symCount:
                if word != "$" and begin == False:
                    break
                if word == "$" and begin == False:
                    myWant.append(word)
                    begin = True
                    count = count + 1  # found one of the total symbols
                    break
                elif word != "$" and begin == True:
                    myWant.append(word)
                    break
                elif word == "$" and begin == True:
                    begin = False
                    break
        print(myWant)
I want it to print:
"$ in between here should be pulled out so I can do what ever I want to with it"
"$ and the rest of this will be a second group."
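This is the only way I could come up with (I know it's awful; please go easy on me, I'm just learning). The problem is that my approach puts everything into a list, and I want it printed as a plain string, with the spaces, newlines and everything else intact. Any suggestions, or built-in functions/methods I'm overlooking that would help?
Answer 0: (score: 1)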
s = "Hello, this is just a dummy file that has information with no substance at all and I want to pull the information between the dollar sign symbols. So all of this $ in between here should be pulled out so I can do what ever I want to with it $ and the rest of this will be a second group."
# split on the dollar sign and drop the text before the first one
a = s.split("$")[1:]
print(a)
Of course, the delimiter won't appear in the results, but adding it back yourself is trivial.
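As a minimal sketch of putting the delimiter back (the helper name `groups_with_delimiter` and the shortened sample string are made up for illustration, they are not from the question):

```python
def groups_with_delimiter(text, delimiter="$"):
    """Return each chunk that follows a delimiter, with the delimiter put back in front."""
    # split() drops the delimiter; [1:] discards whatever precedes the first one
    parts = text.split(delimiter)[1:]
    return [delimiter + part for part in parts]


# shortened stand-in for the contents of File.txt
s = "no substance $ in between here $ and the rest"
for group in groups_with_delimiter(s):
    print(group)
```

Each group is printed as an ordinary string, so the spaces and any newlines inside it are kept as they were in the file.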
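Answer 1: (score: 0)
Well, you could do wholeFile.split('$')
and you then have a list of 3 elements: what comes before the first $, what is between the two $'s, and what comes after the second $. (And none of them contains a $.)
Or even print('\n$'.join(wholeFile.split('$')))
As a minimal function: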
def extract_file(filename):
    with open(filename) as f:
        # put each $ at the start of its own line
        return '\n$'.join(f.read().split('$'))
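To make the three-element result concrete, here is a small sketch on a shortened, made-up string rather than the real file (the variable names are chosen for this example only):

```python
text = "before $ middle $ after"

parts = text.split('$')
print(parts)        # three elements, none of which contains '$'

joined = '\n$'.join(parts)
print(joined)       # every '$' now starts a new line
```

Note that the text before the first `$` is still there, as the first line of the joined string.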
Answer 2: (score: 0)
This is what flex is for. Not that you need flex to do the same thing in Python:
firstFlag = False   # set once the first $ has been seen
secondFlag = False  # set once the second $ has been seen
outputFile1 = open('first.txt', 'wb')
outputFile2 = open('second.txt', 'wb')
yourFile = open('thefile.txt', 'rb')
while True:
    char = yourFile.read(1)  # read one byte at a time
    if not char:
        break
    if char == b'$':
        if firstFlag:
            secondFlag = True
        firstFlag = True
    if firstFlag and not secondFlag:
        outputFile1.write(char)
    elif secondFlag:
        outputFile2.write(char)
yourFile.close()
outputFile1.close()
outputFile2.close()
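Since this isn't native C code, it won't be very fast. I'd suggest taking a look at flex, not only as a tool but also for the learning experience.
The code above, in flex: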
%option 8bit outfile="scanner.c"
%option nounput main noyywrap
%option warn
%x First
%x Second
/* The \$ rule is listed before the catch-all dot rule; otherwise the dot would match $ first. */
%%
\$ { BEGIN First; yyout = fopen("first.txt", "wb"); }
. { ECHO; }
<First>\$ { BEGIN Second; fclose(yyout); yyout = fopen("second.txt", "wb");}
<First>. { ECHO; }
<Second>. { ECHO; }
%%
The dot rule sends each character to yyout, which initially points to stdout. To compile this:
flex -Cf scanner.l
gcc -O -o flexer.exe scanner.c
It takes its input from stdin.
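For comparison, the start-condition logic of the scanner can be sketched in Python on an in-memory string. This is an illustration only, not anything flex generates; the function name `scan` and the returned tuple are assumptions made for the example.

```python
def scan(text):
    """Split text into three parts: before the first $, between the first two $, after the second $."""
    state = "INITIAL"
    out = {"INITIAL": [], "First": [], "Second": []}
    for ch in text:
        if ch == "$" and state == "INITIAL":
            state = "First"        # plays the role of BEGIN First
        elif ch == "$" and state == "First":
            state = "Second"       # plays the role of BEGIN Second
        else:
            out[state].append(ch)  # plays the role of ECHO to the current output
    return tuple("".join(out[name]) for name in ("INITIAL", "First", "Second"))


print(scan("a $ b $ c"))
```

As in the scanner, the first two `$` characters only switch state and are not copied, while any later `$` is passed through to the last part.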
Answer 3: (score: 0)
It's actually quite simple. Without using split, and without storing the result in a list:
def extractFile(self):
    file = open("File.txt")
    wholeFile = file.read()
    file.close()
    pos = wholeFile.find("$")
    while pos != -1:  # find() returns -1 when there is no match
        # look for the next $ strictly after the current one
        pos2 = wholeFile.find("$", pos + 1)
        if pos2 != -1:
            print(wholeFile[pos:pos2])
        else:
            print(wholeFile[pos:])
        pos = pos2
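The same find-based scan can also be written as a standalone generator, so the groups can be joined into one string or processed further instead of being printed one by one. A sketch, with the name `iter_groups` and the sample string chosen here for illustration:

```python
def iter_groups(text, symbol="$"):
    """Yield each stretch of text that starts at a symbol and runs up to the next one."""
    pos = text.find(symbol)
    while pos != -1:
        # search for the next symbol strictly after the current one
        nxt = text.find(symbol, pos + 1)
        yield text[pos:] if nxt == -1 else text[pos:nxt]
        pos = nxt


sample = "intro $ first group $ second group"
print("\n".join(iter_groups(sample)))
```

Passing `pos + 1` as the start index is what keeps the loop moving forward; without it, `find` would keep returning the same position.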