I have a text file in this format:
abc? cdfde" nhj.cde' dfwe-df$sde.....
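How can I ignore all special characters, whitespace, digits, line endings and so on, and write only the letters to another file? For example, the file above becomes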
abccdfdenhjcdedfwedfsde.....
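I then want to read from this output file.
First, how do I read only the letters and write them to an external file? I can read character by character with f.read(1) until the end of the file. How do I apply this to read 2 or 3 characters at a time while advancing by only one character? That is, given abcd, I should read ab, bc, cd, and not ab, cd (which I guess is what f.read(2) would give). Thanks. I am doing this for cryptanalysis work, to analyze ciphertexts by frequency.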
Answer 0 (score: 2)
If you need to look ahead (read a few extra characters at a time), you need a buffered file object. The following class does just that:
import io

class AlphaPeekReader(io.BufferedReader):
    def readalpha(self, count):
        """Read one alpha character, and peek ahead (count - 1) *extra* alpha characters."""
        # Find the first alpha character (bytes.isalpha() matches ASCII letters only)
        val = self.read1(1)
        while val and not val.isalpha():
            val = self.read1(1)
        if not val:
            return ''  # EOF
        result = [val.decode('ascii')]
        require = count - 1
        if require <= 0:
            return result[0]  # No look-ahead requested
        peek = self.peek(require * 3)  # Account for a lot of garbage
        # Decode so the loop sees 1-character strings; non-ASCII bytes are dropped
        for c in peek.decode('ascii', errors='ignore'):
            if c.isalpha():
                require -= 1
                result.append(c)
                if not require:
                    break
        # There is a chance here that there were not 'require' alpha chars in peek.
        # Return anyway.
        return ''.join(result)
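This tries to find the extra characters beyond the one you are reading, but it does not guarantee that it can satisfy your request. It may return fewer characters if we are at the end of the file, or if the next block contains a lot of non-alphabetic text.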
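The look-ahead depends on peek() returning buffered bytes without consuming them. A minimal sketch of that behavior, assuming an in-memory io.BytesIO stream as a stand-in for a real file (the sample bytes are made up for illustration):

```python
import io

# In-memory stand-in for a file opened in binary mode (illustrative data)
reader = io.BufferedReader(io.BytesIO(b'ab?cd'))

first = reader.read(1)    # consumes b'a'
ahead = reader.peek(2)    # looks ahead; may return more or fewer bytes than asked
second = reader.read(1)   # still b'b', because peek() did not advance the position

print(first, ahead[:1], second)
```

Because peek() may hand back fewer bytes than requested, any caller that needs a fixed number of look-ahead letters should check the length of the result.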
Usage:
with AlphaPeekReader(io.open(filename, 'rb')) as alphafile:
    alphafile.readalpha(3)
Demo, using a file containing your sample input:
>>> f = io.open('/tmp/test.txt', 'rb')
>>> alphafile = AlphaPeekReader(f)
>>> alphafile.readalpha(3)
'abc'
>>> alphafile.readalpha(3)
'bcc'
>>> alphafile.readalpha(3)
'ccd'
>>> alphafile.readalpha(10)
'cdfdenhjcd'
>>> alphafile.readalpha(10)
'dfdenhjcde'
To call readalpha() in a loop, getting each character together with the two letters that follow it, use iter() with a sentinel:
for alpha_with_extra in iter(lambda: alphafile.readalpha(3), ''):
    pass  # Do something with alpha_with_extra
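Since the stated goal is frequency analysis, the chunks produced by such a loop could be tallied with collections.Counter. This is only a sketch: tally_chunks is a hypothetical helper, and the sample_chunks list is a made-up stand-in for the values the loop would yield.

```python
from collections import Counter

def tally_chunks(chunks, size):
    """Count chunks of exactly `size` letters; shorter chunks near EOF are skipped."""
    return Counter(chunk for chunk in chunks if len(chunk) == size)

# Stand-in for: iter(lambda: alphafile.readalpha(3), '')
sample_chunks = ['abc', 'bcc', 'ccd', 'cdf', 'abc', 'bc', 'c']
counts = tally_chunks(sample_chunks, 3)

print(counts.most_common(1))  # [('abc', 2)]
```

Skipping the short chunks is a design choice: readalpha() can return fewer letters than requested, and mixing lengths would distort trigram counts.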
Answer 1 (score: 0)
Read the file one line at a time:
import fileinput

# The with block closes the output file even if an error occurs
with open("Output.txt", "w") as text_file:
    for line in fileinput.input("sample.txt"):
        # Keep only the alphabetic characters of each line
        outstring = ''.join(ch for ch in line if ch.isalpha())
        text_file.write(outstring)
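Once the text has been filtered, the overlapping reads the question asks about (ab, bc, cd for abcd) can be produced by slicing, provided the cleaned text fits in memory. A minimal sketch under that assumption; letters_only and ngrams are hypothetical helper names, not part of either answer:

```python
def letters_only(text):
    """Drop everything that is not an alphabetic character."""
    return ''.join(ch for ch in text if ch.isalpha())

def ngrams(text, n):
    """Return overlapping windows of length n, advancing one character at a time."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

cleaned = letters_only('abc? cdfde" nhj.cde\' dfwe-df$sde.....')
print(cleaned)            # abccdfdenhjcdedfwedfsde
print(ngrams('abcd', 2))  # ['ab', 'bc', 'cd']
```

For files too large to hold in memory, a streaming approach such as a buffered reader with look-ahead would be needed instead.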