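I have a file foo.txt with the following content:
'w3ll' 'i' '4m' 'n0t' '4sed' 't0'
'it'
I am trying to extract every word in it that is exactly 2 characters long. That is, the output file should contain only
4m
t0
it
Here is what I tried: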
with open("foo.txt", 'r') as foo:
    listme = foo.read()
    string = listme.strip().split("'")
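I think this splits the string on the ' character. How do I select only the strings between apostrophes that are exactly 2 characters long?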
Answer 0 (score: 1)
This should work:
>>> with open('abc') as f, open('output.txt', 'w') as f2:
...     for line in f:
...         for word in line.split():      # split the line at whitespace
...             word = word.strip("'")     # strip the surrounding ' from each word
...             if len(word) == 2:         # write only words of length 2
...                 f2.write(word + '\n')
...
>>> print(open('output.txt').read())
4m
t0
it
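As a rough sketch, the same split-and-strip logic can be wrapped in a small generator so it can be tried without touching any files. The names `two_char_words` and `sample` are made up for illustration and are not part of the answer above; `sample` simply mirrors the contents of foo.txt from the question.

```python
def two_char_words(lines):
    """Yield each whitespace-separated token that is exactly 2 characters
    long once its surrounding apostrophes are stripped."""
    for line in lines:
        for token in line.split():
            word = token.strip("'")
            if len(word) == 2:
                yield word


# Assumed input: the two lines of foo.txt shown in the question.
sample = ["'w3ll' 'i' '4m' 'n0t' '4sed' 't0'\n", "'it'\n"]
print(list(two_char_words(sample)))  # ['4m', 't0', 'it']
```

Because a file object iterates line by line, passing an open file instead of `sample` should behave the same way.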
Using regex:
>>> import re
>>> with open('abc') as f, open('output.txt', 'w') as f2:
...     for line in f:
...         words = re.findall(r"'(.{2})'", line)  # capture exactly 2 characters between apostrophes
...         for word in words:
...             f2.write(word + '\n')
...
>>> print(open('output.txt').read())
4m
t0
it
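One caveat with `.{2}` is that `.` also matches an apostrophe, so on unusual input the two captured characters could straddle a quote boundary. A slightly stricter variant, offered only as a sketch, uses `[^']{2}` instead. The names `QUOTED_PAIR` and `quoted_pairs` are hypothetical helpers, not from the answer.

```python
import re

# Hypothetical stricter pattern: two characters, neither of them an apostrophe.
QUOTED_PAIR = re.compile(r"'([^']{2})'")


def quoted_pairs(text):
    """Return every 2-character string enclosed in apostrophes."""
    return QUOTED_PAIR.findall(text)


# Assumed input: the contents of foo.txt from the question.
text = "'w3ll' 'i' '4m' 'n0t' '4sed' 't0'\n'it'\n"
print(quoted_pairs(text))  # ['4m', 't0', 'it']
```

On the question's data both patterns give the same result. They differ on input such as `'' 'ab'`, where `.{2}` captures a quote followed by a space while `[^']{2}` returns only `ab`.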
Answer 1 (score: 1)
Given that you want every word enclosed in ' characters that is exactly two characters long:
import re
split = re.compile(r"'\w{2}'")  # an apostrophe, exactly two word characters, an apostrophe
with open("file", "r") as fr, open("file2", "w") as fw:
    for word in split.findall(fr.read()):
        fw.write(word.strip("'") + "\n")
Answer 2 (score: 0)
with open("foo.txt", 'r') as file:
    # each token still carries its two apostrophes, so a 2-character word has length 4
    words = [word.strip("'") for line in file for word in line.split() if len(word) == 4]
with open("out", "w") as out:
    out.write('\n'.join(words) + '\n')
Answer 3 (score: 0)
Since you are reading quoted words separated by spaces (or commas), you can use the csv module:
import csv
with open('/tmp/2let.txt', 'r', newline='') as fin, open('/tmp/out.txt', 'w') as fout:
    reader = csv.reader(fin, delimiter=' ', quotechar="'")
    source = (e for line in reader for e in line)
    for word in source:
        if len(word) == 2:  # exactly 2 characters, as the question asks
            print(word)
            fout.write(word + '\n')
'out.txt':
4m
t0
it
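To try the csv approach without creating files, here is a minimal in-memory sketch. It assumes the input looks like foo.txt in the question and keeps only fields of exactly 2 characters. `two_char_fields` is a name made up for this example.

```python
import csv
import io


def two_char_fields(text):
    """Parse space-delimited, apostrophe-quoted fields and keep those of length 2."""
    reader = csv.reader(io.StringIO(text), delimiter=' ', quotechar="'")
    return [field for row in reader for field in row if len(field) == 2]


# Assumed input: the contents of foo.txt from the question.
text = "'w3ll' 'i' '4m' 'n0t' '4sed' 't0'\n'it'\n"
print(two_char_fields(text))  # ['4m', 't0', 'it']
```

One practical difference from splitting on whitespace is that the csv reader keeps a quoted field containing a space together. For example `'ab cd' 'ef'` is read as the two fields `ab cd` and `ef`, so only `ef` passes the length check.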