Filtering strings of a specific length from a file

Date: 2013-07-01 13:45:38

Tags: python string file python-2.7 extract

I have a file foo.txt with the following contents:

'w3ll' 'i' '4m' 'n0t' '4sed' 't0' 

'it'

I am trying to extract all the words that contain exactly 2 characters. That is, the output file should contain only:

4m
t0
it

What I tried is:

with open("foo.txt" , 'r') as foo:
    listme = foo.read()

string =  listme.strip().split("'")

I think this splits the string at each ' character. How can I select only the strings inside the apostrophes whose length is exactly 2?

4 Answers:

Answer 0: (score: 1)

This should work:

>>> with open('abc') as f, open('output.txt', 'w') as f2:
...     for line in f:
...         for word in line.split():    #split the line at whitespaces
...             word = word.strip("'")   # strip out `'` from each word
...             if len(word) == 2:       #if len(word) is 2 then write it to file
...                 f2.write(word + '\n')

>>> print open('output.txt').read()
4m
t0
it

Using regex:

>>> import re
>>> with open('abc') as f, open('output.txt', 'w') as f2:
...     for line in f:
...         words = re.findall(r"'(.{2})'", line)
...         for word in words:
...             f2.write(word + '\n')
...
>>> print open('output.txt').read()
4m
t0
it
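As a self-contained variation on the regex idea, the sketch below first pairs up the quotes and only then filters by length, which avoids matching text that sits between two separate quoted words. The sample string and the function name `quoted_words_of_length` are assumptions made for illustration, not part of the answer above.

```python
import re

# Hypothetical sample mirroring the foo.txt contents from the question.
SAMPLE = "'w3ll' 'i' '4m' 'n0t' '4sed' 't0' \n\n'it'\n"

def quoted_words_of_length(text, length=2):
    """Return the contents of every '...' pair whose length equals `length`."""
    # [^']* consumes a whole quoted token, so opening and closing quotes stay paired.
    quoted = re.findall(r"'([^']*)'", text)
    return [word for word in quoted if len(word) == length]

print(quoted_words_of_length(SAMPLE))
```

Passing a different `length` would select words of another size without changing the pattern.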

Answer 1: (score: 1)

Given that you want to find all words enclosed in '' symbols that are exactly two characters long:

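import re
pattern = re.compile(r"'\w{2}'")   # a quoted run of exactly two word characters

# Open both files in one with statement so the input file is closed as well.
with open("file", "r") as fr, open("file2", "w") as fw:
    for word in pattern.findall(fr.read()):
        fw.write(word.strip("'") + "\n")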
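One difference between the regex answers is worth seeing on a small input: `\w` only matches letters, digits and the underscore, while `.` matches any character except a newline. The sketch below compares the two; the sample string is a made-up assumption, chosen to include a two-character word with punctuation.

```python
import re

# Hypothetical sample; 'a!' is a two-character word that is not alphanumeric.
SAMPLE = "'4m' 'a!' 't0'"

# \w matches only letters, digits and underscore, so 'a!' is skipped.
word_chars = [m.strip("'") for m in re.findall(r"'\w{2}'", SAMPLE)]

# . matches any character except a newline, so 'a!' is kept.
any_chars = re.findall(r"'(.{2})'", SAMPLE)

print(word_chars)
print(any_chars)
```

Which pattern is appropriate depends on whether punctuation inside the quotes should count as part of a word.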

Answer 2: (score: 0)

with open("foo.txt", 'r') as infile:
  # Each token still carries its two quotes here, so a 2-character word has length 4.
  words = [word.strip("'") for line in infile for word in line.split() if len(word) == 4]

with open("out", "w") as out:
  out.write('\n'.join(words) + '\n')
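The same split-and-strip idea can be written so that the length check is applied after the quotes are removed, which makes the target length an explicit parameter. This is a minimal sketch; the generator name `words_of_length` and the in-memory sample lines are assumptions for illustration.

```python
def words_of_length(lines, length=2, quote="'"):
    """Yield whitespace-separated words whose unquoted length equals `length`."""
    for line in lines:
        for token in line.split():
            word = token.strip(quote)   # drop the surrounding quotes first
            if len(word) == length:
                yield word

# Hypothetical in-memory stand-in for the lines of foo.txt.
sample_lines = ["'w3ll' 'i' '4m' 'n0t' '4sed' 't0' \n", "\n", "'it'\n"]
result = list(words_of_length(sample_lines))
print(result)
```

Because any iterable of lines is accepted, an open file object could be passed in place of `sample_lines`.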

Answer 3: (score: 0)

Since you are reading quoted words delimited by spaces (or commas), you can use the csv module:

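import csv

with open('/tmp/2let.txt', 'r') as fin, open('/tmp/out.txt', 'w') as fout:
    # Treat each line as space-delimited fields quoted with '.
    reader = csv.reader(fin, delimiter=' ', quotechar="'")
    source = (e for line in reader for e in line)
    for word in source:
        # <= 2 also keeps shorter words such as i; use == 2 for exactly two characters.
        if len(word) <= 2:
            print(word)
            fout.write(word + '\n')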

Contents of 'out.txt':

i
4m
t0
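If only words of exactly two characters are wanted, as the question asks, the csv approach can be sketched with an equality test instead. The function name `exact_length_words` and the sample lines are assumptions for illustration; the lines mirror the foo.txt contents from the question rather than the /tmp/2let.txt file used above.

```python
import csv

def exact_length_words(lines, length=2):
    """Parse space-delimited, single-quoted fields and keep those of exactly `length` characters."""
    reader = csv.reader(lines, delimiter=' ', quotechar="'")
    # A trailing space produces an empty field, which the length test discards.
    return [field for row in reader for field in row if len(field) == length]

# Hypothetical in-memory stand-in for the lines of foo.txt (newlines already removed).
sample_lines = ["'w3ll' 'i' '4m' 'n0t' '4sed' 't0' ", "", "'it'"]
print(exact_length_words(sample_lines))
```

`csv.reader` accepts any iterable of strings, so the list here can be swapped for an open file object.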