Question

I have a file foo.txt with the following contents:

'w3ll' 'i' '4m' 'n0t' '4sed' 't0' 

'it'

I am trying to extract all the words that contain exactly 2 characters. That is, the output file should contain only

4m
t0
it

What I tried is:

with open("foo.txt" , 'r') as foo:
    listme = foo.read()

string =  listme.strip().split("'")

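I think this splits the string at the ' character. How do I select only those strings between the apostrophes whose length is exactly 2?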

Answer 1

This should work:

>>> with open('abc') as f, open('output.txt', 'w') as f2:
...     for line in f:
...         for word in line.split():    # split the line at whitespace
...             word = word.strip("'")   # strip the surrounding ' from each word
...             if len(word) == 2:       # write only two-character words to the file
...                 f2.write(word + '\n')
...
>>> print(open('output.txt').read())
4m
t0
it

Using regex:

>>> import re
>>> with open('abc') as f, open('output.txt', 'w') as f2:
...     for line in f:
...         words = re.findall(r"'(.{2})'", line)   # capture any two characters between quotes
...         for word in words:
...             f2.write(word + '\n')
...
>>> print(open('output.txt').read())
4m
t0
it
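To compare the two approaches without touching the file system, here is a minimal sketch that runs both on an in-memory copy of the sample text. The names `sample`, `by_split` and `by_regex` are made up for illustration and are not part of the code above.

```python
import re

# In-memory copy of the sample file from the question (illustrative)
sample = "'w3ll' 'i' '4m' 'n0t' '4sed' 't0' \n\n'it'\n"

def by_split(text):
    """Split on whitespace, strip the quotes, keep two-character words."""
    result = []
    for line in text.splitlines():
        for word in line.split():
            word = word.strip("'")
            if len(word) == 2:
                result.append(word)
    return result

def by_regex(text):
    """Capture any two characters that sit between a pair of quotes."""
    result = []
    for line in text.splitlines():
        result.extend(re.findall(r"'(.{2})'", line))
    return result

print(by_split(sample))
print(by_regex(sample))
```

On this sample both calls print ['4m', 't0', 'it'].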

Answer 2

Given that you are looking for all words enclosed in '' that are exactly two characters long:

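import re

# match exactly two word characters enclosed in single quotes
pattern = re.compile(r"'\w{2}'")

# open both files in the with statement so the input file is closed too
with open("file", "r") as fr, open("file2", "w") as fw:
    for word in pattern.findall(fr.read()):
        fw.write(word.strip("'") + "\n")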
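If the length should be configurable, one possible variation is to build the pattern from a parameter and use a capturing group, so the quotes never need to be stripped afterwards. This is only a sketch; `quoted_words` and `sample` are hypothetical names introduced here.

```python
import re

# In-memory copy of the sample file from the question (illustrative)
sample = "'w3ll' 'i' '4m' 'n0t' '4sed' 't0' \n\n'it'\n"

def quoted_words(text, length):
    """Return the quoted words made of exactly `length` word characters."""
    # The group (...) captures only the characters between the quotes.
    pattern = re.compile(r"'(\w{%d})'" % length)
    return pattern.findall(text)

print(quoted_words(sample, 2))
print(quoted_words(sample, 3))
```

With the sample text, a length of 2 gives ['4m', 't0', 'it'] and a length of 3 gives ['n0t'].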

Answer 3

with open("foo.txt", 'r') as file:
    # each token still carries its two quotes here, so a 2-character word has length 4
    words = [word.strip("'") for line in file for word in line.split() if len(word) == 4]

with open("out", "w") as out:
    out.write('\n'.join(words) + '\n')
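One caveat with testing the length before stripping: any 4-character token passes, quoted or not. That does not matter for the file in the question, but the sketch below shows the difference on a made-up input. `loose`, `strict` and `mixed` are illustrative names, and the unquoted token `abcd` is an assumption added for the demonstration.

```python
def loose(text):
    """Same test as above: token length 4, then strip the quotes."""
    return [w.strip("'") for w in text.split() if len(w) == 4]

def strict(text, length=2):
    """Also require the token to start and end with a single quote."""
    result = []
    for token in text.split():
        if len(token) >= 2 and token[0] == "'" and token[-1] == "'":
            word = token[1:-1]          # drop the surrounding quotes
            if len(word) == length:
                result.append(word)
    return result

mixed = "'4m' abcd 'it'"    # hypothetical input with one unquoted token
print(loose(mixed))
print(strict(mixed))
```

Here `loose` prints ['4m', 'abcd', 'it'] while `strict` prints ['4m', 'it'].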

Answer 4

Since you are reading quoted words separated by spaces (or commas), you can use the csv module:

import csv

with open('/tmp/2let.txt','r') as fin, open('/tmp/out.txt','w') as fout:
    reader=csv.reader(fin,delimiter=' ',quotechar="'")
    source=(e for line in reader for e in line)             
    for word in source:
        if len(word) == 2:    # keep only words of exactly two characters
            print(word)
            fout.write(word+'\n')

Contents of 'out.txt':

4m
t0
it
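To see what the csv reader actually yields for this input, here is a small sketch that parses an in-memory copy of the sample through `io.StringIO` instead of a file. `sample` and `csv_fields` are names invented for the example. Note that a trailing space on a line shows up as an empty field, which is one reason to test for a length of exactly 2.

```python
import csv
import io

# In-memory copy of the sample file from the question (illustrative)
sample = "'w3ll' 'i' '4m' 'n0t' '4sed' 't0' \n\n'it'\n"

def csv_fields(text):
    """Parse space-delimited, single-quoted fields and flatten all rows."""
    reader = csv.reader(io.StringIO(text), delimiter=' ', quotechar="'")
    return [field for row in reader for field in row]

fields = csv_fields(sample)
two_char = [word for word in fields if len(word) == 2]
print(two_char)
```

For the sample text this prints ['4m', 't0', 'it'].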

Filter strings of a specific length from a file
