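I'm trying to write code that reads input from one file, replaces every four-letter word with 'xxxx', and writes the result to another file. I know this question has come up on the site before, and I've googled the other questions, but they're all the same. I've also played around with the code and still can't get it to work.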
    def censor(filename):
        'string ==> None, creates file censored.txt in current folder with all 4 letter words replaces with string xxxx'
        import string
        infile = open(filename,'r')
        infile2 = open('censored.txt','w')
        for word in infile:
            words = word.split()
            for i, word in enumerate(words):
                words.strip(string.punctuation)
                if len(word) == 4:
                    words[i] == 'xxxx'
                    infile2.write(words[i])
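I know this is just a jumble of code, but I figured it was worth posting something. My idea was to strip the punctuation from the text so that a four-letter word followed by punctuation isn't counted as five characters, split the words into a list to change the four-letter ones, then join them back together in the original order with only those words replaced. So "I like to work." would end up as "I xxxx to xxxx."
I also looked at another similar post on this site and found a solution that works, but it doesn't deal with the punctuation problem.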
    def maybe_replace(word, length=4):
        if len(word) == length:
            return 'xxxx'
        else:
            return word

    def replacement(filename):
        infile = open(filename,'r')
        outfile = open('censored.txt','w')
        for line in infile:
            words = line.split()
            newWords = [maybe_replace(word) for word in words]
            newLine = ' '.join(newWords)
            outfile.write(newLine + '\n')
        outfile.close()
        infile.close()
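So in this case, if I have a list of words like "Frog, boot, cat, dog." it returns "Frog, boot, xxxx xxxx".
I also found another solution that uses regex, but I'm still a novice and really can't understand that solution. Any help would be appreciated.
Answer 0 (score: 3)
The regex solution is quite simple: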
    import re

    text = """
    I also found another solution using
    regex, but I'm still a novice and
    really can't understand that solution.
    Any help would be appreciated.
    """

    # Replace every run of exactly four word characters with 'xxxx'.
    print(re.sub(r'\b\w{4}\b', 'xxxx', text))
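The regex matches the following: \b is a word boundary, which matches at the beginning or end of a word. \w{4} matches exactly four word characters (a-z, A-Z, 0-9 or _). The final \b is another word boundary. The output is: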
    I xxxx found another solution using
    regex, but I'm still a novice and
    really can't understand xxxx solution.
    Any xxxx would be appreciated.
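To tie this back to the original task of reading one file and writing another, here is a minimal sketch of how the same substitution might be applied line by line. The names censor_text and censor_file and their parameters are my own and not from the question, and the sketch assumes the input is ordinary text in the platform's default encoding.

```python
import re

# Exactly four word characters (letters, digits or underscore)
# with a word boundary on each side.
FOUR_LETTER_WORD = re.compile(r'\b\w{4}\b')

def censor_text(text):
    """Return text with every four-character word replaced by 'xxxx'."""
    return FOUR_LETTER_WORD.sub('xxxx', text)

def censor_file(in_path, out_path='censored.txt'):
    """Hypothetical helper: copy in_path to out_path, censoring each line."""
    with open(in_path, 'r') as infile, open(out_path, 'w') as outfile:
        for line in infile:
            # Lines keep their own newline, so nothing is added here.
            outfile.write(censor_text(line))

print(censor_text("Frog, boot, cat, dog."))  # xxxx, xxxx, cat, dog.
```

Because the substitution works on the raw line, punctuation and spacing are left exactly as they were, which avoids the split-and-join step altogether.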
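Answer 1 (score: 1)
The problem in your second piece of code is words = line.split().
By default it splits on whitespace, so ',' is treated as part of the word.
If you really don't want to touch regex, here is my suggestion (which still involves a little regex):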
    import re
    # Raw string so the backslash reaches the regex engine unchanged.
    words = re.split(r'\W+', line)
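This tells Python to split the line on runs of non-word characters, that is, anything other than letters, digits and the underscore.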
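One caveat is worth seeing in action: splitting on \W+ discards the punctuation and can leave an empty string at the end of the list, so the original line cannot be rebuilt exactly. The rough sketch below shows the difference, using a capturing group as one possible way to keep the separators. That part is my own addition and not part of the suggestion above.

```python
import re

line = "Frog, boot, cat, dog."

# Plain split: punctuation is discarded, and a trailing '' appears
# because the line ends with a separator.
print(re.split(r'\W+', line))
# ['Frog', 'boot', 'cat', 'dog', '']

# With a capturing group, re.split also returns the separators,
# so the pieces can be joined back together after censoring.
pieces = re.split(r'(\W+)', line)
censored = ['xxxx' if len(p) == 4 and p.isalnum() else p for p in pieces]
print(''.join(censored))
# xxxx, xxxx, cat, dog.
```

The isalnum() check is there so that a four-character separator (for example " -- ") is never mistaken for a word.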
Answer 2 (score: 0)
We have an answer! :)
    import string as s

    alfanum = s.ascii_letters + s.digits

    def maybe_replace(arg, length=4):
        # Keep only letters and digits so punctuation does not count toward the length.
        word = ""
        for t in arg:
            word += t if t in alfanum else ""
        if len(word) == length:
            # Assumes any punctuation trails the word, e.g. 'boot,' -> 'xxxx,'.
            if len(arg) > length:
                return 'xxxx' + arg[length:]
            else:
                return 'xxxx'
        else:
            return arg

    text = "Frog! boot, cat, dog. bye, bye!"
    words = text.split()
    print(words)
    print([maybe_replace(word) for word in words])

    >>> ['Frog!', 'boot,', 'cat,', 'dog.', 'bye,', 'bye!']
    >>> ['xxxx!', 'xxxx,', 'cat,', 'dog.', 'bye,', 'bye!']
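The slicing in maybe_replace assumes that punctuation only follows the word. A variant that also copes with leading punctuation might look like the sketch below. The name maybe_replace_stripped is hypothetical, and it only strips punctuation at the two ends of a token, so something like can't still counts as five characters.

```python
import string

def maybe_replace_stripped(token, length=4):
    """Censor token if its core (without surrounding punctuation) has `length` characters."""
    core = token.strip(string.punctuation)
    if len(core) != length:
        return token
    # Locate the core inside the token so leading and trailing
    # punctuation can be put back around the replacement.
    start = token.find(core)
    return token[:start] + 'x' * length + token[start + length:]

text = '"Frog!" (boot), cat, dog. bye, bye!'
print(' '.join(maybe_replace_stripped(w) for w in text.split()))
# "xxxx!" (xxxx), cat, dog. bye, bye!
```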