Question

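In short, I am trying to replace any punctuation in the words on each line with a space.

For example, once the text document has been processed, the output would have no punctuation, like this.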

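Meep Meep! I tought I taw a putty tat. I did I did I did taw a putty tat Shsssssssssh I am hunting wabbits Heh Heh Heh Heh It's a fine day to hunt wabbits Heh Heh Heh Stop it's wabbit huntin season Huntin Wabbits The final guide to 101 ways to kook wabbit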

Unprocessed, it looks like this.

Text from question5.txt

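Meep Meep! I tought I taw a putty tat. I did! I did! I did taw a putty tat. Shsssssssssh ... I am hunting wabbits. Heh Heh Heh Heh ... It's a fine day to hunt wabbits! ... Heh Heh Heh ... Stop - it's wabbit huntin season! Huntin Wabbits: The final guide to 101 ways to kook wabbit.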

This is an exercise, so I have been told to use .replace and a for loop.

import string
infile = open('question5.txt', 'r')

lines = infile.readlines()
lines = str(lines)
for words in lines:
    for letters in words:
        letters.replace(string.punctuation,' ')
        print(letters)

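Any help in solving this would be much appreciated.

Note: after your suggestions and some more research, this is what I ended up with several hours later, in case anyone is following the outcome. Thanks, guys. Waves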

import string
infile = open('question5.txt', 'r')
lines = infile.readlines()

def word_count(list):
    count = 0
    list = str(list)
    for lines in list:
        list = list.replace('.',' ')
        list = list.replace(',',' ')
        list = list.replace('-',' ')

    split = list.split()
    print (split)
    for words in split:
        count = count + 1
    return count


for line in lines:
    count = word_count(line)
    print(count)
infile.close()

Answer 1

This is better:

import string as st

trans = st.maketrans(st.punctuation, ' '*len(st.punctuation))
with open('question5.txt', 'r') as f:
    for line in f:
        print line.translate(trans)
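The snippet above is Python 2 (string.maketrans and the print statement). As a rough Python 3 equivalent, assuming the same goal of mapping each punctuation character to one space, something like the following should work; the helper name punctuation_to_spaces is made up for illustration.

```python
import string

# Build a translation table that maps every ASCII punctuation
# character to a single space (Python 3: str.maketrans).
trans = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

def punctuation_to_spaces(text):
    """Return text with each punctuation character replaced by a space."""
    return text.translate(trans)

# Example with a literal string; for a file, loop over the open
# file object and call punctuation_to_spaces(line) on each line.
print(punctuation_to_spaces("Stop - it's wabbit huntin season!"))
```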

Answer 2

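I'm not 100% sure, since your sample output still contains some punctuation. A typo, perhaps?

In Python 2.x you could try the following, since your example does not really seem to replace punctuation with spaces so much as simply remove it.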

from string import punctuation
with open('question5.txt') as fin:
    test = fin.read()

new_text = test.translate(None, punctuation)
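Note that translate(None, punctuation) only works on Python 2 byte strings. On Python 3, a sketch of the same deletion could pass the characters to remove as the third argument of str.maketrans; the function name strip_punctuation is an assumption for this example.

```python
from string import punctuation

def strip_punctuation(text):
    """Delete (not replace) every ASCII punctuation character."""
    # The third argument of str.maketrans lists characters to delete.
    return text.translate(str.maketrans('', '', punctuation))

print(strip_punctuation("I did! I did!"))
```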

Or, using a regular expression:

import re
new_text = re.sub('[' + re.escape(punctuation) + ']+', '', test)

An example using only a loop:

new_string = ''
for ch in old_string:
    if ch not in punctuation:
        new_string += ch

This can be made more efficient by putting punctuation in a set (or by using one of the methods above).
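A minimal sketch of the set-based variant, assuming the goal is still to drop the punctuation characters; PUNCTUATION and remove_punctuation are illustrative names. Joining the kept characters also avoids building the result with repeated += on a string.

```python
from string import punctuation

# Membership tests on a set take constant time on average.
PUNCTUATION = frozenset(punctuation)

def remove_punctuation(old_string):
    """Keep only the characters that are not punctuation."""
    return ''.join(ch for ch in old_string if ch not in PUNCTUATION)

print(remove_punctuation("Stop - it's!"))
```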

Answer 3

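First, as elyase shows, you should use the with construct, or else close the file at the end. Also, as he shows, you should never use .readlines() when you read a text file and process it on the fly. Just loop over the file object. It iterates line by line (each line includes the trailing \n).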

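Another problem is lines = str(lines). Your lines is originally a list of strings. str converts it into one single string that looks like "['Meep...', 'wabits...', 'huntin...']". You then loop over that string, which yields single characters (as one-character strings). Naming the variable words does not change that. (If you really wanted to pull out the words, you should use something like for word in line.split():.)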
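To make this concrete, here is a small sketch with two made-up lines standing in for the file contents. It shows what str() does to a list and what iterating over the result yields, next to the split()-based alternative.

```python
# Two made-up lines, shaped like what readlines() would return.
lines = ['Meep Meep!\n', 'Heh Heh\n']

# str() produces the list's printable representation as ONE string,
# brackets, quotes and escaped newlines included.
as_text = str(lines)

# Iterating over a string yields one-character strings, not words.
first_chars = [ch for ch in as_text][:3]

# To get words, split each line on whitespace instead.
words = [word for line in lines for word in line.split()]

print(as_text)
print(first_chars)
print(words)
```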

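Then you loop over that single character a second time and get the same single character again (that is, the inner loop runs exactly once and adds nothing).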

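Next, .replace() returns the result of the replacement; it does not modify the string it is called on. You need to assign the result to a variable. In any case, you cannot pass string.punctuation as the old string to replace, because that entire sequence of characters will never be found in the source text. A brute-force solution has to loop over the punctuation string and replace the characters one at a time.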
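The brute-force approach described here might look like the sketch below, which sticks to .replace and a for loop as the exercise requires. The function name replace_punctuation and the sample string are only for illustration; for the real file, call the function on each line while looping over the open file object.

```python
import string

def replace_punctuation(line):
    """Replace each punctuation character in line with a space."""
    for mark in string.punctuation:
        # str.replace returns a new string, so rebind the name.
        line = line.replace(mark, ' ')
    return line

sample = "Stop - it's wabbit huntin season!"
cleaned = replace_punctuation(sample)
print(cleaned.split())
```

Because every punctuation mark becomes a space, a later split() breaks "it's" into two pieces; whether that is wanted depends on how the exercise defines a word.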

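To sum up, letters still contains a single character, with nothing replaced. You then print that single character, and print adds a newline. As a result, you see the original content displayed as the string representation of the list of lines, written Chinese-style: a single column, top to bottom.

Finally, string.punctuation is just a string constant.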

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

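You can simplify the code by not importing the string module (if you do not need it for anything else) and using your own string literal containing the characters that should be treated as punctuation.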
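As a sketch of that simplification: the set of characters in MY_PUNCTUATION below is a hypothetical choice, not something taken from the question, and replace_marks is an illustrative name. Leaving the apostrophe out of the literal keeps words such as "it's" in one piece.

```python
# Hypothetical choice of characters to treat as punctuation.
MY_PUNCTUATION = '.,!?:;-'

def replace_marks(line, marks=MY_PUNCTUATION):
    """Replace each character listed in marks with a space."""
    for mark in marks:
        line = line.replace(mark, ' ')
    return line

print(replace_marks("Stop - it's wabbit huntin season!").split())
```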

Replacing letters with string.punctuation in a for loop
