Question

I have a string that looks like this:

rawanswers = ["?\\n', 'WiFi\\n', 'Waf\\xef\\xac\\x82e House\\n', 'Wind\\n', ' \\n']"]

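I want to remove all the special characters (\n, question marks, and backslashes) but keep spaces, digits, and single quotes. In addition, between each pair of single quotes, I want everything that follows a backslash to be removed, up to the next space (after which the rule starts over). I then want the resulting string between each pair of single quotes placed into a new list. In other words, I want this output: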

newlist = ['WiFi', 'Waf House', 'Wind']

I also want to run the same code over several different strings to accomplish the same task. What is the most efficient way to do this?

Answer 1

The first item in rawanswers did not start with a single quote, so I added one in the code sample.

rawanswers = ["'?\\n', 'WiFi\\n', 'Waf\\xef\\xac\\x82e House\\n', 'Wind\\n', ' \\n'"]

#Get first list item, strip off double quotes, split on commas. 
rawanswer = rawanswers[0].strip('"').split(',')

newList = []
for item in rawanswer:
    #Strip leading space, strip single quotes, split words.
    newStr = item.lstrip().strip("'").split()
    newItem = []
    for word in newStr:
        #Remove ?, split on '\\', get first list item and assume remainder is not wanted. 
        newWord = word.replace('?','').split('\\')[0]
        if newWord: newItem.append(newWord)  

    if newItem:
        newStr =  ' '.join(newItem)
        newList.append(newStr)

print(newList)
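
Since the question also asks about running the same cleanup over several strings, here is a minimal sketch that wraps the loop above in a function. The function name clean_answer_string and the second input string are hypothetical and only there for illustration; the logic is the same split-based approach and assumes every item is wrapped in single quotes and separated by commas.

```python
def clean_answer_string(raw):
    """Apply the split-based cleanup to one raw string and return a list."""
    new_list = []
    # Strip surrounding double quotes, then split into quoted items.
    for item in raw.strip('"').split(','):
        # Strip leading space and single quotes, then split into words.
        words = item.lstrip().strip("'").split()
        kept = []
        for word in words:
            # Remove '?', then keep only what precedes the first backslash.
            new_word = word.replace('?', '').split('\\')[0]
            if new_word:
                kept.append(new_word)
        if kept:
            new_list.append(' '.join(kept))
    return new_list


rawanswers = [
    "'?\\n', 'WiFi\\n', 'Waf\\xef\\xac\\x82e House\\n', 'Wind\\n', ' \\n'",
    "'Tea\\n', 'Iced Coffee\\n'",  # hypothetical second string
]

# One cleaned list per input string.
results = [clean_answer_string(raw) for raw in rawanswers]
print(results)
```

Note that splitting on commas would break an item that itself contains a comma, so this is only suitable for input shaped like the example.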

Answer 2

I can think of a simple way to do this in two steps:

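import re

cleaned = []
for string in rawanswers:
    string = re.sub(r'\\.', '', string)       # Remove each backslash and the character after it (\n, \t, etc.)
    string = re.sub(r'[^\w\s]+', '', string)  # Remove anything that is not a letter, digit, underscore, or whitespace
    cleaned.append(string)                    # Store the result; reassigning the loop variable alone would discard it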

If, unlike in your example, you do not want digits, you can replace \w in the second regex with A-Za-z.
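
For comparison, here is a rough sketch of a regex variant that follows the rule in the question more literally: it pulls out the text between each pair of single quotes, deletes a backslash together with everything up to the next whitespace, and then drops any remaining symbols. The function name clean_answers is hypothetical, and the sketch assumes the input starts with a single quote (as in the corrected string from Answer 1) so that the quotes pair up correctly.

```python
import re


def clean_answers(raw):
    """Return the cleaned text found between each pair of single quotes."""
    cleaned = []
    for chunk in re.findall(r"'([^']*)'", raw):
        # Drop a backslash and everything after it up to the next whitespace.
        chunk = re.sub(r"\\\S*", "", chunk)
        # Drop what is left that is not a letter, digit, underscore, or whitespace.
        chunk = re.sub(r"[^\w\s]+", "", chunk)
        # Collapse runs of whitespace and trim the ends.
        chunk = " ".join(chunk.split())
        if chunk:
            cleaned.append(chunk)
    return cleaned


rawanswers = ["'?\\n', 'WiFi\\n', 'Waf\\xef\\xac\\x82e House\\n', 'Wind\\n', ' \\n'"]
newlist = [clean_answers(s) for s in rawanswers]
print(newlist)
```

Because the backslash rule consumes everything up to the next whitespace, the escaped byte sequence after "Waf" disappears as a whole instead of leaving stray characters behind.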

Answer 3

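The only thing I can come up with is code that drops list items that are not purely alphanumeric, such as @ or %%. It removes whole items rather than cleaning them, so it cannot strip the symbols out of a mixed string such as fed//n. Here it is:

words = ['?', '@a', 'a']
# Keep only the items made up entirely of letters and digits.
# Building a new list avoids skipping items, which happens when
# items are removed from a list while stepping through it by index.
words = [w for w in words if w.isalnum()]
print(*words)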

Also, in your code the string starts with "? but is never closed with a matching ". That will cause an error.
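
If the goal is to strip symbols out of mixed items instead of dropping the whole item, a character-level filter is one possible approach. The sketch below is only an illustration; the function name strip_symbols and the sample list are hypothetical.

```python
def strip_symbols(words):
    """Keep only letters, digits, and whitespace in each item; drop items left empty."""
    cleaned = []
    for word in words:
        kept = ''.join(ch for ch in word if ch.isalnum() or ch.isspace())
        if kept.strip():
            cleaned.append(kept)
    return cleaned


print(strip_symbols(['?', '@a', 'a', 'fed//n', '%%']))
```

Here '@a' becomes 'a' and 'fed//n' becomes 'fedn', while items made only of symbols are dropped.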

Remove all special characters except spaces and digits and output a list (Python)?
