Question

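I'm trying to compress the items from one list into another list, and I need to keep punctuation as separate items in the list, because if I don't, "you" and "you;" are saved as separate items in the list.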

For example, the original list is,

['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you;', 'ask', 'what', 'you', 'can', 'do', 'for', 'your', 'country!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'is', 'a', 'former', 'American', 'President.']

The compressed list is currently,

['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you;', 'ask', 'you', 'country!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'former', 'American', 'President.']

But I want it to have the punctuation as separate items in the list.

My expected output is,

['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you', ';', 'ask', '!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'former', 'American', 'President', '.']

Answer 1

You can do this with a regex.

import re
a = ['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you;', 'ask', 'what', 'you', 'can', 'do', 'for', 'your', 'country!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'is', 'a', 'former', 'American', 'President.']
# Match runs of word characters/apostrophes, or a single punctuation mark.
result = re.findall(r"[\w']+|[.,!?;]", ' '.join(a))
print(result)

Output

['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you', ';', 'ask', 'what', 'you', 'can', 'do', 'for', 'your', 'country', '!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'is', 'a', 'former', 'American', 'President', '.']

Here is a demo to learn more about the regex.
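Note that the output above still contains repeated words, while the expected output in the question drops them. Here is a minimal sketch of one way to do both steps together. It assumes Python 3.7+ (where dict keeps insertion order), and the function name split_and_dedupe is just made up for illustration, it is not from the answer.

```python
import re

def split_and_dedupe(items):
    # Hypothetical helper: tokenize into words and single punctuation marks.
    tokens = re.findall(r"[\w']+|[.,!?;]", ' '.join(items))
    # dict.fromkeys keeps only the first occurrence of each token, in order
    # (relies on insertion-ordered dicts, Python 3.7+).
    return list(dict.fromkeys(tokens))

words = ['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you;',
         'ask', 'what', 'you', 'can', 'do', 'for', 'your', 'country!', 'This',
         'is', 'a', 'quote', 'from', 'JFK', 'who', 'is', 'a', 'former',
         'American', 'President.']
print(split_and_dedupe(words))
```

On the list from the question this should give ['Ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you', ';', 'ask', '!', 'This', 'is', 'a', 'quote', 'from', 'JFK', 'who', 'former', 'American', 'President', '.'], which matches the expected output. Only the punctuation marks listed in the character class are captured; anything else (quotes, brackets, and so on) is silently dropped.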

Answer 2

Here is code that separates the non-alphabetic characters and removes duplicates. Hope it helps.

def separate(mylist):
    # Split each item into its letters and (at most one) non-letter character.
    newlist = []
    for e in mylist:
        test = ''  # letters collected from the current item
        a = ''     # last non-alphabetic character seen in the current item
        for c in e:
            if not c.isalpha():
                a = c
            else:
                test += c
        newlist.append(test)
        if a != '':
            newlist.append(a)
    # Drop duplicates while keeping first-occurrence order.
    noduplicates = []
    for i in newlist:
        if i not in noduplicates:
            noduplicates.append(i)
    return noduplicates

I'm sure someone else could do this better and it's a bit messy, but at least it works.
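One thing to be aware of: the function above keeps only the last non-letter character of each item and always places it after the letters, so something like '(hi)' would lose its opening bracket. If that matters, a possible variant is sketched below. The name separate_all is hypothetical, and the set is only there to make the duplicate check cheaper; treat it as a sketch, not a drop-in replacement.

```python
def separate_all(items):
    """Split letters from non-letters and keep first occurrences only."""
    seen = set()   # tokens already emitted, for fast membership checks
    result = []

    def emit(token):
        # Skip empty strings and anything that was already emitted.
        if token and token not in seen:
            seen.add(token)
            result.append(token)

    for item in items:
        word = ''
        for ch in item:
            if ch.isalpha():
                word += ch
            else:
                emit(word)  # flush the letters collected so far
                word = ''
                emit(ch)    # the non-letter becomes its own item
        emit(word)          # flush a trailing word, if any
    return result

print(separate_all(['you;', 'ask', 'you', 'country!', '(hi)']))
```

This should print ['you', ';', 'ask', 'country', '!', '(', 'hi', ')']. Digits and apostrophes count as non-letters here, same as in the original function, so "don't" would be split into three items.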

How do I remove punctuation from the items in a list and save it as separate items in the list?
