Question

I have a Python list:

list = ['clothing items s','shoes s','handbag d','fashion k']

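I used a for loop that removes words from the list above, using another list of stop words.

The challenge I'm facing is around plurals/singulars: this leaves me with random orphan letters.

Do you know how to iterate over the list items, identify single letters such as 's', 'd', 'k' (in the example above) and remove them? Although in the example the orphans are at the end of the string, that is not always the case.

Here is my current loop: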

import re

new_new_keywords = []

# first we start looping over every keyword
for keyword in new_keywords2:

    # loop over every stop word
    for stop in new_stops:
        # check if this stop is inside the current keyword (substring match)
        if stop in keyword:
            # if it is, update the keyword to remove the current stop
            keyword = keyword.replace(stop, '')
            # regex: replace a space followed by digits with a single space
            keyword = re.sub(r" \d+", " ", keyword)
    # at this point every stop word has been checked against the keyword

    # append the new stop-less keyword to the end of the list
    # even if there are no changes
    new_new_keywords.append(keyword)
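The orphans most likely come from the `stop in keyword` check combined with `str.replace`: both work on substrings, not whole words, so removing a stop word that is a prefix of a longer word leaves the rest of that word behind. A minimal sketch, assuming hypothetical stop words `'item'` and `'shoe'` (the question does not show the contents of `new_stops`); `strip_substrings` is an illustrative helper that mimics the loop above:

```python
def strip_substrings(keyword, stops):
    """Mimic the question's inner loop: remove each stop word as a substring."""
    for stop in stops:
        if stop in keyword:
            keyword = keyword.replace(stop, '')
    return keyword


# Hypothetical stop words; the real new_stops list is not shown in the question
stops = ['item', 'shoe']

print(strip_substrings('clothing items', stops))  # prints: clothing s
print(strip_substrings('shoes', stops))           # prints: s
```

Matching whole words instead (for example after splitting on whitespace) would not create these orphans, but then `'item'` would no longer remove `'items'`, so the plural/singular forms still need handling one way or another.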

Answer 1

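The following fairly old-fashioned (and inefficient) approach should work. Apart from removing the unwanted characters, it leaves the original strings intact: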

test_list = ['clothing items s','shoes s','handbag d','fashion k', 'keep a', 'keep i', 'leave a alone remove k', 'keep ,  spacing b']

remove_list = "sdk"   # letters that need to be removed
newlist = []

for item in test_list:
    item += "_"     # append an unused symbol to mark the end of the string

    for letter in remove_list:
        item = item.replace(" %s " % letter, " ")  # keep one space between the neighbouring words
        item = item.replace(" %s_" % letter, "")

    newlist.append(item.rstrip("_"))

print(newlist)

It gives the following output:

['clothing items', 'shoes', 'handbag', 'fashion', 'keep a', 'keep i', 'leave a alone remove', 'keep ,  spacing b']

If at some point you choose to use regular expressions, the same logic can be implemented as follows:

import re

test_list = ['clothing items s','shoes s','handbag d','fashion k', 'keep a', 'keep i', 'leave a alone remove k', 'keep ,  spacing b']

remove_list = "sdk"
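# the lookahead (?= |$) requires a following space or the end of the string without consuming it,
# so an orphan in the middle does not glue its neighbouring words together
newlist = [re.sub(r" [%s](?= |$)" % remove_list, "", item) for item in test_list]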

print(newlist)

Answer 2

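You can use a set to decide which single trailing letters preceded by a space are invalid. If the string length is > 1, the second-to-last character is a space and the last letter is in the rm set, slice the string to remove those two characters; otherwise just leave the string as it is: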

lst = ['clothing items s','clothing s','shoes s','handbag d','fashion k']
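rm = set("bcdefghjklmnpqrstuvwxyz")  # every letter except the valid one-letter words a, i, o


# `and` short-circuits, so ch[-2] and ch[-1] are only read when len(ch) > 1
print([ch[:-2] if len(ch) > 1 and ch[-2].isspace() and ch[-1] in rm else ch
       for ch in lst])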
['clothing items', 'clothing', 'shoes', 'handbag', 'fashion']

You can reverse the logic and use a set of valid letters instead:

lst = ['clothing items s','clothing s','shoes s','handbag d','fashion k']
st = set("ioa")

print([ch[:-2] if len(ch) > 1 and ch[-2].isspace() and ch[-1] not in st else ch
       for ch in lst])

You may also want to call str.lower on the strings before checking, since I and O should be uppercase when used as words.
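A minimal sketch of that suggestion applied to the valid-letter version: only the final letter is lowercased for the membership test, so `I` and `O` are kept while the string itself keeps its original capitalization. The helper name `strip_orphan` and the sample strings are illustrative, not from the answer.

```python
valid = set("ioa")  # valid one-letter words, compared in lowercase


def strip_orphan(ch):
    # Drop a trailing " x" only when x is not a valid one-letter word
    if len(ch) > 1 and ch[-2].isspace() and ch[-1].lower() not in valid:
        return ch[:-2]
    return ch


lst = ['clothing items s', 'this is what I', 'Big O', 'fashion K']
print([strip_orphan(ch) for ch in lst])
# ['clothing items', 'this is what I', 'Big O', 'fashion']
```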

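Alternatively, you can use rsplit and a loop. You just need to decide whether to keep only the valid single-letter words I, O and a, though that still does not mean your sentences are grammatically correct: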

lst = ['clothing items s', 'clothing s', 'shoes s', 'handbag d', 'fashion k']
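rm = set("bcdefghjklmnpqrstuvwxyz")
out = []
for s in lst:
    spl = s.rsplit(None, 1)  # split off the last word
    if spl and spl[-1] in rm:
        # drop the orphan letter together with the whitespace before it
        out.append(spl[0] if len(spl) > 1 else "")
    else:
        out.append(s)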

print(out)

Or use a regex:

import re

lst = ['clothing items s', 'clothing s', 'shoes s', 'handbag d', 'fashion k']

r = re.compile(r"\s[bcdefghjklmnpqrstuvwxyz]$")
print([r.sub("", ele) for ele in lst])
['clothing items', 'clothing', 'shoes', 'handbag', 'fashion']

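Even once you account for the possibility of one-letter words, you would still need to check whether the sentences are grammatically correct, and for that you would need something like nltk. You can add a lowercase i and o to the set or to the regex character class to filter your data further, but only you can decide what is relevant. If you want a robust solution where the sentences are grammatically correct, there is a lot more work involved than simply removing all or some of the single trailing letters from the end of the strings.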
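Since the question notes that the orphans are not always at the end of the string, here is a minimal sketch that drops single-letter words anywhere in each string while keeping `a`, `I` and `O`. It assumes words are separated by whitespace and that collapsing repeated spaces is acceptable; it does no grammar checking. The names `KEEP` and `drop_orphans` are illustrative.

```python
KEEP = {"a", "i", "o"}  # valid one-letter words, compared in lowercase


def drop_orphans(text):
    # Keep multi-letter words plus the whitelisted single letters;
    # note that single-character punctuation such as "," is dropped too
    words = text.split()
    return " ".join(w for w in words if len(w) > 1 or w.lower() in KEEP)


lst = ['clothing items s', 'shoes s', 'd handbag', 'fashion k week', 'keep a']
print([drop_orphans(s) for s in lst])
# ['clothing items', 'shoes', 'handbag', 'fashion week', 'keep a']
```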

Answer 3

Take each string s, split it into words w, then rebuild s while filtering out the words that are only one letter long:

[' '.join(w for w in s.split() if len(w) > 1) for s in list]
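A self-contained usage example of the same filter on the question's data, with one extra sample string to show that it also drops legitimate one-letter words such as `a`. The variable name `items` is illustrative and avoids shadowing the builtin `list`.

```python
items = ['clothing items s', 'shoes s', 'handbag d', 'fashion k', 'have a nice day']

# Rebuild each string from the words that are longer than one character
cleaned = [' '.join(w for w in s.split() if len(w) > 1) for s in items]
print(cleaned)
# ['clothing items', 'shoes', 'handbag', 'fashion', 'have nice day']
```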

Answer 4

A straightforward solution: it removes single-letter words, starting from the last element:

def trim(s):
    parts = s.split()
    while parts:
        if len(parts[-1]) == 1:
            del parts[-1]
        else:
            break
    return ' '.join(parts)


assert trim('clothing items s') == 'clothing items'
assert trim('fashion a b c') == 'fashion'
assert trim('stack overflow') == 'stack overflow'
assert trim('have a nice day') == 'have a nice day'
assert trim('a b c') == ''
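Applied to the list from the question, a compact version of `trim` (the single-letter check moved into the `while` condition, which should behave the same as the version above) looks like this. It only strips single-letter words at the end, so an orphan at the start or in the middle is left in place:

```python
def trim(s):
    parts = s.split()
    # Pop single-letter words off the end until a longer word is found
    while parts and len(parts[-1]) == 1:
        parts.pop()
    return ' '.join(parts)


items = ['clothing items s', 'shoes s', 'handbag d', 'fashion k', 'd handbag']
print([trim(s) for s in items])
# ['clothing items', 'shoes', 'handbag', 'fashion', 'd handbag']
```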

Removing orphan letters from a list with Python
