Question

I have a list of prefixes:

prefixes = [u'path', u'folder', u'directory', u'd']

and some strings such as

s1 = u'path common path and directory'
s2 = u'directory common path and directory'
s3 = u'directory folder distinct and directory folder'
s4 = u'distinct and directory folder'
s5 = u'd fixable directory or folder'

I only want to remove a prefix when it matches one of the entries in the list:

# after processing
s1 = u'common path and directory'
s2 = u'common path and directory'
s3 = u'folder distinct and directory folder'
s4 = u'distinct and directory folder'
s5 = u'fixable directory or folder'

I tried using

''.join([word for word in s1.split() if word not in prefixes])

or

for prefix in prefixes:
    if s1.startswith(prefix):
       return s1[len(prefix):]

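But these either remove the prefix anywhere in the string, not only at the start, or fail to match whole words (note that I have d in the list, which matches directory and leaves only irectory). Is there a way to do this without regular expressions?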

Answer 1

If you only want to match whole words, each word will be terminated by a space character. I suggest appending a space to the prefix:

prefixes = [u'path', u'folder', u'directory', u'd']

mystrings = [u'path common path and directory', u'directory common path and directory', u'directory folder distinct and directory folder', u'distinct and directory folder', u'd fixable directory or folder']
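for s in mystrings:
    for prefix in prefixes:
        # Match the prefix only when it is followed by a space (a whole word)
        if s.startswith(prefix + " "):
            # Skip the prefix and the space after it
            print(s[len(prefix) + 1:])
            break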

Demo

>>> 
common path and directory
common path and directory
folder distinct and directory folder
fixable directory or folder
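
Note that the loop above prints only the strings that had a matching prefix, so s4 is missing from the output. As a minimal sketch of the same idea wrapped in a function that also returns unmatched strings unchanged (the name strip_prefix is hypothetical and not part of the original answer):

```python
prefixes = [u'path', u'folder', u'directory', u'd']

def strip_prefix(s, prefixes=prefixes):
    """Remove the first matching whole-word prefix from s, if any."""
    for prefix in prefixes:
        # Appending a space ensures 'd' does not match 'directory'
        if s.startswith(prefix + u' '):
            return s[len(prefix) + 1:]
    # No prefix matched: return the string unchanged
    return s

print(strip_prefix(u'd fixable directory or folder'))
print(strip_prefix(u'distinct and directory folder'))
```

A string that consists of a prefix alone, with no trailing space, is left untouched by this sketch.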

Answer 2

I would split on " " to get the first word and, if it is in the prefix list, remove it.

firstWord = s1.split(" ")[0]
if firstWord in prefixes:
    s1 = " ".join(s1.split(" ")[1:])

You can also use split() with no arguments, which splits on any whitespace.
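
As a rough sketch of the whitespace variant, assuming you want to split only once so the rest of the string keeps its original spacing: passing None as the separator together with a limit of 1 splits on the first run of whitespace. The function name strip_first_word is hypothetical.

```python
prefixes = [u'path', u'folder', u'directory', u'd']

def strip_first_word(s, prefixes=prefixes):
    """Drop the first word of s if it is one of the prefixes."""
    # sep=None splits on any run of whitespace; the limit of 1 keeps the rest intact
    parts = s.split(None, 1)
    if parts and parts[0] in prefixes:
        # A string holding only the prefix has nothing after it
        return parts[1] if len(parts) > 1 else u''
    return s

print(strip_first_word(u'path\tcommon  path and directory'))
```

Unlike splitting and re-joining every word, this leaves tabs and repeated spaces in the remainder untouched.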

Answer 3

I suggest partition, or split with a limit; both are well suited to this.

prefixes = [u'path', u'folder', u'directory', u'd']
strings = [u'path common path and directory',
           u'directory common path and directory',
           u'directory folder distinct and directory folder',
           u'distinct and directory folder',
           u'd fixable directory or folder']

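partition returns a 3-item tuple containing head, sep and tail. The head is everything before the separator, sep is the separator that split the string, and the tail is everything after it. Indexing with [2] grabs only the tail.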

res = []
for s in strings:
    s2 = s.partition(' ')
    if s2[0] in prefixes:
        res.append(s2[2])
    else:
        res.append(s)
print(res)

#List comp
print([s.partition(' ')[2] if s.partition(' ')[0] in prefixes else s for s in strings])

#Output for s1 | (head, sep, tail)
[0] | "path"
[1] | " "
[2] | "common path and directory"
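
One detail worth checking is what partition returns when the separator is absent. A small sketch, using made-up sample strings, that shows the tuple always has three items:

```python
# Separator present: head, the separator itself, and the tail
with_sep = u'path common path and directory'.partition(' ')
print(with_sep)

# Separator absent: the whole string is the head, sep and tail are empty
without_sep = u'path'.partition(' ')
print(without_sep)
```

Because the tuple always has three items, indexing with [2] is safe even for single-word strings.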

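split with a limit produces a list: it splits at the separator only the specified number of times, then appends everything that remains. So with a limit of 1 the length is at most 2.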

res = []
for s in strings:
    s2 = s.split(' ', 1)
    if s2[0] in prefixes:
        res.append(s2[1])
    else:
        res.append(s)
print(res)

#List comp
print([s.split(' ', 1)[1] if s.split(' ', 1)[0] in prefixes else s for s in strings])

#Output for s1 | [first item, everything else]
[0] | "path"
[1] | "common path and directory"
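
Since the list can also have length 1 (a string with no space), indexing with [1] would raise an IndexError for a string that is just a prefix. A hedged sketch of a guarded version, with the hypothetical name strip_prefix_split:

```python
prefixes = [u'path', u'folder', u'directory', u'd']

def strip_prefix_split(s, prefixes=prefixes):
    """Remove the first word of s if it is a prefix, using split with a limit."""
    parts = s.split(' ', 1)  # at most 2 items: first word and the rest
    if parts[0] in prefixes:
        # Guard against single-word strings, where parts has length 1
        return parts[1] if len(parts) > 1 else u''
    return s

print(strip_prefix_split(u'path common path and directory'))
print(repr(strip_prefix_split(u'path')))
```

Returning an empty string for a bare prefix is an assumption here; returning s unchanged would be an equally reasonable choice depending on what you need.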

Answer 4

You can use this function:

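def func(s):
    pr = s.split()[0]  # first word of the string
    if pr in prefixes:
        # Drop the first word and re-join the rest
        return ' '.join(s.split()[1:])
    # No matching prefix: return the string unchanged
    return s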

This takes the first word and checks whether it is in prefixes. If so, it removes that word.
