Question

I have a batch of strings that I need to cut down. Each is basically a descriptor followed by codes. I only want to keep the descriptor.

'a descriptor dps 23 fd'
'another 23 fd'
'and another fd'
'and one without a code'

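The codes above are dps, 23 and fd. They can come in any order, are unrelated to each other, and might not be present at all (as in the last case).

The list of codes is fixed (or at least predictable), so assuming a code is never used inside a legitimate descriptor, how can I strip off everything from the first instance of a code onward?

I'm using Python.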

Answer 1

The short answer, as @THC4K points out in a comment:

string.split(pattern, 1)[0]

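where string is your original string, pattern is your "break" pattern, 1 means split at most once, and [0] takes the first element of the list that split returns.

In action: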

>>> s = "a descriptor 23 fd"
>>> s.split("23", 1)[0]
'a descriptor '
>>> s.split("fdasfdsafdsa", 1)[0]
'a descriptor 23 fd'

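This is a shorter way of expressing what I had written earlier, which I'll leave here anyway.

If you need to cut at multiple patterns, this is a good candidate for reduce (a builtin in Python 2, functools.reduce in Python 3):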

>>> from functools import reduce  # needed on Python 3; reduce is a builtin only on Python 2
>>> string = "a descriptor dps foo 23 bar fd quux"
>>> patterns = ["dps", "23", "fd"]
>>> reduce(lambda s, pat: s.split(pat, 1)[0], patterns, string)
'a descriptor '
>>> reduce(lambda s, pat: s.split(pat, 1)[0], patterns, "uiopuiopuiopuipouiop")
'uiopuiopuiopuipouiop'

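This basically says: for each pat in patterns, take string and apply string.split(pat, 1)[0] (as explained above), each time operating on the result of the previous step. As you can see, if none of the patterns occur in the string, the original string is returned unchanged.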
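For readers who find reduce hard to follow, the same idea can be spelled out as a plain loop. This is only a sketch; the function name cut_at_patterns is made up for illustration and is not part of the answer above.

```python
from functools import reduce  # reduce lives in functools on Python 3

def cut_at_patterns(text, patterns):
    """Cut text at each pattern in turn, keeping the part before it."""
    for pat in patterns:
        # Split at most once and keep the left-hand piece. If pat is
        # absent, split returns [text], so text is left unchanged.
        text = text.split(pat, 1)[0]
    return text

patterns = ["dps", "23", "fd"]
sample = "a descriptor dps foo 23 bar fd quux"

looped = cut_at_patterns(sample, patterns)
reduced = reduce(lambda s, pat: s.split(pat, 1)[0], patterns, sample)
print(repr(looped), looped == reduced)
```

Both forms walk the patterns left to right and shrink the string step by step, so they should give the same result for any input.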

The simplest answer is string slicing combined with string.find:

>>> s = "a descriptor 23 fd"
>>> s[:s.find("fd")]
'a descriptor 23 '
>>> s[:s.find("23")]  
'a descriptor '
>>> s[:s.find("gggfdf")] # <-- look out! last character got cut off
'a descriptor 23 f'

A better approach (one that avoids cutting off the last character when the pattern is missing and s.find returns -1) might be a simple function:

>>> def cutoff(string, pattern):
...     idx = string.find(pattern)
...     return string[:idx if idx != -1 else len(string)]
... 
>>> cutoff(s, "23")
'a descriptor '
>>> cutoff(s, "asdfdsafdsa")
'a descriptor 23 fd'

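The [:s.find(x)] syntax means the part of the string from index 0 up to the value on the right-hand side of the colon; in this case, the RHS is the result of s.find, which returns the index at which the string you passed starts (or -1 if it is not found).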
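To make the -1 pitfall concrete, here is a small sketch of what the slice turns into when the pattern is missing. It also shows str.partition, a different built-in method that is not used in the answer above but avoids the special case.

```python
s = "a descriptor 23 fd"

idx = s.find("gggfdf")      # pattern is absent, so find returns -1
missing_slice = s[:idx]     # same as s[:-1]: drops the last character

# str.partition returns (before, separator, after); when the separator
# is absent, "before" is the whole string, so no -1 check is needed.
found_part = s.partition("23")[0]
missing_part = s.partition("gggfdf")[0]

print(idx, repr(missing_slice), repr(found_part), repr(missing_part))
```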

Answer 2

You seem to be describing something like this:

def get_descriptor(text):
    codes = ('12', 'dps', '23')
    for c in codes:
        try:
            return text[:text.index(c)].rstrip()
        except ValueError:
            continue

    raise ValueError("No descriptor found in `%s'" % (text))

For example:

>>> get_descriptor('a descriptor dps 23 fd')
'a descriptor'
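One behavior worth knowing about the function above: it cuts at the first code in tuple order that occurs in the text, which is not necessarily the code that appears earliest in the string, and it raises when no code is present. The sketch below restates the function so the block runs on its own; the second input is a made-up example, not one from the question.

```python
def get_descriptor(text):
    codes = ('12', 'dps', '23')
    for c in codes:
        try:
            # Cut at the first code (in tuple order) that is present.
            return text[:text.index(c)].rstrip()
        except ValueError:
            continue
    raise ValueError("No descriptor found in `%s'" % (text))

in_order = get_descriptor('a descriptor dps 23 fd')

# Hypothetical input where '23' comes before 'dps' in the text:
# 'dps' is tried first, so '23' is left in the result.
out_of_order = get_descriptor('another 23 dps')

try:
    get_descriptor('and one without a code')
    raised = False
except ValueError:
    raised = True

print(repr(in_order), repr(out_of_order), raised)
```

If codes can appear in any order, as the question says, taking the minimum index over all codes present is one way around this.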

Answer 3

codes = ('12', 'dps', '23')

def get_descriptor(text):
    words = text.split()
    for c in codes:
        if c in words:
            i = words.index(c)
            return " ".join(words[:i])
    raise ValueError("No code found in `%s'" % (text))

Answer 4

I'd probably use a regular expression to do this:

>>> import re
>>> descriptors = ('foo x', 'foo y', 'bar $', 'baz', 'bat')
>>> data = ['foo x 123', 'foo y 123', 'bar $123', 'baz 123', 'bat 123', 'nothing']
>>> p = re.compile("(" + "|".join(map(re.escape, descriptors)) + ")")
>>> for s in data:
...     m = re.match(p, s)
...     if m:
...         print(m.groups()[0])
...
foo x
foo y
bar $
baz
bat

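It isn't clear to me whether what you want to extract includes text before the descriptor, or whether you expect every line of text to start with a descriptor; the above handles the latter. For the former, just change the pattern slightly so that it also captures the characters that precede the descriptor: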

>>> p = re.compile("(.*(" + "|".join(map(re.escape, descriptors)) + "))")
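In the question it is the codes, not the descriptors, that are known in advance. A regex sketch for that setup might look like the following. The function name get_descriptor_re is made up, and the \b word boundaries are an assumption that codes always appear as whole words.

```python
import re

codes = ("dps", "23", "fd")  # the codes listed in the question

# Assumption: codes are whole words, hence the \b anchors.
code_re = re.compile(r"\b(?:" + "|".join(map(re.escape, codes)) + r")\b")

def get_descriptor_re(text):
    """Return text up to the first code, or all of it if no code occurs."""
    m = code_re.search(text)
    return text[:m.start()].rstrip() if m else text

samples = ['a descriptor dps 23 fd', 'another 23 fd',
           'and another fd', 'and one without a code']
results = [get_descriptor_re(s) for s in samples]
print(results)
```

Because search scans from the left, it finds whichever code occurs earliest, regardless of the order in which the codes are listed.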

Answer 5

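Here's an answer that handles all of the codes at once, rather than forcing you to call a function for each code, and it's slightly simpler than some of the answers above. It also works for all of your examples.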

strings = ('a descriptor dps 23 fd', 'another 23 fd', 'and another fd',
           'and one without a code')
codes = ('dps', '23', 'fd')

def strip(s):
    try:
        # cut at the earliest position where any code occurs
        return s[:min(s.find(c) for c in codes if c in s)]
    except ValueError:
        # min() raises ValueError on an empty sequence: no code in s
        return s

print(list(map(strip, strings)))

Output:

['a descriptor ', 'another ', 'and another ', 'and one without a code']

I believe this meets all of your criteria.

Edit: if you'd rather not rely on catching an exception, I quickly realized you can drop the try/except:

def strip(s):
    if not any(c in s for c in codes):
        return s
    return s[:min(s.find(c) for c in codes if c in s)]

Answer 6

def crop_string(string, pattern):
    del_items = []
    for indx, val in enumerate(pattern):
        a = string.split(val, 1)
        del_items.append(a[indx])

    for del_item in del_items:
        string = string.replace(del_item, "")
    return string

Example:

I want to crop the string and take only the array from it.

strin = "crop the array [1,2,3,4,5]"
pattern = ["[", "]"]

Usage:

a = crop_string(strin, pattern)
print(a)

# --- Prints "[1,2,3,4,5]"

Cut a string after a certain phrase?
