Question

I'm working with a string that looks like this (I saved it from an error message):

"['This is one' 'How is two' 'Why is three'\n 'When is four'] not in index"

I want to extract the substrings from it, like this:

['This is one', 'How is two', 'Why is three', 'When is four']

So far, all I've done is grab the part between the brackets (assuming the string is named s):

start = s.index("[") + len("[")
end = s.index("]")
s = s[start:end].replace("\\n", "")

Which gives me the output

'This is one' 'How is two' 'Why is three' 'When is four'

Now I just need to put these into a list, and that's where I'm stuck. I've tried

s = s.split("'")

but it gives me the output

['', 'This is one', ' ', 'How is two', ' ', 'Why is three', ' ', 'When is four', '']

I've also tried

s = s.split("'")
s = ' '.join(s).split()

which gives me the output

['This', 'is', 'one', 'How', 'is', 'two', 'Why', 'is', 'three', 'When', 'is', 'four']

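I tried the same thing with .split(" "), but that left me with some odd whitespace entries. I also tried list(filter(...)), but it only removes strings that are completely empty, not the ones that contain just a space.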

Answer 1

One approach is to first extract the text inside the square brackets, then use re.findall to find every quoted term within it.

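import re

inp = "['This is one' 'How is two' 'Why is three'\n 'When is four'] not in index"
# Capture everything between the first "[" and the last "]".
srch = re.search(r'\[(.*)\]', inp, flags=re.DOTALL)

if srch:
    # Lazily match the text between each pair of single quotes.
    matches = re.findall(r'\'(.*?)\'', srch.group(1))
    print(matches)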

Output:

['This is one', 'How is two', 'Why is three', 'When is four']

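Note that the call to re.search uses the re.DOTALL flag. This is required because the content between the square brackets contains a newline, which "." does not match by default.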
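To see why the flag matters, here is a small sketch that runs the same search without re.DOTALL and then wraps the two steps in a function. The name extract_quoted is a hypothetical helper introduced only for illustration, and returning an empty list when no brackets are found is an assumption on my part, not something from the question.

```python
import re

inp = "['This is one' 'How is two' 'Why is three'\n 'When is four'] not in index"

# Without re.DOTALL, "." stops at the newline, so the closing "]" is never reached.
print(re.search(r'\[(.*)\]', inp))  # None


def extract_quoted(text):
    """Hypothetical helper: return the single-quoted items inside [...]."""
    srch = re.search(r'\[(.*)\]', text, flags=re.DOTALL)
    # Assumption: fall back to an empty list when there are no brackets.
    return re.findall(r"'(.*?)'", srch.group(1)) if srch else []


print(extract_quoted(inp))
```

Note this assumes the items themselves never contain a single quote; if they can, the lazy '(.*?)' pattern would split them in the wrong place.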
