Question

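Edit: I should add that the strings in my test are supposed to contain every possible character (i.e. * + $§€/ etc.), so I thought regex would be the best fit.

I am using regex to find all characters between certain characters (here [" and "]). My example looks like this: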

test = """["this is a text and its supposed to contain every possible char."], 
    ["another one after a newline."], 

    ["and another one even with
    newlines

    in it."]"""

The expected output should look like this:

['this is a text and its supposed to contain every possible char.', 'another one after a newline.', 'and another one even with newlines in it.']

My code, including the regex, looks like this:

import re
my_list = re.findall(r'(?<=\[").*(?="\])*[^ ,\n]', test)
print (my_list)

My result is as follows:

['this is a text and its supposed to contain every possible char."]', 'another one after a newline."]', 'and another one even with']

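So there are two problems:

1) It does not remove the "] at the end of each text, which I expected (?="\]) to take care of.

2) The third text in brackets is not captured in full, because of the newlines. When I tried .*\n I still could not capture it; that only gave me an empty string.

I am thankful for any help or hints on this issue. Thank you in advance.

Btw, I am using Python 3.6 on Anaconda Spyder with the latest regex (2018).

Edit 2: one change to the test: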

test = """[
    "this is a text and its supposed to contain every possible char."
    ], 
    [
    "another one after a newline."
    ], 

    [
    "and another one even with
    newlines

    in it."
    ]"""

Again I am unable to remove the newlines from it. I guess the whitespace could be removed with \s, so I thought a regex like this would solve it.

my_list = re.findall(r'(?<=\[\S\s\")[\w\W]*(?=\"\S\s\])', test)
print (my_list)

But this only returns an empty list. How can I get the expected output above from that input?

Answer 1

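You can try this, mate.

Demo

What you missed in your regex: .* will not match newlines.

PS: I am not matching special characters here. That can be achieved very easily if you want.

This one matches special characters too:

(?<=\[\")[\w\W]+?(?=\"\])

Demo 2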
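
As a minimal sketch of how the `[\w\W]+?` pattern quoted in this answer might be applied to the test string from the question: the lazy match spans newlines, and a follow-up `' '.join(item.split())` collapses the line breaks and indentation so the result matches the expected output. The helper name `extract_texts` and the whitespace-tolerant `TOLERANT_PATTERN` (aimed at the "Edit 2" layout) are assumptions added for illustration, not part of the original answer.

```python
import re

# Pattern quoted in the answer: lazily match everything between [" and "].
# [\w\W] matches any character, newlines included.
ANSWER_PATTERN = r'(?<=\[\")[\w\W]+?(?=\"\])'

# Hypothetical variant (not from the answer): also allow whitespace or
# newlines between the bracket and the quote, as in the "Edit 2" layout.
TOLERANT_PATTERN = r'\[\s*"(.*?)"\s*\]'


def extract_texts(text, pattern=TOLERANT_PATTERN):
    """Return the quoted texts with line breaks and indentation collapsed."""
    # re.DOTALL lets "." match newlines; it has no effect on [\w\W].
    raw = re.findall(pattern, text, flags=re.DOTALL)
    # split() + join turns every whitespace run into a single space.
    return [' '.join(item.split()) for item in raw]


test = """["this is a text and its supposed to contain every possible char."],
    ["another one after a newline."],

    ["and another one even with
    newlines

    in it."]"""

print(extract_texts(test, ANSWER_PATTERN))
print(extract_texts(test))
```

Both calls should print the three-item list the question asks for. Either pattern stops at the first `"]` it meets, so a text that itself contains `"]` would be cut short.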

Answer 2

If you can also accept a solution that does not use regex, you could try

result = []
for l in eval(' '.join(test.split())):
    result.extend(l)

print(result)
#  ['this is a text and its supposed to contain every possible char.', 'another one after a newline.', 'and another one even with newlines in it.']
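
Because `eval` executes whatever the string contains, a more cautious variant of the same idea could use `ast.literal_eval`, which only accepts Python literals. The sketch below is an assumption-laden illustration rather than part of the original answer: the function name `extract_with_literal_eval` is hypothetical, and it assumes the input holds at least two bracketed groups and that the texts contain no unescaped double quotes or backslashes.

```python
import ast


def extract_with_literal_eval(text):
    """Flatten the text, parse it as a literal, and merge the inner lists."""
    # Collapse every whitespace run (newlines included) to one space so that
    # no string literal spans several lines any more.
    flattened = ' '.join(text.split())
    # The flattened text now reads like a tuple of lists:
    # ["..."], ["..."], ["..."]
    parsed = ast.literal_eval(flattened)
    result = []
    for sub_list in parsed:
        result.extend(sub_list)
    return result


test = """["this is a text and its supposed to contain every possible char."],
    ["another one after a newline."],

    ["and another one even with
    newlines

    in it."]"""

print(extract_with_literal_eval(test))
```

The whitespace handling is the same as in the loop above; only the parsing step differs.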

Answer 3

This is what I came up with:

test = """["this is a text and its supposed to contain every possible char."], 
    ["another one after a newline."], 

    ["and another one even with
    newlines

    in it."]"""

for i in test.replace('\n', '').replace('    ', ' ').split(','):
    print(i.lstrip(r' ["').rstrip(r'"]'))

This results in the following being printed to the screen:

this is a text and its supposed to contain every possible char.
another one after a newline.
and another one even with newlines in it.

If you want a list of these exact strings, we can modify it to:

newList = []
for i in test.replace('\n', '').replace('    ', ' ').split(','):
    newList.append(i.lstrip(r' ["').rstrip(r'"]'))
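
One caveat with splitting on every comma is that a text containing a comma of its own would be cut into pieces. A possible sketch that stays regex-free but splits on the `"], ["` boundary between entries is shown below. The function name `split_entries` is hypothetical, and the code assumes the input starts with `["`, ends with `"]`, and separates entries with `],` followed by whitespace, as in the test string above.

```python
def split_entries(text):
    """Split the flattened text on the boundary between bracketed entries."""
    # Collapse newlines and indentation to single spaces.
    flat = ' '.join(text.split())
    # Drop the leading [" and the trailing "] of the whole text.
    inner = flat[2:-2]
    # Split on the "], [" boundary instead of on every comma, so commas
    # inside a text survive.
    return inner.split('"], ["')


test = """["this is a text, and its supposed to contain every possible char."],
    ["another one after a newline."],

    ["and another one even with
    newlines

    in it."]"""

print(split_entries(test))
```

Here the first text contains a comma on purpose; it should still come back as a single list item.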

python regex - characters between certain characters
