Python regex: characters between certain characters

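Time: 2018-12-04 11:21:51

Tags: python regex char newline lookahead

Edit: I should add that the strings in the test are supposed to contain every possible character (i.e. * + $ § € / etc.), so I think a regex would be the best fit.

I am using a regex to find all characters between certain characters ([" and "]). My example looks like this: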

test = """["this is a text and its supposed to contain every possible char."], 
    ["another one after a newline."], 

    ["and another one even with
    newlines

    in it."]"""

The expected output should look like this:

['this is a text and its supposed to contain every possible char.', 'another one after a newline.', 'and another one even with newlines in it.']

My code, including the regex, is as follows:

import re
my_list = re.findall(r'(?<=\[").*(?="\])*[^ ,\n]', test)
print (my_list)

My result is as follows:

['this is a text and its supposed to contain every possible char."]', 'another one after a newline."]', 'and another one even with']

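So there are two problems:

1) It does not remove the "] at the end of each text, as I expected it to with (?="\]).

2) The third bracketed text is not captured completely, presumably because of the newlines. So far I have not been able to capture those either: when I try .*\n, it gives me back an empty string.

I would appreciate any help or hints on this problem. Thanks in advance.

By the way, I am using Python 3.6 on Anaconda/Spyder with the latest regex (2018).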

Edit 2: One change to the test:

test = """[
    "this is a text and its supposed to contain every possible char."
    ], 
    [
    "another one after a newline."
    ], 

    [
    "and another one even with
    newlines

    in it."
    ]"""

Once again I cannot get rid of the newlines. I assumed the whitespace could be matched with \s, so I thought a regex like this would solve it:

my_list = re.findall(r'(?<=\[\S\s\")[\w\W]*(?=\"\S\s\])', test)
print (my_list)

But this only returns an empty list. How do I get the expected output above from this input?

3 answers:

Answer 0: (score: 1)

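You can try this, mate.

Demo

What your regex is missing: .* will not match a newline.

PS: I am not matching special characters in that one. If you want that, it can be achieved very easily.

This one matches special characters too:

(?<=\[\")[\w\W]+?(?=\"\])

Demo 2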
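To make the pattern (?<=\[\")[\w\W]+?(?=\"\]) concrete, here is a minimal sketch of how it could be used with re.findall. Two things in it are assumptions and not part of the answer: the whitespace clean-up with " ".join(s.split()), added because the expected output has single spaces where the input has newlines and indentation, and the second pattern named loose, which is one possible way to handle the Edit 2 layout. Python's re module requires fixed-width lookbehinds, so loose uses a capture group instead of (?<=...).

```python
import re

test = """["this is a text and its supposed to contain every possible char."],
    ["another one after a newline."],

    ["and another one even with
    newlines

    in it."]"""

# [\w\W] matches any character, newlines included; the lazy +? stops at the first "]
pattern = re.compile(r'(?<=\[")[\w\W]+?(?="\])')

# Collapse each run of whitespace (newlines, indentation) into a single space
my_list = [" ".join(s.split()) for s in pattern.findall(test)]
print(my_list)

# Edit 2 layout: whitespace may sit between the bracket and the quote
test2 = """[
    "this is a text and its supposed to contain every possible char."
    ],
    [
    "another one after a newline."
    ],

    [
    "and another one even with
    newlines

    in it."
    ]"""

# Assumed variant: capture group plus re.DOTALL so that . also matches newlines
loose = re.compile(r'\[\s*"(.*?)"\s*\]', re.DOTALL)
my_list2 = [" ".join(s.split()) for s in loose.findall(test2)]
print(my_list2)
```

Both patterns stop at the first "] they see, so a text that itself contains "] would be cut short. The whitespace clean-up also collapses repeated spaces inside a text.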

Answer 1: (score: 1)

If you are also open to a solution that does not use a regex, you can try:

result = []
for l in eval(' '.join(test.split())):
    result.extend(l)

print(result)
#  ['this is a text and its supposed to contain every possible char.', 'another one after a newline.', 'and another one even with newlines in it.']
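Since eval executes whatever expression it is given, a more cautious variant of the same idea is ast.literal_eval from the standard library, which only accepts Python literals. The sketch below swaps it in. It assumes, like the eval version, that the texts contain no unescaped double quotes or backslashes, which would break or alter the parsed literal.

```python
import ast

test = """["this is a text and its supposed to contain every possible char."],
    ["another one after a newline."],

    ["and another one even with
    newlines

    in it."]"""

# A raw newline inside a "..." literal is a syntax error, so collapse
# every run of whitespace into a single space before parsing
flat = " ".join(test.split())

# literal_eval parses literals only; here the result is a tuple of lists
parsed = ast.literal_eval(flat)

# Flatten the one-element lists into a single list of strings
result = [s for sub in parsed for s in sub]
print(result)
```

Characters such as * + $ § € / are ordinary string content for the parser, so they pass through unchanged.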

Answer 2: (score: 0)

This is what I came up with:

test = """["this is a text and its supposed to contain every possible char."], 
    ["another one after a newline."], 

    ["and another one even with
    newlines

    in it."]"""

for i in test.replace('\n', '').replace('    ', ' ').split(','):
    print(i.lstrip(r' ["').rstrip(r'"]'))

This prints the following to the screen:

this is a text and its supposed to contain every possible char.
another one after a newline.
and another one even with newlines in it.

If you want a list of these exact strings, we can modify it to:

newList = []
for i in test.replace('\n', '').replace('    ', ' ').split(','):
    newList.append(i.lstrip(r' ["').rstrip(r'"]'))
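One caveat with this approach: split(',') also splits on commas inside the quoted text, and lstrip/rstrip remove sets of characters, not a fixed prefix or suffix, so a text ending in ] or " would lose those characters. Below is a minimal sketch that keeps the string-method style but splits on the full delimiter between entries. The helper name split_entries is hypothetical, and the sketch assumes the first layout, where [" and "] sit directly next to the text.

```python
def split_entries(text):
    # split_entries is a hypothetical helper name, not part of the original answer.
    # Collapse every run of whitespace (newlines, indentation) into one space.
    flat = " ".join(text.split())
    if not (flat.startswith('["') and flat.endswith('"]')):
        raise ValueError('expected input of the form ["..."], ["..."]')
    # Drop the outer [" and "], then split on the delimiter between entries.
    return flat[2:-2].split('"], ["')


test = """["this is a text and its supposed to contain every possible char."],
    ["another one after a newline."],

    ["and another one even with
    newlines

    in it."]"""

newList = split_entries(test)
print(newList)

# Commas and brackets inside a text survive, unlike with split(',')
print(split_entries('["a, b"], \n    ["c [d]"]'))
```

The whitespace clean-up also collapses repeated spaces inside a text, and a text that contains the exact sequence "], [" would still be split at that point.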