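Edit: I should add that the strings in the test are supposed to contain every possible character (i.e. * + $ § € / and so on), so I think a regex is the best fit.
I am using a regex to find all characters between certain delimiters ([" and "]). My example looks like this: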
test = """["this is a text and its supposed to contain every possible char."],
["another one after a newline."],
["and another one even with
newlines
in it."]"""
The expected output should look like this:
['this is a text and its supposed to contain every possible char.', 'another one after a newline.', 'and another one even with newlines in it.']
My code with the regex is as follows:
import re
my_list = re.findall(r'(?<=\[").*(?="\])*[^ ,\n]', test)
print (my_list)
My result is as follows:
['this is a text and its supposed to contain every possible char."]', 'another one after a newline."]', 'and another one even with']
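So there are two problems:
1) It does not remove the "] at the end of each text, which I expected (?="\]) to do.
2) The third bracketed text is not captured completely because of the newlines in it. When I tried .*\n so far, I still could not capture it; that gave me an empty string.
I would appreciate any help or hints on this problem. Thank you in advance.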
Btw, I am using Python 3.6 in Anaconda Spyder with the latest regex (2018).
Edit 2: One modification of the test:
test = """[
"this is a text and its supposed to contain every possible char."
],
[
"another one after a newline."
],
[
"and another one even with
newlines
in it."
]"""
Again I cannot get rid of the newlines. I guess the whitespace can be matched with \s, so I thought a regex like this would solve it:
my_list = re.findall(r'(?<=\[\S\s\")[\w\W]*(?=\"\S\s\])', test)
print (my_list)
But this only returns an empty list. How do I get the expected output above from this input?
Answer 0 (score: 1)
You can try this, mate:
(?<=\[\")[\w\W]+?(?=\"\])
What your regex was missing: .* does not match newlines.
P.S. [\w\W] matches every character, so this pattern matches special characters and newlines too.
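As a hedged sketch of how a pattern along these lines could produce the expected output for both versions of the test string: the variant below swaps the lookarounds for a capture group, allows optional whitespace between the brackets and the quotes, and uses re.DOTALL so that . also matches newlines. The name extract_texts, the shortened second sample, and the whitespace-collapsing step are assumptions for illustration, not part of the answer above.

```python
import re

# Matches [ "text" ] with optional whitespace between brackets and quotes.
# (.*?) is lazy, so each match stops at the first closing "].
PATTERN = re.compile(r'\[\s*"(.*?)"\s*\]', re.DOTALL)


def extract_texts(text):
    """Return every bracketed, quoted text with whitespace runs collapsed."""
    # str.split() with no argument splits on any whitespace, newlines included,
    # so joining with ' ' turns each newline into a single space.
    return [' '.join(match.split()) for match in PATTERN.findall(text)]


compact = """["this is a text and its supposed to contain every possible char."],
["another one after a newline."],
["and another one even with
newlines
in it."]"""

spread = """[
"text with * + $§€/ in it."
],
[
"and another one even with
newlines
in it."
]"""

print(extract_texts(compact))
print(extract_texts(spread))
```

A text that itself contains "] would end the match early; any purely delimiter-based pattern shares that limitation.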
Answer 1 (score: 1)
If you can also accept a solution that does not use a regex, you could try
result = []
for l in eval(' '.join(test.split())):
    result.extend(l)
print(result)
# ['this is a text and its supposed to contain every possible char.', 'another one after a newline.', 'and another one even with newlines in it.']
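Because eval runs whatever code the string contains, a more cautious variant of the same idea could use ast.literal_eval from the standard library, which only accepts Python literals. This is a minimal sketch, assuming the input always holds at least two bracketed groups and that the texts contain no backslashes or unescaped double quotes; the function name extract_with_literal_eval is hypothetical.

```python
import ast

test = """["this is a text and its supposed to contain every possible char."],
["another one after a newline."],
["and another one even with
newlines
in it."]"""


def extract_with_literal_eval(text):
    """Parse comma-separated ["..."] groups without executing any code."""
    # Collapse all whitespace runs (including newlines) into single spaces,
    # so each multi-line string becomes a valid one-line string literal.
    flat = ' '.join(text.split())
    result = []
    # With two or more groups the parsed value is a tuple of lists.
    for group in ast.literal_eval(flat):
        result.extend(group)
    return result


print(extract_with_literal_eval(test))
```

If the input is not a valid literal, ast.literal_eval raises an exception (typically ValueError or SyntaxError) instead of running it.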
Answer 2 (score: 0)
This is what I came up with:
test = """["this is a text and its supposed to contain every possible char."],
["another one after a newline."],
["and another one even with
newlines
in it."]"""
# Replace newlines with spaces so words on adjacent lines stay separated.
for i in test.replace('\n', ' ').split(','):
    print(i.lstrip(r' ["').rstrip(r'"]'))
This prints the following to the screen:
this is a text and its supposed to contain every possible char.
another one after a newline.
and another one even with newlines in it.
If you want a list of these exact strings, we can modify it to:
newList = []
# Replace newlines with spaces so words on adjacent lines stay separated.
for i in test.replace('\n', ' ').split(','):
    newList.append(i.lstrip(r' ["').rstrip(r'"]'))
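Splitting on every comma cuts a text apart as soon as it contains a comma itself. A hedged sketch of one way around that, staying with plain string methods, is to split on the '],' that closes each bracketed item instead. The sample input and the name extract_items are made up for illustration.

```python
sample = '''["one, with a comma."],
["two with
newlines."]'''


def extract_items(text):
    """Split on the '],' closing each item rather than on every comma."""
    # Collapse all whitespace runs (including newlines) into single spaces.
    flat = ' '.join(text.split())
    items = []
    for chunk in flat.split('],'):
        # Peel off the surrounding brackets, then the surrounding quotes.
        inner = chunk.strip().lstrip('[').rstrip(']').strip()
        items.append(inner.strip('"'))
    return items


print(extract_items(sample))
```

This still assumes that no text contains '],' and that no text begins or ends with a double quote of its own.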