Question

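I'm trying to write a regular expression that matches either strings surrounded by double quotes (") or words separated by spaces ( ), and collects them in a Python list.

I don't really understand the output of my code. Can somebody give me a hint or explain what my regular expression is actually doing?

Here is my code: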

import re
regex = re.compile('(\"[^\"]*\")|( [^ ]* )')
test = '"hello world." here are some words. "and more"'
print(regex.split(test))

I expected output like this:

['"hello world."', ' here ', ' are ', ' some ', ' words. ', '"and more"']

But I get the following:

['', '"hello world."', None, '', None, ' here ', 'are', None, ' some ', 'words.', None, ' "and ', 'more"']

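Where do the empty strings and the None values come from, and why does it match "hello world." but not "and more"?

Thanks for your help, and happy new year to those who celebrate it today!

EDIT
To be precise: I don't need the surrounding spaces, but I do need the surrounding quotes. This output would be fine as well: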

['"hello world."', 'here', 'are', 'some', 'words.', '"and more"']

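EDIT2:

I ended up using shlex.split() as @PadraicCunningham suggested, because it does exactly what I need and is IMHO more readable than a regular expression.

I'm still keeping @TigerhawkT3's answer as the accepted one, because it solves the problem the way I asked (with a regular expression).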

Answer 1

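Put the quoted alternative first so that it takes priority, then match runs of non-whitespace characters:

>>> s = '"hello world." here are some words. "and more"'
>>> re.findall(r'"[^"]*"|\S+', s)
['"hello world."', 'here', 'are', 'some', 'words.', '"and more"']

You can get the same result with a non-greedy repetition instead of the negated character class:

>>> re.findall(r'".*?"|\S+', s)
['"hello world."', 'here', 'are', 'some', 'words.', '"and more"']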
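As for where the empty strings and None values in the question come from, here is a minimal sketch that puts the two approaches side by side. re.split returns the text between matches and, when the pattern contains capture groups, also the value of every group for each match; a group that did not take part in a match shows up as None. The space-delimited alternative also consumes the spaces on both sides of a word, so the next word is left over as "between-match" text, and the space in front of "and more" gets claimed before the quoted alternative has a chance. The variable names below are only for illustration.

```python
import re

test = '"hello world." here are some words. "and more"'

# The question's pattern: two capture groups joined by alternation.
pattern = re.compile(r'("[^"]*")|( [^ ]* )')

# re.split yields the text between matches, followed by the value of every
# capture group for each match; a group that did not participate is None.
split_result = pattern.split(test)

# re.findall with a group-free pattern returns only the matched substrings.
tokens = re.findall(r'"[^"]*"|\S+', test)

print(split_result)
print(tokens)
```

With no capture groups and no splitting involved, findall has nothing to report except the tokens themselves, which is why it fits this task better than split.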

Answer 2

shlex.split with posix=False will do it for you:

import shlex

test = '"hello world." here are some words. "and more"'

print(shlex.split(test,posix=False))
['"hello world."', 'here', 'are', 'some', 'words.', '"and more"']

If you don't want the quotes, leave posix as True (the default):

print(shlex.split(test))

['hello world.', 'here', 'are', 'some', 'words.', 'and more']

Answer 3

This looks like CSV, so use the appropriate tool: the csv module, with a space as the delimiter, returns

[['hello world.', 'here', 'are', 'some', 'words.', 'and more']]
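The code for this answer did not survive on this page, so the following is only a sketch that reproduces the output shown, assuming csv.reader was called with delimiter=' ' and quotechar='"'. The exact call is a guess and the names lines and rows are illustrative.

```python
import csv

# csv.reader accepts any iterable of strings, one string per row.
lines = ['"hello world." here are some words. "and more"']

# Assumption: a space as the field delimiter and a double quote as the
# quote character; quoted fields keep their inner spaces but lose the quotes.
rows = list(csv.reader(lines, delimiter=' ', quotechar='"'))

print(rows)
```

Note that the result is a list of rows, and that the quotes are stripped, so this suits the variant of the question where the surrounding quotes are not needed.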


Match quoted strings and unquoted words
