Question

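My input string consists of several entities, like this: conn_type://host:port/schema#login#password

I want to find all of them using a regular expression in Python.

As of now, I can find them one by one, like this: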

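import re

test_string = 'conn_type://host:port/schema#login#password'

# The underscore is included so that the whole of "conn_type" is matched.
conn_type = re.search(r'[a-zA-Z_]+', test_string)
if conn_type:
    print("conn_type:", conn_type.group())
    next_substr_len = conn_type.end()
    host = re.search(r'[^:/]+', test_string[next_substr_len:])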

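and so on.

Is there a way to do this without the if/else? I expect there is one, but I have not been able to find it. Please note that the regex for each entity is different.

Please help, I don't want to write boring code.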

Answer 1

Why not use re.findall?

Here is an example:

import re

s = 'conn_type://host:port/schema#login#password asldasldasldasdasdwawwda conn_type://host:port/schema#login#email'

def get_all_matches(s):
    # Return every full match of the connection-string pattern found in s.
    return re.findall(r'[a-zA-Z]+_[a-zA-Z]+:/+[a-zA-Z]+:+[a-zA-Z]+/+[a-zA-Z]+#+[a-zA-Z]+#[a-zA-Z]+', s)

print(get_all_matches(s))

This returns a list of every full match of the regex, which in this case is:

['conn_type://host:port/schema#login#password', 'conn_type://host:port/schema#login#email']

If you need help building regex patterns in Python, I recommend the following website:

A pretty neat online regex tester

Also see the documentation of the re module for more information on re.findall:

Documentation for re.findall

Hope this helps!

Answer 2

If you like DIY, consider writing a tokenizer. It is a very elegant, Pythonic solution.
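A minimal sketch of what such a tokenizer could look like, assuming the input only ever contains word characters and the separators `://`, `:`, `/` and `#`. The names `TOKEN_SPEC`, `MASTER` and `tokenize` are made up for this illustration; the idea is one combined pattern with a named group per token kind.

```python
import re

# Hypothetical token kinds, chosen for the example string only.
TOKEN_SPEC = [
    ("SEP", r"://|[:/#]"),       # separators between entities
    ("WORD", r"[A-Za-z0-9_]+"),  # the entities themselves
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Yield (kind, value) pairs; raise on a character no pattern covers."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        yield m.lastgroup, m.group()
        pos = m.end()

tokens = list(tokenize("conn_type://host:port/schema#login#password"))
fields = [value for kind, value in tokens if kind == "WORD"]
print(fields)
```

Keeping the separators as tokens, rather than throwing them away, lets you later check that they arrive in the expected order if you need stricter validation.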

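Or use the standard library: https://docs.python.org/3/library/urllib.parse.html. Note, however, that your example URL is not entirely valid: 'conn_type' is not a valid scheme, and you have two anchors (#) in the string, so urlparse won't work as expected. For real-life URLs, though, I strongly recommend this approach.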
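To illustrate the difference, here is a small sketch using `urlsplit` from `urllib.parse`. The first URL and its values are invented for the example; the second is the string from the question.

```python
from urllib.parse import urlsplit

# A conventional URL (hypothetical values): the scheme contains only letters
# and the credentials sit in front of the host.
good = urlsplit("postgres://login:password@host:5432/schema")
print(good.scheme, good.hostname, good.port, good.path)
print(good.username, good.password)

# The string from the question: the underscore in "conn_type" is not allowed
# in a URL scheme, so no scheme or host is recognised, and everything after
# the first '#' ends up in a single fragment.
odd = urlsplit("conn_type://host:port/schema#login#password")
print(odd)
```

So for the question's format a regex or tokenizer is still needed, while for well-formed URLs the parsed attributes can be used directly.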

Answer 3

>>> import re
>>> uri = "conn_type://host:port/schema#login#password"
>>> res = re.findall(r'(\w+)://(.*?):([A-Za-z0-9]+)/(\w+)#(\w+)#(\w+)', uri)
>>> res
[('conn_type', 'host', 'port', 'schema', 'login', 'password')]

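No ifs needed. Use findall or finditer to search through your collection of connection strings, then filter the resulting list of tuples as needed.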
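As a rough sketch of the finditer variant, named groups give each entity its own sub-pattern and make the filtering step easier to read. The sample text, the name `URI_RE` and the specific character classes are assumptions for illustration only.

```python
import re

# Each entity gets its own sub-pattern; adjust the character classes to your data.
URI_RE = re.compile(
    r"(?P<conn_type>\w+)://"
    r"(?P<host>[^:/\s]+):"
    r"(?P<port>\w+)/"
    r"(?P<schema>\w+)#"
    r"(?P<login>\w+)#"
    r"(?P<password>\w+)"
)

# Hypothetical input containing two connection strings.
text = "mysql://db1:3306/sales#alice#s3cret and postgres://db2:5432/hr#bob#hunter2"

# One dict per match, keyed by group name.
connections = [m.groupdict() for m in URI_RE.finditer(text)]

# Filter as needed, here by connection type.
postgres_only = [c for c in connections if c["conn_type"] == "postgres"]
print(postgres_only)
```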

Finding multiple things in a string using regex in Python
