Question

I have a dictionary whose values look like this:

https://service-dmn1-region.com/info 4169 description

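I am interested in getting dmn1-region from the URL part and printing 4169 description as is. So I intend to print the result as:

dmn1-region:4169 description

Do you think this is possible without a complicated regex? This is what I tried in Python -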

import re
print(re.sub('https://', '', dictionary[key]))

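This only removes the https:// part and shows the result as service-dmn1-region.com/info 4169 description. But I am not sure how to get the expected output shown above.

The key-value pairs in the dictionary look like -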

dictionary = {'service': 'https://service-dmn1-region.com/info 4169 description',
              'service1': 'https://service1-dmn2-region2.com/info 5123 someDescription',
              'service2': 'https://dmn1-region-service2.com/info'}

Any insight and help is much appreciated.

Answer 1

Given the information, and the fact that you don't want to use a regex, you could do the following:

dictionary = {'service': 'https://service-dmn1-region.com/info 4169 description',
              'service1': 'https://service1-dmn2-region2.com/info 5123 someDescription'}


def extract(key, s):
    info = '/info'
    service = key + '-'
    return s[s.find('service') + len(service):s.find('.com')], s[s.find(info) + len(info):].strip()


for key, value in dictionary.items():
    region, info = extract(key, value)
    print('{0}:{1}'.format(region, info))

Output

dmn2-region2:5123 someDescription
dmn1-region:4169 description

Note that the URLs are the values of the dictionary, not the keys.
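
The dictionary in the question also has a third entry, 'service2', whose URL puts the key after the region and carries no description. Below is a minimal regex-free sketch that tolerates that shape. It assumes the key is attached to the first host label with a hyphen, either as a prefix or as a suffix; that rule and the helper name extract_region are illustrative assumptions, not something stated in the question.

```python
from urllib.parse import urlsplit

dictionary = {
    'service': 'https://service-dmn1-region.com/info 4169 description',
    'service1': 'https://service1-dmn2-region2.com/info 5123 someDescription',
    'service2': 'https://dmn1-region-service2.com/info',
}


def extract_region(key, value):
    # Hypothetical helper: split the URL from the free-text description.
    url, _, description = value.partition(' ')
    # First label of the host name, e.g. 'service-dmn1-region'.
    label = urlsplit(url).hostname.split('.')[0]
    # Assumption: the key is joined with a hyphen, before or after the region.
    if label.startswith(key + '-'):
        label = label[len(key) + 1:]
    elif label.endswith('-' + key):
        label = label[:-(len(key) + 1)]
    return label, description.strip()


for key, value in dictionary.items():
    region, description = extract_region(key, value)
    print('{0}:{1}'.format(region, description))
```

For an entry without a description, such as 'service2', the second element is an empty string, so the caller can decide whether to print it at all.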

Answer 2

I would use something like:

import re
for k, v in dictionary.items(): # .iteritems() for py2
    print(re.sub(r"^.*?{}-([^.]+).*?(\d+)\s(.*?)$".format(k), r"\1 :\2 \3", v))

dmn1-region :4169 description
dmn2-region2 :5123 someDescription
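
One thing to watch with this approach: the key is interpolated into the pattern as-is, so a key containing regex metacharacters (a dot, for instance) would change what the pattern means, and re.sub returns a value unchanged when nothing matches. A hedged sketch of the same idea using re.escape and re.search follows; the helper name format_entry and the choice to return None for non-matching values are assumptions made for illustration.

```python
import re

dictionary = {
    'service': 'https://service-dmn1-region.com/info 4169 description',
    'service1': 'https://service1-dmn2-region2.com/info 5123 someDescription',
    'service2': 'https://dmn1-region-service2.com/info',
}


def format_entry(key, value):
    # Escape the key so it is always matched literally.
    pattern = r"{}-([^.]+)\S*\s+(\d+)\s+(.*)$".format(re.escape(key))
    match = re.search(pattern, value)
    if match is None:
        # Assumption: values that do not fit the expected shape are skipped.
        return None
    return "{}:{} {}".format(*match.groups())


for k, v in dictionary.items():
    line = format_entry(k, v)
    if line is not None:
        print(line)
```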


Answer 3

For values of the form https://service-dmn1-region.com/info 4169 description, you can match on ^[^-]+-([^.]+)[^\s]+ (.*)$

[harald@localhost ~]$ python3
Python 3.6.6 (default, Jul 19 2018, 14:25:17) 
[GCC 8.1.1 20180712 (Red Hat 8.1.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> val = 'https://service-dmn1-region.com/info 4169 description'
>>> res = re.match(r'^[^-]+-([^.]+)[^\s]+ (.*)$', val)
>>> res.group(1)
'dmn1-region'
>>> res.group(2)
'4169 description'

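Here ^[^-]+ starts at the beginning of the input (the initial ^) and matches everything that is not a hyphen - ([^-]+), so https://service

Next, you specify that a hyphen must follow, giving ^[^-]+-, and that you want to capture everything after it up to the first dot: ([^.]+). (As you have probably guessed by now, a leading ^ inside square brackets [] negates the character class.)

This leads us to ^[^-]+-([^.]+). Next, you want to ignore everything up to the next space, since that is what separates the other values from the URL, so you add a pattern that matches all non-whitespace characters, [^\s]+, resulting in ^[^-]+-([^.]+)[^\s]+

Then you follow up with the space separator (you can use \s* instead of a literal space if you expect more than one space) and add the final catch-all capture (.*), which captures 4169 description (the dot stands for any character here) up to the end of the input $, leading you to ^[^-]+-([^.]+)[^\s]+ (.*)$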
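
The piece-by-piece walkthrough above can be written directly into the pattern using re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch: the helper name parse is made up, and the literal space is written as [ ] because a bare space would be ignored in verbose mode.

```python
import re

PATTERN = re.compile(r"""
    ^[^-]+-     # from the start, everything up to and including the first hyphen
    ([^.]+)     # group 1: everything up to the first dot, i.e. the region
    [^\s]+      # the rest of the URL, up to the next whitespace
    [ ]         # the single space separator
    (.*)$       # group 2: everything else, to the end of the input
""", re.VERBOSE)


def parse(value):
    # Hypothetical helper: return (region, rest), or None when the value
    # does not have the 'URL<space>description' shape.
    match = PATTERN.match(value)
    return match.groups() if match else None


values = [
    'https://service-dmn1-region.com/info 4169 description',
    'https://service1-dmn2-region2.com/info 5123 someDescription',
    'https://dmn1-region-service2.com/info',
]
for value in values:
    print(parse(value))
```

The third value has no space-separated description, so the pattern does not match it and parse returns None rather than raising.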

How to extract part of a URL from a dictionary value in Python?

3 answers: