Question

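I have a regular expression that works when the input does not have the leading \d,"There is 1 interface on the system:

or the trailing ",2017-01-...

Here is a sample of what I am trying to parse: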

1,"There is 1 interface on the system:
    Name               : Mobile Broadband Connection
    Description        : Qualcomm Gobi 2000 HS-USB Mobile Broadband Device 250F
    GUID               : {1234567-12CD-1BC1-A012-C1A1234CBE12}
    Physical Address   : 00:a0:c6:00:00:00
    State              : Connected
    Device type        : Mobile Broadband device is embedded in the system
    Cellular class     : CDMA
    Device Id          : A1000001234f67
    Manufacturer       : Qualcomm Incorporated
    Model              : Qualcomm Gobi 2000
    Firmware Version   : 09010091
    Provider Name      : Verizon Wireless
    Roaming            : Not roaming
    Signal             : 67%",2017-01-20T16:00:07.000-0700

Each record starts with a header line like this, where the leading 1 increments (1, 2, 3, 4, and so on):

1,"There is 1 interface on the system:

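I am trying to extract the field names and their values (for example, Cellular class equals CDMA), but only for the fields that come after:

1,"There is 1 interface on the system:

and before the trailing ",2017-01 ....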

Any help is much appreciated!

Answer 1

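You can use lookaheads to ensure that the string you match comes before a ",\d sequence and does not itself contain a ". The second condition ensures that you only match inside a pair of double quotes whose closing quote matches the pattern ",\d: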

/^\h*(?<_KEY_1>[\w\h]+?)\h*:\h*(?<_VAL_1>[^\r\n"]+)(?="|$)(?=[^"]*",\d)/gm

See it on regex101.

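Note: I put the g and m modifiers at the end, but if your environment needs the (?m) notation at the start of the pattern instead, that will of course work too.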
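For readers who want to try the lookahead idea outside regex101, here is a minimal Python sketch. It is an adaptation, not the pattern exactly as posted: Python's re module has no \h, so [ \t] stands in for it, named groups use the (?P<name>...) spelling, and re.MULTILINE plus finditer replace the m and g modifiers. The names PATTERN, SAMPLE and extract_pairs, and the shortened sample record, are made up for illustration.

```python
import re

# Adapted from the pattern above (assumption: [ \t] is an acceptable
# stand-in for \h, which Python's re module does not support).
PATTERN = re.compile(
    r'^[ \t]*(?P<key>[\w \t]+?)[ \t]*:[ \t]*(?P<val>[^\r\n"]+)'
    r'(?="|$)(?=[^"]*",\d)',
    re.MULTILINE,
)

# Shortened, illustrative record in the same shape as the question's sample.
SAMPLE = (
    '1,"There is 1 interface on the system:\n'
    '    Name               : Mobile Broadband Connection\n'
    '    Physical Address   : 00:a0:c6:00:00:00\n'
    '    Cellular class     : CDMA\n'
    '    Signal             : 67%",2017-01-20T16:00:07.000-0700\n'
)


def extract_pairs(text):
    """Return (key, value) tuples for each 'key : value' line inside the quotes."""
    return [
        (m.group('key'), m.group('val').rstrip())
        for m in PATTERN.finditer(text)
    ]


pairs = extract_pairs(SAMPLE)
for key, value in pairs:
    print(key, '=', value)
```

The header line is skipped because its key would have to contain a comma, and a key/value line that is not followed by a closing ",\d is rejected by the second lookahead.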

Answer 2

Your sample string looks like a record from a CSV file. This is how I would do the task in Python (2.7 or 3.x):

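import csv

with open('file.csv', 'r') as fh:
    # csv.reader keeps the quoted multi-line block together as a single field
    reader = csv.reader(fh)
    results = []

    for fields in reader:
        # fields[1] is the quoted block; its first line is the "There is ..." header
        lines = fields[1].splitlines()
        # split on the first ':' only, so values such as MAC addresses stay intact
        keyvals = [list(map(str.strip, line.split(':', 1))) for line in lines[1:]]
        results.append(keyvals)

    print(results)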

It can be done in a similar way in other languages.
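As a variation on the same CSV approach, the sketch below reads from an in-memory string and builds one dictionary per record, which makes lookups like "Cellular class" direct. It assumes Python 3 and that every record has exactly the three columns seen in the question (counter, quoted block, timestamp). The names parse_records and SAMPLE, the output layout, and the shortened sample record are illustrative choices, not part of the original answer.

```python
import csv
import io

# Shortened, illustrative record in the same shape as the question's sample.
SAMPLE = (
    '1,"There is 1 interface on the system:\n'
    '    Name               : Mobile Broadband Connection\n'
    '    Physical Address   : 00:a0:c6:00:00:00\n'
    '    Cellular class     : CDMA\n'
    '    Signal             : 67%",2017-01-20T16:00:07.000-0700\n'
)


def parse_records(fh):
    """Parse CSV rows of the form: counter, quoted multi-line block, timestamp."""
    records = []
    for fields in csv.reader(fh):
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        index, body, timestamp = fields[0], fields[1], fields[2]
        pairs = {}
        # The first line of the block is the "There is ..." header, so skip it.
        for line in body.splitlines()[1:]:
            key, sep, value = line.partition(':')  # split on the first ':' only
            if sep:
                pairs[key.strip()] = value.strip()
        records.append(
            {'index': int(index), 'timestamp': timestamp, 'fields': pairs}
        )
    return records


records = parse_records(io.StringIO(SAMPLE))
print(records[0]['fields']['Cellular class'])
```

To read from disk instead, pass an open file object; the csv module documentation recommends opening it with newline='' on Python 3 so that embedded line breaks are handled by the reader.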

Answer 3

You did not respond to my comment or to any of the answers, but here is my answer anyway. Try:

^\s*(?<_KEY_1>[\w\s]+?)\s*:\s*(?<_VAL_1>[^\r\n"]+).*$

See it here at regex101
