How do I extract comma-separated substrings from a string?

Asked: 2019-05-25 18:42:25

Tags: regex regex-lookarounds regex-group regex-greedy python-textfsm

I need to parse the comma-separated algorithms out of the following output.

SSH Enabled - version 2.0
Authentication methods:publickey,keyboard-interactive,password
Encryption Algorithms:aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc
MAC Algorithms:hmac-sha1,hmac-sha1-96
Authentication timeout: 120 secs; Authentication retries: 3
Minimum expected Diffie Hellman key size : 1024 bits
IOS Keys in SECSH format(ssh-rsa, base64 encoded):

I tried to split them on the commas, but did not get the expected result:

^Encryption Algorithms:(.*?)(?:,|$)

The expected result is each algorithm captured in group 1, with no empty groups:

aes128-ctr
aes192-ctr
aes256-ctr
aes128-cbc
3des-cbc
aes192-cbc
aes256-cbc

2 Answers:

Answer 0: (score: 1)

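This may not be the best approach, but one option is to split the string into three parts, possibly even before running it through a regex engine. If that is not an option and a single expression is preferred, this might be close: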

(.+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC.+)


If the input also contains newlines, you may want to test with other expressions, perhaps something like:

([\s\S]+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC[\s\S]+)

([\w\W]+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC[\w\W]+)

([\d\D]+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC[\d\D]+)
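
As a minimal sketch of how group 2 of the newline-tolerant expression could be collected into a list rather than substituted, the following uses `re.finditer` and keeps only the matches in which group 2 participated. The variable names `text`, `pattern` and `algorithms` are illustrative, and the sample text is copied from the question; real device output may differ.

```python
import re

# Sample text copied from the question (an assumption about the input shape).
text = (
    "SSH Enabled - version 2.0\n"
    "Authentication methods:publickey,keyboard-interactive,password\n"
    "Encryption Algorithms:aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc\n"
    "MAC Algorithms:hmac-sha1,hmac-sha1-96\n"
    "Authentication timeout: 120 secs; Authentication retries: 3\n"
    "Minimum expected Diffie Hellman key size : 1024 bits\n"
    "IOS Keys in SECSH format(ssh-rsa, base64 encoded):\n"
)

# Alternative 1 consumes everything up to the header, alternative 3 consumes
# everything from "MAC" onward, so only alternative 2 fills group 2.
pattern = re.compile(
    r"([\s\S]+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC[\s\S]+)"
)

# Keep only the matches where group 2 (a single algorithm name) participated.
algorithms = [m.group(2) for m in pattern.finditer(text) if m.group(2)]
print(algorithms)
```

This relies on the line after the algorithm list starting with `MAC`; if the surrounding lines differ, the first and third alternatives would need adjusting.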

RegEx

If this expression is not what you want, it can be modified or changed on regex101.com.

RegEx Circuit

jex.im visualizes regular expressions.

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"([\w\W]+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC[\w\W]+)"

test_str = ("SSH Enabled - version 2.0\n"
    "Authentication methods:publickey,keyboard-interactive,password\n"
    "Encryption Algorithms:aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc\n"
    "MAC Algorithms:hmac-sha1,hmac-sha1-96\n"
    "Authentication timeout: 120 secs; Authentication retries: 3\n"
    "Minimum expected Diffie Hellman key size : 1024 bits\n"
    "IOS Keys in SECSH format(ssh-rsa, base64 encoded):\n")

subst = "\\2 "

# Limit the number of replacements with the count argument (0 replaces all matches)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Demo

const regex = /(.+Encryption Algorithms:)|([a-z0-9-]+)(?:,|\s)|(MAC.+)/gm;
const str = `SSH Enabled - version 2.0 Authentication methods:publickey,keyboard-interactive,password Encryption Algorithms:aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc MAC Algorithms:hmac-sha1,hmac-sha1-96 Authentication timeout: 120 secs; Authentication retries: 3 Minimum expected Diffie Hellman key size : 1024 bits IOS Keys in SECSH format(ssh-rsa, base64 encoded):`;
const subst = `$2 `;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);
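
The opening suggestion of splitting the string before involving a regex engine could look roughly like the sketch below. This is one reading of that suggestion, not the answerer's code: the function name `encryption_algorithms` and its `label` parameter are hypothetical, and it assumes the header and its values sit on a single line.

```python
def encryption_algorithms(text, label="Encryption Algorithms:"):
    """Return the comma-separated values on the first line starting with label.

    Hypothetical helper for illustration; assumes one header per line.
    """
    for line in text.splitlines():
        if line.startswith(label):
            # Everything after the label is the comma-separated list.
            values = line[len(label):].strip()
            return values.split(",") if values else []
    return []


# Shortened sample based on the text in the question.
sample = (
    "SSH Enabled - version 2.0\n"
    "Encryption Algorithms:aes128-ctr,aes192-ctr,3des-cbc\n"
    "MAC Algorithms:hmac-sha1,hmac-sha1-96\n"
)
print(encryption_algorithms(sample))
```

Plain string methods avoid the backtracking of the alternation-based expression, at the cost of not validating what the individual values look like.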

Answer 1: (score: 1)

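Another option is to match the line that starts with Encryption Algorithms: and capture a repeating pattern in a group: one hyphenated part, followed by zero or more further hyphenated parts, each preceded by a comma.

If there is a match, you can split the first capturing group on the comma.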

^Encryption Algorithms:(\w+-\w+(?:,\w+-\w+)*)

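Explanation

  • ^ Start of the line (with re.MULTILINE), otherwise start of the string
  • Encryption Algorithms: Match literally
  • ( Start the capturing group
    • \w+-\w+ Match 1+ word characters, -, and 1+ word characters
    • (?:,\w+-\w+)* Repeat 0+ times: a comma followed by 1+ word characters, -, and 1+ word characters
  • ) Close the capturing group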

import re
regex = r"^Encryption Algorithms:(\w+-\w+(?:,\w+-\w+)*)"
test_str = ("SSH Enabled - version 2.0\n"
            "Authentication methods:publickey,keyboard-interactive,password\n"
            "Encryption Algorithms:aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc\n"
            "MAC Algorithms:hmac-sha1,hmac-sha1-96\n"
            "Authentication timeout: 120 secs; Authentication retries: 3\n"
            "Minimum expected Diffie Hellman key size : 1024 bits\n"
            "IOS Keys in SECSH format(ssh-rsa, base64 encoded):")

matches = re.search(regex, test_str, re.MULTILINE)
if matches:
    print(matches.group(1).split(","))

Result:

['aes128-ctr', 'aes192-ctr', 'aes256-ctr', 'aes128-cbc', '3des-cbc', 'aes192-cbc', 'aes256-cbc']
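
Note that \w+-\w+ requires exactly one hyphen per item, so on the other lists in the sample it would stop before the -96 in hmac-sha1-96 and would not match publickey or password at all. A hedged sketch of a more permissive variant, assuming each item consists only of word characters and hyphens, is shown below. The helper name `parse_list` and its parameters are hypothetical and not part of the answer above.

```python
import re


def parse_list(label, text):
    """Return the comma-separated items following 'label:' at the start of a line.

    Hypothetical helper; assumes items contain only word characters and hyphens.
    """
    pattern = r"^" + re.escape(label) + r":([\w-]+(?:,[\w-]+)*)"
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1).split(",") if match else []


# Sample text copied from the question.
text = (
    "SSH Enabled - version 2.0\n"
    "Authentication methods:publickey,keyboard-interactive,password\n"
    "Encryption Algorithms:aes128-ctr,aes192-ctr,aes256-ctr,aes128-cbc,3des-cbc,aes192-cbc,aes256-cbc\n"
    "MAC Algorithms:hmac-sha1,hmac-sha1-96\n"
)

print(parse_list("Encryption Algorithms", text))
print(parse_list("MAC Algorithms", text))
print(parse_list("Authentication methods", text))
```

The trade-off is that [\w-]+ also accepts items such as a bare hyphen, so it is looser than the original pattern.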