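Regex to match and replace a multiline string in Python

Date: 2019-04-02 07:58:21

Tags: python regex

I need help matching 2 strings and replacing them with an empty string ''. Any help is appreciated, as I am still new to Python and to coding.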

crypto pki certificate chain TP-self-signed-1357590403
  +30820330 30820218 A0030201 02020101 300D0609 2A864886 F70D0101 05050030
  +31312F30 2D060355 04031326 494F532D 53656C66 2D536967 6E65642D 43657274
  +69666963 6174652D 31333537 35393034 3033301E 170D3139 30313234 31353436
  +34345A17 0D323030 31303130 30303030 305A3031 312F302D 06035504 03132649
  +4F532D53 656C662D 5369676E 65642D43 65727469 66696361 74652D31 33353735
  +39303430 33308201 22300D06 092A8648 86F70D01 01010500 0382010F 00308201
  +0A028201 0100E69D C133454E 401E763A 7686E453 5D58020D 0E6E122F A0F19E15
  +E0975148 666110BD C1F09B86 CB701C20 EF85E024 F759A921 D11DA10C A13BA3BD
  +20006387 917287CE EA0CFDDC 2FA5DD07 E5B200F4 108CACA1 DCEF0E4E EEE908ED
  +2ACD693B FC90A24F 9F865CB9 859FEFB0 EB8904D4 8FA83D29 E93B892F 32F3EC7D
  +EAA2850E 1793BBCE 86EA47B2 15645634 D81EA89C 1C2BC092 766DF58F 0B289A82
  +0C92E551 7AA9588E F5B41A41 6DB4C785 101E674D BBBCFB42 9F4F9A25 70389515
  +D1C07E2F 18C0557D 95283E90 3CCD2966 5EBF5668 A6B0B847 0B278906 E5BFA668
  +EFBE938A BE70C4C0 1A8D7218 71463EA5 49540A45 DF307B4C 459E657D C039BB68
  +F047B0B2 2F250203 010001A3 53305130 0F060355 1D130101 FF040530 030101FF
  +301F0603 551D2304 18301680 141FADF3 CC2C2293 810EDAA8 9E55327C D2B7D88A
  +88301D06 03551D0E 04160414 1FADF3CC 2C229381 0EDAA89E 55327CD2 B7D88A88
  +300D0609 2A864886 F70D0101 05050003 82010100 91E63F44 376F91C1 C50C08E4
  +B29B902B B1BC7831 C5607897 030835A6 108FC1F2 6F3DEE23 EF3E8FFF 81A121B5
  +26596004 F8F61DFD 1B603C5D 42D850E6 439C7CAE BFC285AE 3FD83870 125594C0
  +51EAAC09 BF42446F C6399B90 D0E10ACA B208819B 645BECE5 DBDDA9AD EBA1FCD9
  +2B14D0DE AB2AC1BF FF064076 ADBB4540 17AB77A4 C6B0DA3B 1BC0F5B8 44030E7B
  +27318CEE 14C90739 DD8684A8 9346EEC1 3F4958EF 835BA822 F58523C9 E9F83105
  +D3E68700 20DAFC5E B1B8CF5B BAC5CEB3 00321088 43125173 51FC8006 270731E6
  +0E0C6183 68BABA99 BD9F4F28 1EDA82D4 F00F1359 F30B6501 BC468C89 49111AB2
  +CBDE5A9D DB8DB33A 45FE6C96 7D49A70F 4C299618

Counting from the first line, there will always be 27 lines.

The second one is:

crypto pki certificate chain TP-self-signed-1357590403
 -certificate self-signed 01 nvram:IOS-Self-Sig#1.cer

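4 answers:

Answer 0 (score: 2)

If you want to match the block up to and including the next crypto line, you can match all following lines while using a negative lookahead to assert that each one does not start with crypto.

Then match a newline and crypto through the end of that line: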

^crypto pki certificate chain TP-self-signed-.*(?:\n(?!crypto).*)*\ncrypto.*

Regex demo
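
A minimal sketch of the pattern above applied with re.sub, assuming a shortened, made-up sample in which the block is closed by a second crypto line (the sample text and variable names are illustrative, not from the original post):

```python
import re

# Shortened, made-up sample: header line, two "+" lines, then the header again.
sample = (
    "Placeholder 1\n"
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "  +30820330 30820218\n"
    "  +CBDE5A9D DB8DB33A\n"
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "Placeholder 2\n"
)

pattern = re.compile(
    r"^crypto pki certificate chain TP-self-signed-.*"  # opening header line
    r"(?:\n(?!crypto).*)*"  # following lines that do not start with "crypto"
    r"\ncrypto.*",  # the next line that does start with "crypto"
    re.MULTILINE,
)

result = pattern.sub("", sample)
print(repr(result))
```

Without re.MULTILINE, ^ would only match at the very start of the string, so a block in the middle of the text would not be found.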

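If the starting line should be identical to the ending line, you can capture the first line in a group and refer to it with a backreference: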

^(crypto pki certificate chain TP-self-signed-.*)(?:\n(?!\1).*)*\n\1

Regex demo

Your code might look like

import re
pattern = r'^(crypto pki certificate chain TP-self-signed-.*)(?:\n(?!\1).*)*\n\1'
# `file` is the text of the configuration, already read into a string
df = re.sub(pattern, '', file, flags=re.MULTILINE)
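
To illustrate what the backreference adds, here is a small sketch on two shortened, made-up samples: one where the closing line repeats the opening line, and one where the ID differs. The sample strings and the second ID are assumptions for illustration only.

```python
import re

pattern = re.compile(
    r"^(crypto pki certificate chain TP-self-signed-.*)(?:\n(?!\1).*)*\n\1",
    re.MULTILINE,
)

# Opening and closing lines are identical, so the whole block matches.
same_id = (
    "Placeholder 1\n"
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "  +30820330 30820218\n"
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "Placeholder 2\n"
)

# The closing line carries a different (hypothetical) ID, so nothing matches.
other_id = (
    "Placeholder 1\n"
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "  +30820330 30820218\n"
    "crypto pki certificate chain TP-self-signed-2468013579\n"
    "Placeholder 2\n"
)

removed = pattern.sub("", same_id)
untouched = pattern.sub("", other_id)
print(repr(removed))
print(untouched == other_id)
```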

Answer 1 (score: 1)

You can use the following code:

import re

inputStr = """crypto pki certificate chain TP-self-signed-1357590403
  +30820330 30820218 A0030201 02020101 300D0609 2A864886 F70D0101 05050030
  +31312F30 2D060355 04031326 494F532D 53656C66 2D536967 6E65642D 43657274
  +69666963 6174652D 31333537 35393034 3033301E 170D3139 30313234 31353436
  +34345A17 0D323030 31303130 30303030 305A3031 312F302D 06035504 03132649
  +4F532D53 656C662D 5369676E 65642D43 65727469 66696361 74652D31 33353735
  +39303430 33308201 22300D06 092A8648 86F70D01 01010500 0382010F 00308201
  +0A028201 0100E69D C133454E 401E763A 7686E453 5D58020D 0E6E122F A0F19E15
  +E0975148 666110BD C1F09B86 CB701C20 EF85E024 F759A921 D11DA10C A13BA3BD
  +20006387 917287CE EA0CFDDC 2FA5DD07 E5B200F4 108CACA1 DCEF0E4E EEE908ED
  +2ACD693B FC90A24F 9F865CB9 859FEFB0 EB8904D4 8FA83D29 E93B892F 32F3EC7D
  +EAA2850E 1793BBCE 86EA47B2 15645634 D81EA89C 1C2BC092 766DF58F 0B289A82
  +0C92E551 7AA9588E F5B41A41 6DB4C785 101E674D BBBCFB42 9F4F9A25 70389515
  +D1C07E2F 18C0557D 95283E90 3CCD2966 5EBF5668 A6B0B847 0B278906 E5BFA668
  +EFBE938A BE70C4C0 1A8D7218 71463EA5 49540A45 DF307B4C 459E657D C039BB68
  +F047B0B2 2F250203 010001A3 53305130 0F060355 1D130101 FF040530 030101FF
  +301F0603 551D2304 18301680 141FADF3 CC2C2293 810EDAA8 9E55327C D2B7D88A
  +88301D06 03551D0E 04160414 1FADF3CC 2C229381 0EDAA89E 55327CD2 B7D88A88
  +300D0609 2A864886 F70D0101 05050003 82010100 91E63F44 376F91C1 C50C08E4
  +B29B902B B1BC7831 C5607897 030835A6 108FC1F2 6F3DEE23 EF3E8FFF 81A121B5
  +26596004 F8F61DFD 1B603C5D 42D850E6 439C7CAE BFC285AE 3FD83870 125594C0
  +51EAAC09 BF42446F C6399B90 D0E10ACA B208819B 645BECE5 DBDDA9AD EBA1FCD9
  +2B14D0DE AB2AC1BF FF064076 ADBB4540 17AB77A4 C6B0DA3B 1BC0F5B8 44030E7B
  +27318CEE 14C90739 DD8684A8 9346EEC1 3F4958EF 835BA822 F58523C9 E9F83105
  +D3E68700 20DAFC5E B1B8CF5B BAC5CEB3 00321088 43125173 51FC8006 270731E6
  +0E0C6183 68BABA99 BD9F4F28 1EDA82D4 F00F1359 F30B6501 BC468C89 49111AB2
  +CBDE5A9D DB8DB33A 45FE6C96 7D49A70F 4C299618
crypto pki certificate chain TP-self-signed-1357590403"""

print(re.sub(r'crypto pki certificate chain TP-self-signed-\d+\s*[0-9a-fA-F+\s]+\s*crypto pki certificate chain TP-self-signed-\d+', '' , inputStr))

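Output: an empty string

Regex demo: https://regex101.com/r/G9XciA/2/

Regex explanation:

  • crypto pki certificate chain TP-self-signed-\d+\s* matches the first line, whose ID is assumed to consist of digits only, followed by any whitespace characters
  • [0-9a-fA-F+\s]+ matches hexadecimal characters, + and whitespace characters
  • \s*crypto pki certificate chain TP-self-signed-\d+ matches the last line and ends the match

If the first and last lines must have the same ID, use this regex: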

crypto pki certificate chain TP-self-signed-(\d+)\s*[0-9a-fA-F+\s]+\s*crypto pki certificate chain TP-self-signed-\1

where \1 is a backreference to the first capturing group.

Demo: https://regex101.com/r/G9XciA/3
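
A short sketch of the backreference version, assuming a shortened, made-up block instead of the full certificate (the sample strings and the second ID are illustrative):

```python
import re

pattern = re.compile(
    r"crypto pki certificate chain TP-self-signed-(\d+)\s*"
    r"[0-9a-fA-F+\s]+\s*"
    r"crypto pki certificate chain TP-self-signed-\1"
)

# The ID captured on the first line is repeated on the last line.
matching = (
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "  +30820330 30820218\n"
    "  +CBDE5A9D DB8DB33A\n"
    "crypto pki certificate chain TP-self-signed-1357590403"
)

# The last line uses a different (hypothetical) ID, so \1 cannot match.
mismatched = (
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "  +30820330 30820218\n"
    "crypto pki certificate chain TP-self-signed-2468013579"
)

emptied = pattern.sub("", matching)
kept = pattern.sub("", mismatched)
print(repr(emptied))
print(kept == mismatched)
```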

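Answer 2 (score: 1)

Since you have not said what result you want, it is impossible to know exactly what you are after, so we can only guess.

If you just want to replace it, you could use something like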
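import re

# Read the whole file into one string
with open('text.txt', encoding="utf8") as f:
    document_x = f.read()

# Note: this pattern matches every character, so the result is an empty string
regex_test = re.sub(r".*\n*( +.*)*", "", document_x)

print(regex_test)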

To delete everything between the crypto lines, use

regex_test = re.sub(r"(?:\n(?!crypto).*)*", "" , document_x)
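
As a rough check of what this substitution keeps, here is a sketch on a shortened, made-up sample held in a string rather than read from a file. It suggests that every line after the first that does not start with crypto is removed, including unrelated lines such as the hypothetical Placeholder 2.

```python
import re

header = "crypto pki certificate chain TP-self-signed-1357590403"

# Shortened stand-in for the file contents (illustrative only).
document_x = "\n".join([
    "Placeholder 1",
    header,
    "  +30820330 30820218",
    "  +CBDE5A9D DB8DB33A",
    header,
    "Placeholder 2",
])

# Remove each run of lines whose start is not "crypto".
regex_test = re.sub(r"(?:\n(?!crypto).*)*", "", document_x)
print(regex_test)
```

Only the first line and the two crypto lines remain, so this variant fits best when nothing else in the text needs to be preserved.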

Or, to delete the crypto lines themselves, you can use

# flags must be passed by keyword; the fourth positional argument is count
regex_test = re.sub(r"crypto pki certificate chain TP-self-signed-[0-9]+\n", "",
                     document_x, flags=re.MULTILINE)
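
One detail worth checking in calls like this: the fourth positional parameter of re.sub is count, not flags, so flags are safest passed by keyword. The sketch below uses a shortened, made-up sample and adds a ^ anchor to the pattern (an assumption, so that re.MULTILINE actually has an effect).

```python
import re

sample = (
    "crypto pki certificate chain TP-self-signed-1357590403\n"
    "  +30820330 30820218\n"
    "crypto pki certificate chain TP-self-signed-1357590403\n"
)

# ^ added for illustration; with re.MULTILINE it anchors at every line start.
header = r"^crypto pki certificate chain TP-self-signed-[0-9]+\n"

# Flags passed by keyword, so they are not mistaken for count.
without_headers = re.sub(header, "", sample, flags=re.MULTILINE)

# re.MULTILINE is an integer flag; in the count slot it would be read as count=8.
multiline_value = int(re.MULTILINE)

print(repr(without_headers))
print(multiline_value)
```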

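I ran these in the Python 3.6.1 shell to confirm that they do work. Online regex testers, although useful, do not always return the same results as Python itself.

A possible example answer is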

import re

# Read the whole file into one string
with open('text.csv', encoding="utf8") as f:
    document_x = f.read()

# Greedy match from the first "crypto" to the last "1357590403"
regex_test = re.sub(r"(crypto[\s\S]*1357590403)", "", document_x)

print(regex_test)

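You should modify it to suit your needs; this is only an example. It assumes you want to delete the whole block but nothing before or after it, e.g.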

Placeholder 1
crypto pki certificate chain TP-self-signed-1357590403
  +30820330 30820218 A0030201 02020101 300D0609 2A864886 F70D0101 05050030
  +31312F30 2D060355 04031326 494F532D 53656C66 2D536967 6E65642D 43657274
  +69666963 6174652D 31333537 35393034 3033301E 170D3139 30313234 31353436
  +34345A17 0D323030 31303130 30303030 305A3031 312F302D 06035504 03132649
  +4F532D53 656C662D 5369676E 65642D43 65727469 66696361 74652D31 33353735
  +39303430 33308201 22300D06 092A8648 86F70D01 01010500 0382010F 00308201
  +0A028201 0100E69D C133454E 401E763A 7686E453 5D58020D 0E6E122F A0F19E15
  +E0975148 666110BD C1F09B86 CB701C20 EF85E024 F759A921 D11DA10C A13BA3BD
  +20006387 917287CE EA0CFDDC 2FA5DD07 E5B200F4 108CACA1 DCEF0E4E EEE908ED
  +2ACD693B FC90A24F 9F865CB9 859FEFB0 EB8904D4 8FA83D29 E93B892F 32F3EC7D
  +EAA2850E 1793BBCE 86EA47B2 15645634 D81EA89C 1C2BC092 766DF58F 0B289A82
  +0C92E551 7AA9588E F5B41A41 6DB4C785 101E674D BBBCFB42 9F4F9A25 70389515
  +D1C07E2F 18C0557D 95283E90 3CCD2966 5EBF5668 A6B0B847 0B278906 E5BFA668
  +EFBE938A BE70C4C0 1A8D7218 71463EA5 49540A45 DF307B4C 459E657D C039BB68
  +F047B0B2 2F250203 010001A3 53305130 0F060355 1D130101 FF040530 030101FF
  +301F0603 551D2304 18301680 141FADF3 CC2C2293 810EDAA8 9E55327C D2B7D88A
  +88301D06 03551D0E 04160414 1FADF3CC 2C229381 0EDAA89E 55327CD2 B7D88A88
  +300D0609 2A864886 F70D0101 05050003 82010100 91E63F44 376F91C1 C50C08E4
  +B29B902B B1BC7831 C5607897 030835A6 108FC1F2 6F3DEE23 EF3E8FFF 81A121B5
  +26596004 F8F61DFD 1B603C5D 42D850E6 439C7CAE BFC285AE 3FD83870 125594C0
  +51EAAC09 BF42446F C6399B90 D0E10ACA B208819B 645BECE5 DBDDA9AD EBA1FCD9
  +2B14D0DE AB2AC1BF FF064076 ADBB4540 17AB77A4 C6B0DA3B 1BC0F5B8 44030E7B
  +27318CEE 14C90739 DD8684A8 9346EEC1 3F4958EF 835BA822 F58523C9 E9F83105
  +D3E68700 20DAFC5E B1B8CF5B BAC5CEB3 00321088 43125173 51FC8006 270731E6
  +0E0C6183 68BABA99 BD9F4F28 1EDA82D4 F00F1359 F30B6501 BC468C89 49111AB2
  +CBDE5A9D DB8DB33A 45FE6C96 7D49A70F 4C299618
crypto pki certificate chain TP-self-signed-1357590403
Placeholder 2

Running the example above removes the block and keeps the surrounding content, i.e.

Placeholder 1

Placeholder 2
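
The same substitution can be tried without a file. This sketch assumes a shortened stand-in for the block, held in a string (the variable names are illustrative):

```python
import re

header = "crypto pki certificate chain TP-self-signed-1357590403"

# Shortened stand-in for the contents of the file (illustrative only).
document_x = "\n".join([
    "Placeholder 1",
    header,
    "  +30820330 30820218",
    "  +CBDE5A9D DB8DB33A",
    header,
    "Placeholder 2",
])

regex_test = re.sub(r"(crypto[\s\S]*1357590403)", "", document_x)
print(regex_test)
```

Because [\s\S]* is greedy, the match runs to the last occurrence of 1357590403 in the text, which is worth keeping in mind if the input contains more than one such block.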

Answer 3 (score: 1)

Why not just use this regex,

(crypto pki certificate chain TP-self-signed-\d+)[\w\W]+?\1

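and replace the match with an empty string?

Am I missing something, since the other answers seem to suggest more complicated solutions involving newlines?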

Demo
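
A brief sketch of why the lazy quantifier +? matters here, assuming a made-up text that contains two shortened blocks; the greedy variant is shown only for comparison and is not part of the answer above.

```python
import re

header = "crypto pki certificate chain TP-self-signed-1357590403"

# Two shortened blocks separated by an unrelated line (illustrative only).
sample = "\n".join([
    "Placeholder 1",
    header, "  +30820330 30820218", header,
    "Placeholder 2",
    header, "  +CBDE5A9D DB8DB33A", header,
    "Placeholder 3",
])

lazy = r"(crypto pki certificate chain TP-self-signed-\d+)[\w\W]+?\1"
greedy = r"(crypto pki certificate chain TP-self-signed-\d+)[\w\W]+\1"

# Lazy: each block ends at the nearest repeat of its header line.
lazy_result = re.sub(lazy, "", sample)

# Greedy: one match runs from the first header to the last one.
greedy_result = re.sub(greedy, "", sample)

print(repr(lazy_result))
print(repr(greedy_result))
```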

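Edit: based on your comment, "Actually, what I need to delete is: crypto pki certificate chain TP-self-signed-1357590403 and the next 26 lines that start with +"

You can use this regex to select exactly the crypto pki certificate chain TP-self-signed-1357590403 line together with the 26 lines after it that start with +.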

crypto pki certificate chain TP-self-signed-\d+(?:\n\s*\+[^\n]*){26}

Demo

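As the demo shows, it selects only the header line and the 26 lines starting with +, which are then removed by replacing them with an empty string. Let me know if you run into any problems.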
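
The {26} quantifier can be checked with a small sketch. It assumes generated placeholder "+" lines instead of the real certificate data; the variable names are illustrative.

```python
import re

header = "crypto pki certificate chain TP-self-signed-1357590403"

# 26 made-up "+" lines standing in for the real certificate data.
plus_lines = [f"  +{i:08X} {i + 1:08X}" for i in range(26)]

sample = "\n".join(["Placeholder 1", header, *plus_lines, "Placeholder 2"])

pattern = re.compile(
    r"crypto pki certificate chain TP-self-signed-\d+(?:\n\s*\+[^\n]*){26}"
)

# Header plus exactly 26 "+" lines: the whole block is removed.
result = pattern.sub("", sample)

# Only 25 "+" lines follow the header, so the pattern does not match.
short_sample = "\n".join([header, *plus_lines[:25], "Placeholder 2"])
unchanged = pattern.sub("", short_sample)

print(repr(result))
print(unchanged == short_sample)
```

Since the count is fixed, a block with fewer than 26 "+" lines is left in place, which may or may not be the behavior you want if the certificate length can vary.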