Question

I'm working with Python and regex. I read a file with Python and want to remove certain words/characters from it. I'm using re.sub(). Here is a sample string:

Proxy BR 1.05s [HTTPS] 200.203.144.2:50262

I managed to remove the words and all the special characters, which leaves, for example:

1.20 187.94.217.693128

But I can't get rid of the first 4 characters, i.e. the 1.05.

Here is my regex:

pattern = "[a-zA-Z\[\],:<>]"

How do I remove the first 4 characters?

Answer 1

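Use an anchor: ^ matches the start of the string, and .{4} matches the four characters that follow it, i.e. the first four characters of the string.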

import re

re.sub('^.{4}', '', '1.20 187.94.217.693128')

Output:

' 187.94.217.693128'
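As a minimal sketch of the same idea, the snippet below compares the anchored substitution with plain slicing and with a variant that also consumes the whitespace after the four characters. The variable names are illustrative only, and the input is the already-cleaned string from the question.

```python
import re

cleaned = '1.20 187.94.217.693128'

# '^' anchors the match at the start; '.{4}' consumes exactly four characters.
without_prefix = re.sub(r'^.{4}', '', cleaned)

# Slicing drops the same four characters without using a regex.
sliced = cleaned[4:]

# Variant: also consume any whitespace that follows the four characters.
trimmed = re.sub(r'^.{4}\s*', '', cleaned)

print(repr(without_prefix))  # ' 187.94.217.693128'
print(repr(sliced))          # ' 187.94.217.693128'
print(repr(trimmed))         # '187.94.217.693128'
```

Note that all three approaches count characters blindly, so they only work if the unwanted prefix is always exactly four characters long.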

Answer 2

The code below looks only for an IPv4 address and port number in each input line. The address-and-port combination has the form:

digit{1,3}.digit{1,3}.digit{1,3}.digit{1,3}:digit{1,5}

import re

# Matches an IPv4-style address followed by a port, e.g. 200.203.144.2:50262
pattern_to_find = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}')

with open('myproxy.txt', 'r') as infile:
    for line in infile:
        match = pattern_to_find.search(line)
        if match:
            print(match.group())

# outputs
# 104.248.168.64:3128
# 54.81.69.91:3128
# 78.60.130.181:30664
# 80.120.86.242:46771
# 109.74.135.246:45769
# 198.50.172.161:1080
# 103.250.166.12:47031
# 88.255.101.244:8080
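To try the pattern without a file, here is a small sketch that applies it to the sample line from the question. The helper name extract_proxy and the named groups ip and port are additions for illustration and are not part of the original answer.

```python
import re

# Same shape as the answer's pattern, with named groups added for illustration.
PROXY_RE = re.compile(
    r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<port>\d{1,5})'
)

def extract_proxy(line):
    """Return (ip, port) for the first ip:port pair in line, or None."""
    match = PROXY_RE.search(line)
    if match is None:
        return None
    return match.group('ip'), int(match.group('port'))

sample = 'Proxy BR 1.05s [HTTPS] 200.203.144.2:50262'
result = extract_proxy(sample)
print(result)  # ('200.203.144.2', 50262)
```

The "1.05s" fragment is skipped because it does not have four dot-separated groups followed by a colon. The pattern only checks digit counts, so it would also accept out-of-range values such as 999.1.1.1:80; stricter validation would need an extra check.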

Matching the first 4 characters in a string
