Find and replace all strings matching a regular expression

Time: 2017-09-15 08:56:05

Tags: python regex

I am currently using the following code to convert every ' to ":

    #Converts ' to "
    lines = []
    replacements = {"'":'"'}

    with open('netstat_data_IP_formatted.json') as infile:
        for line in infile:
            for src, target in replacements.items():
                line = line.replace(src, target)
            lines.append(line)
    with open('netstat_data_IP_formatted.json', 'w') as outfile:
        for line in lines:
            outfile.write(line)

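This works fine, but I would also like to select all the ports and remove the quotes around them. Would a regular expression like this one work without also picking up the IP addresses?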

^([0-9]{1,4}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])$

How would I then remove the quotes around the port numbers, or is there an easier way to do this?

This is the input:

{'l_port': '48856', 'r_host': '95.211.210.72', 'r_port': '443', 'state': 'ESTAB$
{'l_port': '443', 'r_host': '37.218.247.217', 'r_port': '35805', 'state': 'TIME$
{'l_port': '48662', 'r_host': '95.211.210.72', 'r_port': '443', 'state': 'ESTAB$
{'l_port': '51316', 'r_host': '91.194.90.103', 'r_port': '443', 'state': 'ESTAB$

This is how it looks after the script has run:

{"l_port": "48698", "r_host": "95.211.210.72", "r_port": "443", "state": "ESTAB$
{"l_port": "40406", "r_host": "178.62.252.82", "r_port": "443", "state": "TIME_$
{"l_port": "443", "r_host": "60.191.48.203", "r_port": "58220", "state": "SYN_R$
{"l_port": "36058", "r_host": "37.252.185.87", "r_port": "443", 'state': 'TIME_$

This is how I want it to look:

{"l_port": 48698, "r_host": "95.211.210.72", "r_port": 443, "state": "ESTAB$
{"l_port": 40406, "r_host": "178.62.252.82", "r_port": 443, "state": "TIME_$
{"l_port": 443, "r_host": "60.191.48.203", "r_port": 58220, "state": "SYN_R$
{"l_port": 36058, "r_host": "37.252.185.87", "r_port": 443, 'state': 'TIME_$

1 answer:

Answer 0 (score: 0)

You only need to match runs of digits enclosed in quotes, with no periods in them, so the IP addresses are left alone:

re.sub("'([0-9]+)'",'\\1',line)

(Of course, if you convert the quotes first, use double quotes in the pattern instead.)
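As a minimal sketch of that substitution applied to one line, assuming the quotes have already been converted to double quotes: the `unquote_numbers` helper name and the complete sample line are made up for illustration, modeled on the truncated samples in the question.

```python
import re

# Matches a run of digits wrapped in double quotes. An IP address is not
# matched because its dots mean the quoted value is not all digits.
QUOTED_NUMBER = re.compile(r'"([0-9]+)"')


def unquote_numbers(line):
    """Return the line with the quotes removed around all-digit values."""
    return QUOTED_NUMBER.sub(r'\1', line)


# Hypothetical complete line; the lines shown in the question are truncated.
sample = '{"l_port": "48698", "r_host": "95.211.210.72", "r_port": "443", "state": "ESTAB"}'
print(unquote_numbers(sample))
```

Note that this unquotes every all-digit string on the line, not only the port fields, and a value with leading zeros would no longer be valid JSON once unquoted.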

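That said, if you need to do anything more complex, you should use the standard json package to actually parse the file, then format the data however you like.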
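A rough sketch of the parsing approach, under a few assumptions: each line of the file is one complete Python-style dict literal with single quotes, as in the original input (so `ast.literal_eval` reads it, because `json.loads` rejects single-quoted strings), and `l_port` and `r_port` are the only fields to convert. The `convert_line` helper and the sample line are hypothetical.

```python
import ast
import json

# Assumed field names, taken from the sample data in the question.
PORT_KEYS = ("l_port", "r_port")


def convert_line(line):
    """Parse one single-quoted dict literal and return it as a JSON string
    with the port fields converted to integers."""
    record = ast.literal_eval(line.strip())
    for key in PORT_KEYS:
        if key in record:
            record[key] = int(record[key])
    return json.dumps(record)


# Hypothetical complete line; the lines shown in the question are truncated.
sample = "{'l_port': '48856', 'r_host': '95.211.210.72', 'r_port': '443', 'state': 'ESTAB'}"
print(convert_line(sample))
```

Parsing first means the quote conversion and the number conversion happen in one step, and only the named fields are touched, so other all-digit strings stay as strings.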