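Open a file, read its contents, and use a regular expression to turn them into a list in Python

Time: 2015-10-29 19:34:50

Tags: python regex

I am using "import re" and "import sys".

In the terminal, when I type "1.py a.txt", I want it to read "a.txt", which contains the following: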

17:18:42.525964 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 1:1449, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 1448
17:18:42.526623 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 1449:2897, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 1448
17:18:42.526900 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 2897, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0
17:18:42.527694 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 2897:14481, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 11584
17:18:42.527716 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 14481, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0
17:18:42.528794 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 14481:23169, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 8688
17:18:42.528813 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 23169, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0
17:18:42.545191 IP 192.168.0.15.60030 > 52.2.63.29.80: Flags [.], seq 4113773418:4113774866, ack 850072640, win 270, options [nop,nop,TS val 43002452 ecr 9849626], length 1448

Then it should use a regular expression to strip everything except the IP addresses and the length (the total), and print the result as:

source: 66.185.85.146 dest: 192.168.0.15 total:1448
source: 66.185.85.146 dest: 192.168.0.15 total:1448
source: 192.168.0.15 dest: 66.185.85.146 total:0

If there are duplicates, however, it should merge them and add up their totals, like this:

source: 66.185.85.146 dest: 192.168.0.15 total:2896
source: 192.168.0.15 dest: 66.185.85.146 total:0

In addition, if I pass "-s" in the terminal, like this:

"1.py -s a.txt"

"1.py a.txt -s 192.168.0.15"

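it should sort. With a bare -s it should sort and print the contents; with -s followed by an IP it should sort by that IP.

This is what I have for each piece so far, and I would like to know how to use them together.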

#!/usr/bin/python3
import re
import sys

file = sys.argv[1]
a = open(file, "r")

for line in a:
   line = line.rstrip()
   c = re.findall(r'^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$',line) #Yes I know its not the best regex for this, but I am testing it out for now
   d = re.findall(r'\b(\d+)$\b',line)

   if len(c) > 0 and len(d) > 0:
      print("source:", c[0],"\t","dest:",c[1],"\t", "total:",d[0])

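That is what I have so far. I don't know how to handle "-s", how to sort, or how to remove duplicates and add up their totals while doing so.

3 Answers:

Answer 0: (score: 2)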

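To read -s, you probably want a library that parses arguments, such as the standard argparse. It lets you declare the arguments your script expects along with their descriptions, then parses them and validates their format.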
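As a minimal sketch of that idea, the parser below accepts a file name plus a -s flag that may appear alone or with an IP. The names build_parser, file, and the placeholder value "all" are assumptions made for illustration, not something from the question.

```python
import argparse


def build_parser():
    # Hypothetical layout: one positional file plus an optional -s flag.
    parser = argparse.ArgumentParser(
        description="Summarize traffic per source/destination pair")
    parser.add_argument("file", help="text file to read, e.g. a.txt")
    # nargs="?" lets -s stand alone (value becomes const) or take an IP.
    parser.add_argument("-s", "--sort", nargs="?", const="all", default=None,
                        metavar="IP", help="sort everything, or sort by IP")
    return parser


# Passing explicit lists instead of reading sys.argv keeps the demo testable.
print(build_parser().parse_args(["a.txt"]))
print(build_parser().parse_args(["a.txt", "-s"]))
print(build_parser().parse_args(["a.txt", "-s", "192.168.0.15"]))
```

One caveat with this layout: in "1.py -s a.txt" argparse would take a.txt as the value of -s and then complain that the file is missing, so a bare -s has to come after the file name.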

To sort a list, use the sorted(my_list) function.
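For rows that hold more than one value, sorted takes a key function. The sketch below assumes each row is a (source, destination, total) tuple with sample values; that shape is an assumption for illustration. It also shows that IP strings sort alphabetically unless they are converted first.

```python
import ipaddress

# Sample rows, assumed to be (source, destination, total) tuples.
rows = [
    ("66.185.85.146", "192.168.0.15", 2896),
    ("192.168.0.15", "66.185.85.146", 0),
    ("192.168.0.15", "52.2.63.29", 1448),
]

# Largest total first.
by_total = sorted(rows, key=lambda row: row[2], reverse=True)

# Plain string sort: "192..." comes before "66..." because '1' < '6'.
by_source_text = sorted(rows, key=lambda row: row[0])

# Numeric sort: ipaddress objects compare by their numeric value.
by_source_numeric = sorted(rows, key=lambda row: ipaddress.ip_address(row[0]))

for row in by_total:
    print(row)
```

sorted is stable, so rows with the same key keep their original relative order.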

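Finally, to make sure there are no duplicates, you can use a set. This loses the list ordering, but since you will sort it afterwards anyway, that should not be a problem.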
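A small sketch of what a set does and does not give you here, assuming the parsed lines are (source, destination, length) tuples with sample values. The set removes repeated pairs but discards the lengths, so the totals are accumulated in a plain dict.

```python
# Sample records, assumed to be (source, destination, length) tuples.
records = [
    ("66.185.85.146", "192.168.0.15", 1448),
    ("66.185.85.146", "192.168.0.15", 1448),
    ("192.168.0.15", "66.185.85.146", 0),
]

# A set keeps each (source, destination) pair once; sorted() restores an order.
unique_pairs = sorted({(src, dst) for src, dst, _ in records})

# The set alone drops the lengths, so add them up per pair separately.
totals = {}
for src, dst, length in records:
    totals[(src, dst)] = totals.get((src, dst), 0) + length

for src, dst in unique_pairs:
    print("source:", src, "dest:", dst, "total:", totals[(src, dst)])
```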

Alternatively, the Counter collection is designed specifically for adding up grouped values and sorting them.

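import re
import sys
from collections import Counter

results = Counter()

# Match an IPv4 address followed by ".port" and capture only the address.
ip_pattern = re.compile(r'((?:[0-9]{1,3}\.){3}[0-9]{1,3})\.[0-9]+')
# Capture the number after "length" at the end of the line.
length_pattern = re.compile(r'length (\d+)$')

with open(sys.argv[1]) as a:
    for line in a:
        line = line.rstrip()
        c = ip_pattern.findall(line)
        d = length_pattern.findall(line)

        # Both a source and a destination are needed, plus a length.
        if len(c) >= 2 and len(d) > 0:
            source, destination, length = c[0], c[1], d[0]
            results[(source, destination)] += int(length)

# Print the items sorted by total, largest first.
for (source, destination), length in results.most_common():
    print("source:", source, "\t", "dest:", destination, "\t", "total:", length)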

Answer 1: (score: 2)

You need an ArgumentParser that handles the -s argument, for example:

import argparse
...
def main():
    parser = argparse.ArgumentParser()
    # Positional argument so that "1.py a.txt -s ..." is accepted.
    parser.add_argument('file', help='input file to read')
    parser.add_argument('-s', '--sort', action='append',
                        help='sort specific IP')
    parser.add_argument('-s2', '--sortall', action='store_true',
                        help='sort all the IPs')

    args = parser.parse_args()
    if args.sortall:
        pass  # store all IPs

    # args.sort is None when -s was not given.
    for ip in args.sort or []:
        pass  # store by ip

if __name__ == '__main__':
    main()

Now you can use the script like this:

1.py a.txt -s 192.168.0.15

1.py a.txt -s2

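Beyond that, putting everything together looks like homework, so you should read more about Python to work it out.

Answer 2: (score: 1)

To add to the ArgumentParser answer: incidentally, the code below works fine with an input file path.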

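import re
from collections import defaultdict

# Open in text mode so the str patterns below work on Python 3.
with open(r"C:\ips.txt", 'r') as ip_file:
    txt = ip_file.read()
    # Each match looks like "66.185.85.146.80 > 192.168.0.15.34436".
    ip = re.findall(r'[0-9.]+[\s]+[>][\s0-9.]+', txt)
    # Drop the trailing ".port" on both sides and join the addresses with '>'.
    ip1 = ['>'.join(re.findall(r'[0-9.]+(?=[.])', i)) for i in ip]
    packs = re.findall(r'(?<=length )[0-9]+', txt)
    # Pair each "source>destination" key with its length, line by line.
    data = zip(ip1, packs)
    d = defaultdict(list)
    for k, v in data:
        d[k].append(v)
    for i, j in d.items():
        source, destination = i.split('>')[0], i.split('>')[1]
        print("source: {0} destination: {1} total: {2}".format(source, destination, sum(map(int, j))))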

Prints (the order of the lines may differ):

source: 192.168.0.15 destination: 66.185.85.146 total: 0
source: 66.185.85.146 destination: 192.168.0.15 total: 23168
source: 192.168.0.15 destination: 52.2.63.29 total: 1448