Create a dictionary with keys assigned from a list and values extracted from blocks of text

Time: 2018-07-11 04:38:28

Tags: python-3.x

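I've pieced this code together over the past week. I've learned a lot, but many of these concepts still elude me. I'd appreciate any help, even if it's just a pointer in the right direction. Thanks in advance for reading...

I receive an email containing abuse information that needs to be sent out daily. The relevant information for each abuse report is separated by a line of dashes: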

----------------------------------------------------------------------

C9/Attacker IP: 9.48.9[.]969

C9 Domain:

Abuse Address: abuse@9.com |

Note: First

C9 Port: 93 |

C9 active on our system: 9 days

Malware Family: gafgyt

Attacks in the last 94 hours: 9

Attack Samples:

Attack Time: 9098-06-99T99:99:00.000Z

    - Victim: 9.59.998[.]999

    - Attack Type: udp

    - Attack Duration: 300 seconds

    - Attack Port: 8080

----------------------------------------------------------------------

C9/Attacker IP: 9.9.69[.]39

C9 Domain:

Abuse Address: abuse@9.it |

Note: First

C9 Port: 666 |

C9 active on our system: 9 days

Malware Family: mirai

Attacks in the last 94 hours: 9

Attack Samples:

Attack Time: 9098-06-99T99:99:08.000Z

    - Victim: 9.9.948[.]98

    - Attack Type: udp_plain

    - Attack Duration: 300 seconds

    - Attack Port: None

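----------------------------------------------------------------------

Any number of abuse reports may be sent; there is no fixed count.

Plan:

  1. Note the line numbers of the dash lines
  2. Copy the lines between the dashes into a list
  3. Create a dictionary: key -> dash-line index, value -> list of lines between the dashes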

from itertools import islice
from collections import defaultdict
#dashes separate each abuse report in file test2.txt
dashes = "----------------------------------------------------------------------"
#create empty list dash_lines of line #s having dashes
dash_lines = []
#index of which "dash line" to start copying data
index = 0
#index2 tells where to stop (in this case, the next line of dashes found in the document)
index2 = 1
#holds the c2 abuse info, ideally i would want multiple buckets for each abuse report / email to be sent or to have an abuse email sent and corresponding bucket emptied before adding to it again
c2_list = []
#c2_bucket = {}
c2_bucket = defaultdict(list)
  1. Note the line numbers of the dash lines:

This creates a list of the line numbers in test2.txt that contain dashes

def find_dash_lines():
    with open("test2.txt", 'rt') as c2_email_body:                  #opens (then closes when done) test2.txt containing abuse info
        for num, line in enumerate(c2_email_body):                  #line count in test2.txt
            if dashes in line:                                      #if dashes are found:
                dash_lines.append(num)                              #append line number to dash_lines list
#               print("Found A dash at line:", num)                 #error testing
        if not dash_lines:                                          #if there's not dash lines in list give error
            print("Could not determine dash line locations")        #to let me know there's an error
        else:
            dash_lines.append(1000)                                 #append line #1000 to end of list to ensure data is captured since there is no trailing dash line for last abuse report
            for num in dash_lines:
                print("Found A dash at line:", num)                 #just to ensure something is happening will eventually comment out
find_dash_lines()
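
For comparison, here is a minimal sketch of the same step that takes any iterable of lines and returns the list instead of appending to a global. The name `find_dash_line_numbers` and the sample lines are hypothetical, the 70-dash separator is an assumption, and the total line count stands in for the hard-coded 1000 end marker.

```python
DASHES = "-" * 70  # assumption: the separator is a run of 70 dashes

def find_dash_line_numbers(lines):
    """Return the 0-based indexes of the lines that contain the dash separator."""
    return [num for num, line in enumerate(lines) if DASHES in line]

# Hypothetical stand-in for the contents of test2.txt
sample_lines = [
    DASHES + "\n",
    "Malware Family: gafgyt\n",
    DASHES + "\n",
    "Malware Family: mirai\n",
]

dash_line_numbers = find_dash_line_numbers(sample_lines)
# Use the total line count as the end marker for the last report
boundaries = dash_line_numbers + [len(sample_lines)]
print(boundaries)  # [0, 2, 4]
```

Returning the list keeps the function reusable: calling it twice does not append duplicate line numbers to shared state.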
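  2. Extract the abuse reports between the dashes. This snippet can extract or print the first abuse report between the dashes in test2.txt

    def c2_list_function():
        with open("test2.txt", 'rt') as c2_email_body:    #opens (then closes when done) test2.txt containing abuse info
            index = 0
            index2 = 1
            for line in islice(c2_email_body, dash_lines[index], dash_lines[index2]):    #from the first dash line found to the next
                line = line.replace("]", "")    #cleanup
                line = line.replace("[", "")
                line = line.replace("|", ";")
                c2_list.append(line)
                index += 1     #increment each index (used to determine the slice position)
                index2 += 1    #increment each index (used to determine the slice position)

  3. Create the dictionary. The keys should be the line numbers/indexes in dash_lines; the values should be the lists of lines from c2_list, one per block of text between the dashes

    def c2_dict_function():
        for x in range(len(dash_lines)):
            c2_list_function()
            for line in c2_list:
                print(line, end="")
            c2_bucket[x].append(c2_list)
            c2_list.clear()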
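
One way to walk every pair of dash lines, rather than only the first, is to pair each boundary with the one that follows it. This is a minimal sketch, assuming the lines are already in memory as a list. `split_reports`, the short `-----` separators, and the sample data are hypothetical and only for illustration.

```python
def split_reports(lines, boundaries):
    """Map each block index to the cleaned lines between consecutive boundaries."""
    reports = {}
    # zip pairs the boundaries as (b0, b1), (b1, b2), ...
    for key, (start, stop) in enumerate(zip(boundaries, boundaries[1:])):
        block = []
        for line in lines[start + 1:stop]:  # start + 1 skips the dash line itself
            cleaned = line.replace("[", "").replace("]", "").replace("|", ";").strip()
            if cleaned:  # drop blank lines
                block.append(cleaned)
        reports[key] = block
    return reports

# Hypothetical sample data; a short separator keeps the example readable
lines = [
    "-----",
    "C9 Port: 93 |",
    "Malware Family: gafgyt",
    "-----",
    "C9 Port: 666 |",
    "Malware Family: mirai",
]
boundaries = [0, 3, len(lines)]  # dash line numbers plus an end marker
reports = split_reports(lines, boundaries)
print(reports)
```

Each block gets a fresh list, so nothing stored in the dictionary is shared between keys.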

The dictionary gets populated with keys, but the values are empty. Sample output:

dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
dict_values([[[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]], [[]]])
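
The `[[]]` values are consistent with how list references behave in Python: `append(c2_list)` stores a reference to the same list object, so a later `c2_list.clear()` also empties what the dictionary holds. Here is a small sketch of that behavior, and of storing a copy instead. The variable names are chosen for illustration only.

```python
from collections import defaultdict

shared = ["line 1\n", "line 2\n"]

aliased = defaultdict(list)
aliased[0].append(shared)        # stores a reference to `shared`, not a copy

copied = defaultdict(list)
copied[0].append(list(shared))   # stores an independent shallow copy

shared.clear()                   # empties the one list that `aliased` still points to

print(aliased[0])  # [[]]
print(copied[0])   # [['line 1\n', 'line 2\n']]
```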

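c2_list also doesn't seem to populate correctly; it looks like it only ever gets the first block of information between the dashes. I assume I'm not advancing the indexes correctly.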
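
Two details in `c2_list_function` would explain this. First, `index` and `index2` are reset to 0 and 1 on every call. Second, `islice` reads its bounds once when it is created, so changing the indexes inside the loop has no effect on the slice. Below is a minimal sketch that passes the bounds in as arguments instead. `read_block` is a hypothetical helper, and `io.StringIO` stands in for the open file.

```python
import io
from itertools import islice

def read_block(file_obj, start, stop):
    """Return the lines numbered start..stop-1 (0-based) from a fresh file object."""
    return list(islice(file_obj, start, stop))

# Hypothetical file contents with a short separator for readability
text = "-----\nreport one\n-----\nreport two\n"
dash_lines = [0, 2, 4]  # dash line numbers plus an end marker

blocks = []
for start, stop in zip(dash_lines, dash_lines[1:]):
    # Re-create the file object so each islice counts from the top of the file
    blocks.append(read_block(io.StringIO(text), start + 1, stop))

print(blocks)  # [['report one\n'], ['report two\n']]
```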

0 answers:

No answers