Python csv cutting off columns

Time: 2017-01-03 19:54:45

Tags: python csv

I've run into a strange problem.

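I should also mention that this used to work, so I also wondered whether the .csv file or the specific rows themselves might be the problem.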

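A quick breakdown: I have a script that pulls data from a .csv file of CVE (vulnerability) data. It then uses the cvss module to rescore the results, and we use the output as a way of prioritizing patching by urgency.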

(This script is a stopgap until we roll out a new tool.)

Here's where it goes wrong. This is what my ingest file looks like right now:

Vulnerability Title,Plugin ID,Original CVSS Score,Default Vector,Original Severity,AWS Score,AWS Vector,AWS Severity,Hosts,Host Type,Percentage Impacted
Cisco IOS IKEv1 Packet Handling Remote Information Disclosure (cisco-sa-20160916-ikev1) (BENIGNCERTAIN),NES-93736,4.6,CVSS2#AV:N/AC:L/Au:N/C:P/I:N/A:N,,,AV:N/AC:L/Au:N/C:P/I:N/A:N,,26,26,
Cisco IOS Software TCP Memory Leak DoS (cisco-sa-20150325-tcpleak),NES-82568,4.9,CVSS2#AV:N/AC:L/Au:N/C:N/I:N/A:C,,,AV:N/AC:L/Au:N/C:N/I:N/A:C,,30,26,
RHEL 5 / 6 / 7 : nss and nss-util (RHSA-2016:2779),NES-94912,4.2,CVSS2#AV:N/AC:M/Au:N/C:C/I:C/A:C/E:F/RL:OF/RC:ND,,,AV:N/AC:M/Au:N/C:C/I:C/A:C/E:F/RL:OF/RC:ND,,5112,23,

This is the output after running my script (the script is included below):

Vulnerability Title,Plugin ID,Original CVSS Score,Default Vector,Original Severity,AWS Score,AWS Vector,AWS Severity,Hosts,Host Type,Percentage Impacted
ium,4.6,AV:A/AC:H/Au:M/C:P/I:N/A:P/CDP:L/TD:H/CR:H/IR:H/AR:H,Medium,26,26,0.2524271844660194
Cisco IOS Software TCP Memory Leak DoS (cisco-sa-20150325-tcpleak),NES-82568,4.9,CVSS2#AV:N/AC:L/Au:N/C:N/I:N/A:C,Medium,4.9,AV:A/AC:H/Au:M/C:N/I:N/A:C/CDP:L/TD:M/CR:H/IR:H/AR:H,Medium,30,26,0.2912621359223301
RHEL 5 / 6 / 7 : nss and nss-util (RHSA-2016:2779),NES-94912,4.2,CVSS2#AV:N/AC:M/Au:N/C:C/I:C/A:C/E:F/RL:OF/RC:ND,Medium,4.2,AV:A/AC:H/Au:M/C:C/I:C/A:C/E:F/RL:OF/RC:ND/CDP:L/TD:M/CR:H/IR:H/AR:H,Medium,5112,23,0.615458704550927

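To explain further: row 1 starts with 'ium', which is the cut-off end of the word Medium, written near the bottom of my script at line 128 (the section marked #ORIGINAL SCORE). It should say Medium. So basically, if you compare the first data row of my input with the output, the whole beginning of the row is gone and only the second half of the word the script was trying to add is left. I thought it might be caused by some of the characters in the data, but I'm not sure. The part that goes missing is: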

Cisco IOS IKEv1 Packet Handling Remote Information Disclosure (cisco-sa-20160916-ikev1) (BENIGNCERTAIN),NES-93736,4.6,CVSS2#AV:N/AC:L/Au:N/C:P/I:N/A:N,

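Below is the script that does this. I know it's a bit ugly, and suggestions for improvement are welcome, but right now figuring out why it's messing up my file is my top priority. I've considered switching to pandas, but that would take a while, since I've never used it and don't know how to do this with it.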

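import csv
import re
from itertools import izip_longest

from cvss import CVSS2
from cvss.exceptions import CVSS2MalformedError

# ALINUX_HOST, CISCO_COUNT, JUNIPER_COUNT, CITRIX_COUNT and LINUX_TOTAL (host totals
# per platform) are assumed to be defined elsewhere in the full script.

def rescore_function():
    # headers
    print 'Starting Rescore'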
    csv_in = open('/tmp/rescore_test.csv', 'rb')
    csv_out = open('/tmp/rescored_vulnerabilities.csv', 'wb')
    writer = csv.writer(csv_out)
    reader = csv.reader(csv_in)
    headers = next(reader, None)
    if headers:
        writer.writerow(headers)

    print 'Creating Target Distribution'
    for row in csv.reader(csv_in):
        # This is a terrible way of setting up the percentage of hosts impacted for target distribution. It's ugly and horrible. host_count is the number of hosts impacted; host_type identifies what kind of host it is, such as Alinux, RHEL5, or Cisco IOS.
        host_count = float(row[8])
        host_type = float(row[9])
        alinux_impact = host_count / ALINUX_HOST
        cisco_impact = host_count / CISCO_COUNT
        juniper_impact = host_count / JUNIPER_COUNT
        citrix_impact = host_count / CITRIX_COUNT        
        all_linux= host_count / LINUX_TOTAL
        print 'math set'

        # vul_id is 3 lists combined for a simple reason: alinux_impact NEEDS to match 24, cisco NEEDS to match 26, and juniper NEEDS to match 27, because vul_id is the software's host type ID.
        # Everything else in the range falls into all_linux. So fillvalue=vul_os[-1] means that if the ID is not 24, 26 or 27, it is "all_linux", i.e. it is compared against the all-Linux total.
        vul_id = [24, 26, 27, 25] + range(24) + range(28,101)
        vul_os = [alinux_impact, cisco_impact, juniper_impact, all_linux]

        append_file = open('/tmp/rescored_vulnerabilities.csv', 'ab')
        append_write = csv.writer(append_file)

        # Runs the loop with the fillvalue mentioned above. y is the host type ID (Linux, Cisco IOS, etc.) and x is the fraction of hosts of that type that are impacted.
        # The TD (target distribution) metric is chosen from x, and a regex search-and-replace rewrites the vector accordingly for the CVSS calculation.
        print vul_id
        print vul_os
        for x,y in izip_longest(vul_os, vul_id, fillvalue=vul_os[-1]):
            print x,y
            print host_type
            # VECTOR REGEXP: host_type is the OS/device type. 23 = RHEL5, 24 = Alinux, 26 = Cisco, 27 = Juniper
            if host_type == y:
                row[10] = x
                if  x <= 0.25:
                    AC_Metric = 'A:C/CDP:L/TD:L/CR:H/IR:H/AR:H'
                    AP_Metric = 'A:P/CDP:L/TD:L/CR:H/IR:H/AR:H'
                    AN_Metric = 'A:N/CDP:L/TD:L/CR:H/IR:H/AR:H'
                    RCUC_Metric = 'RC:UC/CDP:L/TD:L/CR:H/IR:H/AR:H'
                    RCUR_Metric = 'RC:UR/CDP:L/TD:L/CR:H/IR:H/AR:H'
                    RCC_Metric = 'RC:C/CDP:L/TD:L/CR:H/IR:H/AR:H'
                    RCND_Metric = 'RC:ND/CDP:L/TD:L/CR:H/IR:H/AR:H'
                elif 0.26 <= x <= 0.75:
                    AC_Metric = 'A:C/CDP:L/TD:M/CR:H/IR:H/AR:H'
                    AP_Metric = 'A:P/CDP:L/TD:M/CR:H/IR:H/AR:H'
                    AN_Metric = 'A:N/CDP:L/TD:M/CR:H/IR:H/AR:H'
                    RCUC_Metric = 'RC:UC/CDP:L/TD:M/CR:H/IR:H/AR:H'
                    RCUR_Metric = 'RC:UR/CDP:L/TD:M/CR:H/IR:H/AR:H'
                    RCC_Metric = 'RC:C/CDP:L/TD:M/CR:H/IR:H/AR:H'
                    RCND_Metric = 'RC:ND/CDP:L/TD:M/CR:H/IR:H/AR:H'
                else:
                    AC_Metric = 'A:C/CDP:L/TD:H/CR:H/IR:H/AR:H'
                    AP_Metric = 'A:P/CDP:L/TD:H/CR:H/IR:H/AR:H'
                    AN_Metric = 'A:N/CDP:L/TD:H/CR:H/IR:H/AR:H'
                    RCUC_Metric = 'RC:UC/CDP:L/TD:H/CR:H/IR:H/AR:H'
                    RCUR_Metric = 'RC:UR/CDP:L/TD:H/CR:H/IR:H/AR:H'
                    RCC_Metric = 'RC:C/CDP:L/TD:H/CR:H/IR:H/AR:H'
                    RCND_Metric = 'RC:ND/CDP:L/TD:H/CR:H/IR:H/AR:H'


                text = row[6]
                text = re.sub(r'AV:N','AV:A',text)
                text = re.sub(r'AC:L','AC:H',text)
                text = re.sub(r'AC:M','AC:H',text)
                text = re.sub(r'Au:N','Au:M',text)
                text = re.sub(r'Au:S','Au:M',text)
                text = re.sub(r'A:C$',AC_Metric,text)
                text = re.sub(r'A:P$',AP_Metric,text)
                text = re.sub(r'A:N$',AP_Metric,text)
                text = re.sub(r'RC:UC',RCUC_Metric,text)
                text = re.sub(r'RC:UR',RCUR_Metric,text)
                text = re.sub(r'RC:C',RCC_Metric,text)
                text = re.sub(r'RC:ND',RCND_Metric,text)
                row[6] = text
                # NEW SCORE: uses the CVSS module to take the vector built above and compute its numeric score, then uses that number to pick the severity word.
                try:
                    vector = row[6]
                    c = CVSS2(vector)
                    row[5] = c.scores()[2]
                    vul_score = row[5]
                    if 0 <= vul_score <= 3.9:
                        vuln_word = 'Low'
                    elif 4.0 <= vul_score <=6.9:
                        vuln_word = 'Medium'
                    elif 7.0 <= vul_score <= 9.9:
                        vuln_word = 'High'
                    else:
                        vuln_word = 'Critical'
                    row[7] = vuln_word
                except CVSS2MalformedError:
                    rescored_success = False
                    pass
                # ORIGINAL SCORE: does the same as above for the original score, since Nessus does not provide the severity "word". This only finds the word, not the numeric value.
                default_score = float(row[2])
                if 0 <= default_score <= 3.9:
                    default_severity = 'Low'
                elif 4.0 <= default_score <=6.9:
                    default_severity = 'Medium'
                elif 7.0 <= default_score <= 9.9:
                    default_severity = 'High'
                else:
                    default_severity = 'Critical'
                row[4] = default_severity
                append_write.writerow(row)
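A minimal Python 3 sketch of the target-distribution step of the loop above, assuming the host-type IDs described in the comments (24, 26 and 27 have their own totals, everything else is treated as all-Linux). A dictionary lookup with a default replaces the izip_longest pairing, and the bands are contiguous: the script's `elif 0.26 <= x <= 0.75` sends any value between 0.25 and 0.26 to the `else` branch, which is why row 1 of the output (0.2524...) ended up with TD:H. Separately, the `A:N$` substitution uses `AP_Metric`, which is why the same row's vector shows `A:P` where the input had `A:N`. The function names and the `fleet_sizes`/`default_total` parameters are hypothetical.

```python
def impact_fraction(host_count, host_type, fleet_sizes, default_total):
    """Fraction of hosts of this type that are impacted.

    fleet_sizes maps host-type IDs to fleet sizes, e.g. {24: ALINUX_HOST,
    26: CISCO_COUNT, 27: JUNIPER_COUNT} in the question's script (hypothetical
    wiring); every other ID falls back to default_total (LINUX_TOTAL there).
    """
    # host_type arrives from the CSV as a string such as '26' or '26.0'.
    return float(host_count) / fleet_sizes.get(int(float(host_type)), default_total)


def target_distribution(fraction):
    """CVSS v2 Target Distribution letter with contiguous bands (no 0.25-0.26 gap)."""
    if fraction <= 0.25:
        return 'L'
    if fraction <= 0.75:
        return 'M'
    return 'H'


def environmental_suffix(fraction):
    """Environmental metrics the script appends to the vector, e.g. 'CDP:L/TD:M/CR:H/IR:H/AR:H'."""
    return 'CDP:L/TD:%s/CR:H/IR:H/AR:H' % target_distribution(fraction)
```

For example, with a hypothetical Cisco fleet of 103 hosts, `impact_fraction('26', '26', {26: 103}, 8000)` is 26/103 (about 0.2524), and `target_distribution` maps that to `'M'` rather than `'H'`.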

1 answer:

Answer 0 (score: 2)

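Your code is very hard to reproduce, but I suspect the problem is concurrent, buffered access to the same file through two handles opened for writing. The byte counts support this: the header row written through csv_out is 154 bytes long, including the \r\n terminator that csv.writer uses by default, and the first 154 bytes of your first rescored row end exactly at the "Med" of "Medium". The header was still sitting in csv_out's buffer, so it was only flushed when csv_out was eventually closed, and it landed at offset 0, on top of the rows the append handles had already written. It's quite messy: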

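  1. You first open (and truncate) the file with csv_out = open('/tmp/rescored_vulnerabilities.csv', 'wb')
  2. You write the header through that handle
  3. On every iteration, while that handle is still open, you open the same file again in append mode: append_file = open('/tmp/rescored_vulnerabilities.csv', 'ab')
  4. You never explicitly close append_file either
  5. I suggest:

    • Opening with truncation first is fine
    • Remove append_file = open('/tmp/rescored_vulnerabilities.csv', 'ab')
    • Replace append_write with writer (that works: writer points to the same file and is still open)
    • Don't forget to close csv_out at the end (or put all the code in a with open(...) as csv_out: block)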
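A minimal Python 3 sketch that mimics the handle sequence described in the steps above, using the question's header and its first data row (reassembled from the input and output shown in the question). It assumes a POSIX-style filesystem where both opens succeed, and it writes to a temporary file rather than the question's /tmp path; `reproduce_clobber` is a hypothetical name.

```python
import csv
import os
import tempfile

HEADER = ['Vulnerability Title', 'Plugin ID', 'Original CVSS Score', 'Default Vector',
          'Original Severity', 'AWS Score', 'AWS Vector', 'AWS Severity', 'Hosts',
          'Host Type', 'Percentage Impacted']
# First rescored row, reassembled from the question's input and output.
ROW = ['Cisco IOS IKEv1 Packet Handling Remote Information Disclosure '
       '(cisco-sa-20160916-ikev1) (BENIGNCERTAIN)',
       'NES-93736', '4.6', 'CVSS2#AV:N/AC:L/Au:N/C:P/I:N/A:N', 'Medium', '4.6',
       'AV:A/AC:H/Au:M/C:P/I:N/A:P/CDP:L/TD:H/CR:H/IR:H/AR:H', 'Medium',
       '26', '26', '0.2524271844660194']


def reproduce_clobber(path):
    """Header through one 'w' handle, a data row through a second 'a' handle."""
    csv_out = open(path, 'w', newline='')      # truncates; header stays in this handle's buffer
    csv.writer(csv_out).writerow(HEADER)
    with open(path, 'a', newline='') as append_file:
        csv.writer(append_file).writerow(ROW)  # append mode: lands at the end of the still-empty file
    csv_out.close()                            # buffered header is flushed at offset 0, over the row
    with open(path, newline='') as f:
        return f.read()


if __name__ == '__main__':
    fd, path = tempfile.mkstemp(suffix='.csv')
    os.close(fd)
    try:
        print(reproduce_clobber(path).splitlines()[1][:40])
    finally:
        os.remove(path)
```

The second line of the resulting file starts with `ium,4.6,AV:A/...`, the same damage as in the question's output.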

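    Note that this problem only shows up like this on Un*x. On Windows, opening a file twice in write mode may raise an exception right away instead (though not always).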
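Putting the suggestions together, a minimal Python 3 sketch of the corrected file handling: one reader and one writer, both managed by `with` blocks, so every row goes through the same buffered handle and each file is closed exactly once. `rescore_file` and `fill_original_severity` are hypothetical names; the per-row rescoring logic from the question would go into the `rescore_row` callable. Under Python 2, as in the question, open the files with 'rb'/'wb' instead of passing `newline=''`.

```python
import bisect
import csv
import os
import tempfile


def rescore_file(in_path, out_path, rescore_row):
    """Copy in_path to out_path, passing each data row through rescore_row."""
    # newline='' is what the csv module expects for files in Python 3.
    with open(in_path, newline='') as csv_in, \
            open(out_path, 'w', newline='') as csv_out:
        reader = csv.reader(csv_in)
        writer = csv.writer(csv_out)      # the only handle that ever writes out_path
        headers = next(reader, None)
        if headers:
            writer.writerow(headers)
        for row in reader:                # keep using the same reader
            writer.writerow(rescore_row(row))
    # Both files are flushed and closed here, exactly once.


def fill_original_severity(row):
    """Example transform: derive Original Severity (column 4) from Original
    CVSS Score (column 2) with the question's bands: <4.0 Low, <7.0 Medium,
    <10.0 High, otherwise Critical."""
    score = float(row[2])
    row[4] = ['Low', 'Medium', 'High', 'Critical'][bisect.bisect_right([4.0, 7.0, 10.0], score)]
    return row


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'rescore_test.csv')
        dst = os.path.join(tmp, 'rescored_vulnerabilities.csv')
        with open(src, 'w', newline='') as f:
            w = csv.writer(f)
            w.writerow(['Vulnerability Title', 'Plugin ID', 'Original CVSS Score',
                        'Default Vector', 'Original Severity'])
            w.writerow(['Cisco IOS Software TCP Memory Leak DoS (cisco-sa-20150325-tcpleak)',
                        'NES-82568', '4.9', 'CVSS2#AV:N/AC:L/Au:N/C:N/I:N/A:C', ''])
        rescore_file(src, dst, fill_original_severity)
        with open(dst, newline='') as f:
            print(f.read())
```

Because nothing else opens the output file while `csv_out` is open, the header and the rows are written in order and nothing is overwritten.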