Grouping text and writing it to CSV

Time: 2015-02-27 19:51:51

Tags: python regex csv

From the sample data below, I want to find the device address, the USB port, and the number of times the error occurred:

This is what I have so far:

import re
import csv

# Read the whole log into memory; the file is closed automatically.
with open("file.txt", "r") as f:
    searchlines = f.readlines()

for element in searchlines:
    usbPres = re.search(r'(USB)', element)  # pattern to find usb lines
    devAddr = re.search(r'(address)\s\d+', element)  # parsing pattern for device address
    port = re.search(r'(port)\s\d', element)  # parsing pattern for port
    if usbPres:
        pass  # not sure how to continue from here

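This is where I get lost. I want to assign the correct port to each device address, count how many times it failed before a new device was plugged into that port, and then write the result to a CSV file.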

Expected output

DevicAddr Port Number of failed attempts
42 3 5
47 7 2
52 7 1

Sample data


[11883.112089] hub 1-0:1.0: Cannot enable port 3. Maybe the USB cable is bad?
[11883.224080] usb 1-7: new high speed USB device using ehci_hcd and address 42
[11883.328151] hub 1-0:1.0: unable to enumerate USB device on port 3
[11904.472097] hub 1-0:1.0: Cannot enable port 3. Maybe the USB cable is bad?
[11907.440096] hub 1-0:1.0: Cannot enable port 3. Maybe the USB cable is bad?
[11910.408093] hub 1-0:1.0: Cannot enable port 3. Maybe the USB cable is bad?
[11913.376095] hub 1-0:1.0: Cannot enable port 3. Maybe the USB cable is bad?
[11913.616090] usb 1-7: new high speed USB device using ehci_hcd and address 47
[11913.716121] hub 1-0:1.0: unable to enumerate USB device on port 7
[11927.340096] hub 1-0:1.0: Cannot enable port 3. Maybe the USB cable is bad?
[11930.308096] hub 1-0:1.0: Cannot enable port 7. Maybe the USB cable is bad?
[11933.276124] hub 1-0:1.0: Cannot enable port 7. Maybe the USB cable is bad?
[11934.224080] usb 1-7: new high speed USB device using ehci_hcd and address 52
[11936.244118] hub 1-0:1.0: unable to enumerate USB device on port 7 is bad?
[11939.212116] hub 1-0:1.0: Cannot enable port 7. Maybe the USB cable is bad?

1 answer:

Answer 0 (score: 0)

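If I'm reading your sample input and output correctly, there should actually be 6 failed attempts for device address 42 on port 3. It looks like a failed attempt can be logged late, even after the next address has already started on its own port: the port 3 failure at 11927.340096 appears after address 47 has begun enumerating on port 7.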

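Either way, the code below works: it captures all 6 attempts and ignores the initial failures that are logged before any device has started enumerating. I'm using a dictionary that maps each unique combination of address and port to its number of failed attempts. If such a pair is not unique and you want to tell apart separate runs of attempts for the same address and port, you will need to attach a unique identifier each time the pair is encountered. Let me know if you have any questions about the answer.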

import argparse
import re


def main():
    p = argparse.ArgumentParser(description="Creates a csv file report on data file.")
    p.add_argument("datafile", help="Path of data")
    p.add_argument("outfile", help="Path of new file.")
    args = p.parse_args()

    with open(args.datafile) as f:
        lines = f.read().splitlines()

    port_to_addr = dict()        # port -> address most recently enumerated on it
    port_addr_to_count = dict()  # (port, address) -> number of failed attempts
    current_port = 'default'
    current_addr = 'default'
    for line in lines:
        usbPres = re.search("USB", line)
        if usbPres:
            devAddr = re.search(r"address\s(\d+)", line)
            beginEnumerate = re.search("enumerate", line)
            port = re.search(r"port\s(\d+)", line)

            if devAddr:
                # A new device was announced; remember its address.
                current_addr = devAddr.group(1)
            elif beginEnumerate and port:
                # First failure for this device: bind the address to the port.
                current_port = port.group(1)
                port_to_addr[current_port] = current_addr
                port_addr_to_count[(current_port, port_to_addr[current_port])] = 1
            elif port:
                # Later failure: count it only if the port already has a device.
                current_port = port.group(1)
                if current_port in port_to_addr and (current_port, port_to_addr[current_port]) in port_addr_to_count:
                    port_addr_to_count[(current_port, port_to_addr[current_port])] += 1

    header_row = ["DevicAddr", "Port", "Number of failed attempts"]
    rows = []
    for (port, addr) in port_addr_to_count:
        rows.append([addr, port, str(port_addr_to_count[(port, addr)])])

    # Sort by address, then port (string comparison), and put the header first.
    rows.sort()
    rows.insert(0, header_row)

    newfile_lines = [",".join(row) + "\n" for row in rows]

    with open(args.outfile, 'w') as f:
        f.writelines(newfile_lines)


if __name__ == '__main__':
    main()

The output after running it on the sample data is:

DevicAddr,Port,Number of failed attempts
42,3,6
47,7,3
52,7,2
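
If the same address and port can show up more than once, one way to keep the runs apart is to number them. Below is a minimal sketch of that idea, not something verified beyond log lines shaped like the sample. The names `count_failures` and `write_report` and the extra `Episode` column are made up for illustration. It also writes the file with the `csv` module instead of joining strings by hand, and, unlike the code above, it assumes that an "unable to enumerate" line seen before any address line should be skipped.

```python
import csv
import re

ADDR_RE = re.compile(r"address\s(\d+)")
PORT_RE = re.compile(r"port\s(\d+)")


def count_failures(lines):
    """Return one [address, port, episode, count] row per enumeration run.

    "episode" is a hypothetical per-(address, port) sequence number that
    serves as the unique identifier for repeated pairs.
    """
    rows = []         # one mutable row per run, in log order
    open_row = {}     # port -> row currently collecting failures for that port
    seen = {}         # (address, port) -> number of runs seen so far
    current_addr = None
    for line in lines:
        if "USB" not in line:
            continue
        addr = ADDR_RE.search(line)
        port = PORT_RE.search(line)
        if addr:
            # A new device was announced; remember its address.
            current_addr = addr.group(1)
        elif port and "enumerate" in line and current_addr is not None:
            # Start a new run for this (address, port) pair.
            key = (current_addr, port.group(1))
            seen[key] = seen.get(key, 0) + 1
            row = [key[0], key[1], seen[key], 1]
            rows.append(row)
            open_row[key[1]] = row
        elif port and port.group(1) in open_row:
            # Another failure on a port that already has a run in progress.
            open_row[port.group(1)][3] += 1
    return rows


def write_report(rows, out):
    """Write the rows as CSV to an open text file (or any file-like object)."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["DevicAddr", "Port", "Episode", "Number of failed attempts"])
    writer.writerows(rows)
```

To write a file, open it with `open(path, "w", newline="")` and pass the handle to `write_report`. Rows come out in log order rather than sorted, so a device that is plugged in again later appears as a second row with `Episode` set to 2.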