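Python script to extract data from a txt file to csv

Time: 2020-10-19 04:03:49

Tags: python excel

I am trying to write a Python script that extracts Wi-Fi data from a txt file into a csv file.

This is the txt data: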

Wed Oct  7 09:00:01 UTC 2020

BSS 02:ca:fe:ca:ca:40(on ap0_1)
freq: 2422
capability: IBSS (0x0012)
signal: -60.00 dBm
primary channel: 3
last seen: 30 ms ago
BSS ac:86:74:0a:73:a8(on ap0_1)
TSF: 229102338752 usec (2d, 15:38:22)
freq: 2422
capability: ESS (0x0421)
signal: -62.00 dBm
primary channel: 3

I need to extract the txt data into a csv file in the following format:

 Time                        | BSS                       | freq |capability   |signal| primary channel |
 ----------------------------+---------------------------+------+-------------+------+-----------------+
 Wed Oct  7 09:00:01 UTC 2020|02:ca:fe:ca:ca:40(on ap0_1)| 2422 |IBSS (0x0012)|-60.00|             3   |
                             |ac:86:74:0a:73:a8(on ap0_1)| 2422 |ESS (0x0421) |-62.00|             3   |

This is my unfinished code:

import csv
import re

fieldnames = ['TIME', 'BSS', 'FREQ','CAPABILITY', 'SIGNAL', 'CHANNEL']

re_fields = re.compile(r'({})+:\s(.*)'.format('|'.join(fieldnames)), re.I)

with open('ap0_1.txt') as f_input, open('ap0_1.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames= fieldnames)
    csv_output.writeheader()
    start = False

    for line in f_input:
        line = line.strip()

        if len(line):
            if 'BSS' in line:
                if start:
                    start = False
                    block.append(line)
                    text_block = '\n'.join(block)

                    for field, value in re_fields.findall(text_block):
                        entry[field.upper()] = value

                    if line[0] == 'on ap0_1':
                        entry['BSS'] = block[0]

                    csv_output.writerow(entry)

                else:
                    start = True
                    entry = {}
                    block = [line]
            elif start:
                block.append(line)

When I run it, the data is not placed in the correct columns.


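Please let me know how to fix this. I am only a beginner at programming, and any help would be appreciated. Thank you.

3 answers:

Answer 0: (score: 1)

Use str.startswith

For example: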

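import csv

filename = 'ap0_1.txt'   # input file name, as used in the question
outfile = 'ap0_1.csv'    # output file name, as used in the question

fieldnames = ('TIME', 'BSS', 'freq', 'capability', 'signal', 'primary channel')
with open(filename) as f_input, open(outfile, 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames)
    csv_output.writeheader()
    time_stamp = next(f_input).strip()   # Get Time, First Line
    result = {}
    for line in f_input:
        line = line.strip()
        if line.startswith(fieldnames):
            if line.startswith('BSS'):
                if result:               # a new BSS begins: write the previous record
                    csv_output.writerow(result)
                result = {"TIME": time_stamp}
                key, value = line.split(" ", 1)
            else:
                key, value = line.split(": ", 1)
            result[key] = value

    if result:                           # write the last record
        csv_output.writerow(result)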

Edit based on the comments

If the text contains several blocks like the one above:

import re
import csv

filename = 'ap0_1.txt'   # input file name, as used in the question
outfile = 'ap0_1.csv'    # output file name, as used in the question

week_ptrn = re.compile(r"\b(" + "|".join(('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')) + r")\b")
fieldnames = ('TIME', 'BSS', 'freq', 'capability', 'signal', 'primary channel')

with open(filename) as f_input, open(outfile, 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames)
    csv_output.writeheader()
    result = []        # one dict per BSS record
    time_stamp = ''    # most recent timestamp line seen
    for line in f_input:
        line = line.strip()
        if week_ptrn.match(line):
            time_stamp = line      # a line starting with a weekday opens a new block
            continue

        if line.startswith(fieldnames):
            if line.startswith('BSS'):
                result.append({"TIME": time_stamp})   # each BSS line starts a new row
                key, value = line.split(" ", 1)
            else:
                key, value = line.split(": ", 1)
            result[-1][key] = value

    csv_output.writerows(result)
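
To see what the str.startswith test does on individual lines, here is a minimal sketch. The helper name split_field is hypothetical and only used for illustration; it assumes the line layout shown in the question.

```python
# Field prefixes we want to keep; passing a tuple to str.startswith
# checks all of them in one call.
FIELDNAMES = ('BSS', 'freq', 'capability', 'signal', 'primary channel')

def split_field(line):
    """Return (key, value) for a recognised line, or None otherwise."""
    line = line.strip()
    if not line.startswith(FIELDNAMES):
        return None                       # e.g. 'TSF: ...' or 'last seen: ...'
    if line.startswith('BSS'):
        key, value = line.split(' ', 1)   # 'BSS' is followed by a space, not ': '
    else:
        key, value = line.split(': ', 1)  # split only on the first ': '
    return key, value

print(split_field('BSS 02:ca:fe:ca:ca:40(on ap0_1)'))
print(split_field('signal: -60.00 dBm'))
print(split_field('last seen: 30 ms ago'))
```

Lines that do not start with one of the listed names are simply skipped, which is why TSF and last seen never reach the csv.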

Answer 1: (score: 0)

Here is my version of the code.

import csv

# 'PRIMARY CHANNEL' matches the upper-cased key taken from 'primary channel: 3'
fieldnames = ['TIME', 'BSS', 'FREQ', 'CAPABILITY', 'SIGNAL', 'PRIMARY CHANNEL']

with open('ap0_1.txt') as f_input, open('ap0_1.csv', 'w', newline='') as f_output:
    # restval fills missing fields; extrasaction='ignore' drops keys such as TSF or LAST SEEN
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames,
                                restval='', extrasaction='ignore')
    csv_output.writeheader()

    # a timestamp line starts with an abbreviated weekday name
    time_condition = lambda l: l.startswith(('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'))

    time_stamp = ''
    row = dict()
    for line in f_input:
        line = line.strip()
        if not line:
            continue
        elif time_condition(line):
            time_stamp = line
        elif line.startswith('BSS '):
            # not sure how you define the start of a new block, say, it is by 'BSS' string
            if row:
                csv_output.writerow(row)
            row = {'TIME': time_stamp, 'BSS': line.split(' ', 1)[1]}  # split one time exactly
        else:
            key, _, value = line.partition(': ')
            row[key.upper()] = value
    if row:
        csv_output.writerow(row)

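It looks like '\n' is what creates the blank lines.

Answer 2: (score: 0)

You are searching for the time using "TIME", but the string "TIME" never appears in the input data, so the empty time column in the output is expected.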
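
A quick way to check this is to run the question's pattern against one block of the sample data. The sketch below reuses fieldnames and re_fields exactly as defined in the question; the variable names sample and matched are only for illustration.

```python
import re

fieldnames = ['TIME', 'BSS', 'FREQ', 'CAPABILITY', 'SIGNAL', 'CHANNEL']
re_fields = re.compile(r'({})+:\s(.*)'.format('|'.join(fieldnames)), re.I)

sample = """Wed Oct  7 09:00:01 UTC 2020
BSS 02:ca:fe:ca:ca:40(on ap0_1)
freq: 2422
capability: IBSS (0x0012)
signal: -60.00 dBm
primary channel: 3"""

# The pattern requires '<name>: ', so only lines of that shape are captured.
matched = {field.upper(): value for field, value in re_fields.findall(sample)}
print(sorted(matched))   # field names that were actually found
```

Neither the timestamp line nor the BSS line has the form name: value, so TIME and BSS are never filled in by the regular expression and have to be handled separately.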

I think the following condition is also a problem.

            if line[0] == 'on ap0_1':
                entry['BSS'] = block[0]

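As far as I can tell, you are trying to find on ap0_1 in BSS ac:86:74:0a:73:a8(on ap0_1). However, line is a string, so line[0] is only its first character, 'B'; even after line.split(), element [0] would be 'BSS', the first item of ['BSS', 'ac:86:74:0a:73:a8(on', 'ap0_1)']. Either way, the comparison with 'on ap0_1' is never true. It should be changed like this: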

            if 'on ap0_1' in block[0]:
                entry['BSS'] = block[0][4:].lstrip()
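
Putting both points together, here is a minimal sketch of one possible way to produce one row per BSS. It is not the asker's code: the function name parse_scan, the DAYS tuple, and the decision to strip the " dBm" unit and to repeat the timestamp on every row are all assumptions made for illustration.

```python
import csv
import io

FIELDS = ['Time', 'BSS', 'freq', 'capability', 'signal', 'primary channel']
DAYS = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')

def parse_scan(lines):
    """Yield one dict per BSS block found in an iterable of text lines."""
    time_stamp = ''
    row = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(DAYS):
            time_stamp = line                  # timestamp applies to the BSS blocks that follow
        elif line.startswith('BSS '):
            if row is not None:
                yield row                      # previous block is complete
            row = {'Time': time_stamp, 'BSS': line[4:].lstrip()}
        elif row is not None and ': ' in line:
            key, value = line.split(': ', 1)
            if key in FIELDS:                  # skips TSF, last seen, etc.
                if key == 'signal':
                    value = value.replace(' dBm', '')  # assumption: drop the unit
                row[key] = value
    if row is not None:
        yield row                              # last block in the input

sample = """Wed Oct  7 09:00:01 UTC 2020

BSS 02:ca:fe:ca:ca:40(on ap0_1)
freq: 2422
capability: IBSS (0x0012)
signal: -60.00 dBm
primary channel: 3
last seen: 30 ms ago
BSS ac:86:74:0a:73:a8(on ap0_1)
TSF: 229102338752 usec (2d, 15:38:22)
freq: 2422
capability: ESS (0x0421)
signal: -62.00 dBm
primary channel: 3
"""

rows = list(parse_scan(sample.splitlines()))

# Write to an in-memory buffer here; with a real file, open it with newline=''.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=FIELDS, restval='')
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

With a real file, pass the open file object to parse_scan instead of sample.splitlines(). If the time should appear only on the first row of each block, as in the table in the question, blank out row['Time'] for the later rows before writing.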