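Python script to extract data from a text file to Excel / CSV

Time: 2018-02-20 16:56:27

Tags: python excel csv

I am trying to write a Python script that extracts data from a text file into a CSV file.

The data looks like this: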

*------------*
102GCPC-XP
not online
*------------*
------------
105PEACHPC


name                : 105PEACHPC
manufacturer        : Dell Inc.
model               : OptiPlex 755                 
totalphysicalmemory : 2101907456
domain              : abc.com

serialnumber : 90QZGG

version : 5.1.2600

Processor: Intel(R) Pentium(R)

size : 79999073280

ipaddress : 255.255.0.0

------------

I would like the data to look like this:

COMPUTER NAME | STATUS   | NAME     | MANUFACTURER | MODEL      | TOTALPHYSICALMEMORY | DOMAIN | SERIALNUMBER | VERSION | PROCESSOR         | SIZE      | IPADDRESS |
--------------+----------+----------+--------------+------------+---------------------+--------+--------------+---------+-------------------+-----------+-----------+
102GCPC-XP    |not online|          |              |            |                     |        |              |         |                   |           |           |
--------------+----------+----------+--------------+------------+---------------------+--------+--------------+---------+-------------------+-----------+-----------+
105PEACHPC    |Online    |105PEACHPC|Dell Inc.     |OptiPlex 755|2101907456           |abc.com |90QZGG        |5.1.2600 |Intel(R) Pentium(R)|79999073280|255.255.0.0|

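Thanks in advance.

1 answer:

Answer 0: (score: 0)

Your blocks appear to start and end with ------------, and a * around the delimiter marks an entry that is not online. The code first needs to split the text file into blocks by searching for this delimiter and then building up each entry line by line. When the closing delimiter is found, it uses a regular expression to find all lines that match one of the fieldnames. Finally, csv.DictWriter() is used to write the entries to a correctly formatted CSV file, which can be loaded into Excel: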

import csv
import re

fieldnames = ['NAME', 'MANUFACTURER', 'MODEL', 'TOTALPHYSICALMEMORY', 
    'DOMAIN', 'SERIALNUMBER', 'VERSION', 'PROCESSOR', 'SIZE', 'IPADDRESS']

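# Match "field : value" lines. Whitespace around the colon is optional so that
# "Processor: Intel(R) Pentium(R)", which has no space before the colon, is also captured.
re_fields = re.compile(r'^({})[ \t]*:[ \t]*(.*)$'.format('|'.join(fieldnames)), re.I | re.M)

# 'wb' is the Python 2 way to open a CSV file for writing
with open('input.txt') as f_input, open('output.csv', 'wb') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=['COMPUTER NAME', 'STATUS'] + fieldnames)
    csv_output.writeheader()
    start = False    # True while inside a block

    for line in f_input:
        line = line.strip()

        if len(line):
            if '------------' in line:
                if start:
                    # Closing delimiter: the block is complete
                    start = False
                    block.append(line)
                    text_block = '\n'.join(block)

                    for field, value in re_fields.findall(text_block):
                        entry[field.upper()] = value

                    if line[0] == '*':
                        # Offline entry: only the name and a status line
                        entry['COMPUTER NAME'] = block[1]
                        entry['STATUS'] = block[2]
                    else:
                        # Fall back to the first line of the block if no name field was found
                        entry['COMPUTER NAME'] = entry.get('NAME', block[1])
                        entry['STATUS'] = 'Online'

                    csv_output.writerow(entry)

                else:
                    # Opening delimiter: start a new block
                    start = True
                    entry = {}
                    block = [line]
            elif start:
                block.append(line)

So for the data you provided, output.csv would contain:

COMPUTER NAME,STATUS,NAME,MANUFACTURER,MODEL,TOTALPHYSICALMEMORY,DOMAIN,SERIALNUMBER,VERSION,PROCESSOR,SIZE,IPADDRESS
102GCPC-XP,not online,,,,,,,,,,
105PEACHPC,Online,105PEACHPC,Dell Inc.,OptiPlex 755,2101907456,abc.com,90QZGG,5.1.2600,Intel(R) Pentium(R),79999073280,255.255.0.0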
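
The field-matching step can be checked on its own. The sketch below is an illustration rather than part of the script: it compares a pattern that requires whitespace before the colon with a more lenient variant that makes the whitespace on either side optional, using three lines from the sample data. The names `strict`, `lenient`, and `sample` are made up for this example.

```python
import re

fieldnames = ['NAME', 'MANUFACTURER', 'MODEL', 'TOTALPHYSICALMEMORY',
              'DOMAIN', 'SERIALNUMBER', 'VERSION', 'PROCESSOR', 'SIZE', 'IPADDRESS']
names = '|'.join(fieldnames)

# Requires at least one whitespace character before the colon
strict = re.compile(r'({})\s+:\s(.*)'.format(names), re.I)

# Allows optional spaces or tabs on either side of the colon,
# and only matches a field name at the start of a line
lenient = re.compile(r'^({})[ \t]*:[ \t]*(.*)$'.format(names), re.I | re.M)

# Three lines taken from the sample data (already stripped)
sample = '\n'.join([
    'name                : 105PEACHPC',
    'Processor: Intel(R) Pentium(R)',
    'size : 79999073280',
])

strict_fields = [field.upper() for field, _ in strict.findall(sample)]
lenient_fields = [field.upper() for field, _ in lenient.findall(sample)]

print(strict_fields)
print(lenient_fields)
```

Here `strict_fields` contains only NAME and SIZE, because `Processor: ...` has no space before its colon, while `lenient_fields` also includes PROCESSOR.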

For Python 3.x, change the line that opens the files to:

with open('input.txt') as f_input, open('output.csv', 'w', newline='') as f_output:
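
For reference, the same approach can be sketched as a self-contained Python 3 example. This is only one possible arrangement, not the answer's exact code: the function names `blocks_to_rows` and `convert` are hypothetical, the CSV is written to an in-memory buffer instead of output.csv so it runs without any files, the offline check looks at the opening delimiter, and the pattern assumes whitespace around the colon is optional.

```python
import csv
import io
import re

FIELDNAMES = ['NAME', 'MANUFACTURER', 'MODEL', 'TOTALPHYSICALMEMORY',
              'DOMAIN', 'SERIALNUMBER', 'VERSION', 'PROCESSOR', 'SIZE', 'IPADDRESS']

# "field : value" at the start of a line, whitespace around the colon optional
RE_FIELDS = re.compile(
    r'^({})[ \t]*:[ \t]*(.*)$'.format('|'.join(FIELDNAMES)), re.I | re.M)


def blocks_to_rows(lines):
    """Yield one dict per delimited block (hypothetical helper)."""
    block = None                      # None while outside a block
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if '------------' in line:
            if block is None:
                block = [line]        # opening delimiter
                continue
            # Closing delimiter: turn the collected lines into a row
            body = block[1:]
            row = {field.upper(): value.strip()
                   for field, value in RE_FIELDS.findall('\n'.join(body))}
            if block[0].startswith('*'):
                # Offline entry: computer name followed by a status line
                row['COMPUTER NAME'] = body[0] if body else ''
                row['STATUS'] = body[1] if len(body) > 1 else ''
            else:
                row['COMPUTER NAME'] = row.get('NAME', body[0] if body else '')
                row['STATUS'] = 'Online'
            block = None
            yield row
        elif block is not None:
            block.append(line)


def convert(text):
    """Return the CSV text for the given input text (hypothetical helper)."""
    out = io.StringIO(newline='')     # let the csv module control line endings
    writer = csv.DictWriter(out, fieldnames=['COMPUTER NAME', 'STATUS'] + FIELDNAMES)
    writer.writeheader()
    writer.writerows(blocks_to_rows(text.splitlines()))
    return out.getvalue()


SAMPLE = """\
*------------*
102GCPC-XP
not online
*------------*
------------
105PEACHPC

name                : 105PEACHPC
manufacturer        : Dell Inc.
model               : OptiPlex 755
totalphysicalmemory : 2101907456
domain              : abc.com

serialnumber : 90QZGG

version : 5.1.2600

Processor: Intel(R) Pentium(R)

size : 79999073280

ipaddress : 255.255.0.0

------------
"""

csv_text = convert(SAMPLE)
print(csv_text)
```

To write a real file instead, the same writer calls could be pointed at a file opened with `open('output.csv', 'w', newline='')`.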