Question

I have a text file in this format (it is delimited by --- and |||| to make it look like a table):

  st
---------------------------------------------------------------------------------------------------------
Server : hh site:          Date : 2012-03-10  Run Time :00.00.00
---------------------------------------------------------------------------------------------------------
AA       |dd                     |condition          |another                    |condition        |Ref.
yy       |sa33                   |true               |OK: 4tu                    |true             |yt.4.5
         |                       |                   |                           |                 |.3
---------|-----------------------|-------------------|---------------------------|-----------------|-----
BB       |tr  dd                 |2                  |dhfdk                      |                 |yt.5.1
         |verson                 |                   |    t3hd                   |    true         |.1
         |and above)             |                   |                           |                 |
---------|-----------------------|-------------------|---------------------------|-----------------|-----

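Only the contents of the cells matter. I have made no headway so far. Thanks.

I don't have the programming skills to read the file and parse it. How can I remove the ---- and ||||| and import the data into Excel as rows and columns?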

Answer 1

As an alternative to using Pandas, you can parse the file yourself and use a Python Excel library such as xlsxwriter to create the .xlsx file:

import csv
from itertools import islice

import xlsxwriter

wb = xlsxwriter.Workbook("output.xlsx")
ws = wb.add_worksheet()
# Wrap text and top-align it so multi-line cells display properly
cell_format = wb.add_format()
cell_format.set_text_wrap()
cell_format.set_align('top')

# Python 3: open in text mode with newline='' for the csv module
with open('input.txt', newline='') as f_input:
    csv_input = csv.reader(f_input, delimiter='|')
    cells = []
    row_output = 1

    # The third of the first four lines is the "Server : ..." header
    header = [row.strip() for row in islice(f_input, 0, 4)][2]
    ws.merge_range('A1:G1', header)
    #ws.write(0, 0, header)

    for row_input in csv_input:
        if not row_input:
            continue  # skip blank lines
        if row_input[0].startswith('---'):
            # A separator ends a block: join each column's lines into one cell
            for col, cell in enumerate(zip(*cells)):
                ws.write(row_output, col, '\n'.join(cell), cell_format)
            row_output += 1
            cells = []
        else:
            cells.append(row_input)

wb.close()

This creates an Excel file laid out the same way as your data, i.e. each cell contains multiple lines.
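To make the block-merging step easier to follow without Excel in the picture, here is a minimal sketch of the same idea using only the standard library. The function name `parse_table` and the shortened sample lines are made up for illustration. Unlike the code above, this version also strips the padding from each cell and drops empty continuation lines, which is an assumption about what you want in the final sheet.

```python
def parse_table(lines):
    """Turn the '|'-delimited report into a list of rows.

    Each row is a list of cells; a cell that spans several physical
    lines is joined with newlines.
    """
    rows, block = [], []
    for line in lines:
        line = line.rstrip("\n")
        if "|" not in line:
            continue  # title, server line, full-width rules, blank lines
        if line.startswith("---"):
            if block:
                # zip(*block) regroups the physical lines column by column
                rows.append(["\n".join(p for p in parts if p)
                             for parts in zip(*block)])
                block = []
        else:
            block.append([cell.strip() for cell in line.split("|")])
    return rows


# Shortened, hypothetical sample in the same layout as the question's file
sample = [
    "AA   |dd     |Ref.",
    "yy   |sa33   |yt.4.5",
    "     |       |.3",
    "-----|-------|-----",
    "BB   |tr  dd |yt.5.1",
    "     |verson |.1",
    "-----|-------|-----",
]
rows = parse_table(sample)
```

Each entry of `rows` can then be written out one cell at a time, for example with `ws.write(...)` as in the answer above.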

Answer 2

The pandas library should do everything you need!

Code, run in an IPython session:

import pandas as pd
from io import StringIO  # cStringIO exists only in Python 2

text_file = '''
  st
---------------------------------------------------------------------------------------------------------
Server : hh site:          Date : 2012-03-10  Run Time :00.00.00
---------------------------------------------------------------------------------------------------------
AA       |dd                     |condition          |another                    |condition        |Ref.
yy       |sa33                   |true               |OK: 4tu                    |true             |yt.4.5
         |                       |                   |                           |                 |.3
---------|-----------------------|-------------------|---------------------------|-----------------|-----
BB       |tr  dd                 |2                  |dhfdk                      |                 |yt.5.1
         |verson                 |                   |    t3hd                   |    true         |.1
         |and above)             |                   |                           |                 |
---------|-----------------------|-------------------|---------------------------|-----------------|-----
'''

# Read in tabular data, skipping the first header rows
# StringIO(text_file) is for example only
# Normally, you would use pd.read_csv('/path/to/file.csv', ...)
# A regex separator needs a raw string and the Python parsing engine
top = pd.read_table(StringIO(text_file), sep=r'\s{2,}', header=None, skiprows=3, nrows=1, engine='python')
df = pd.read_table(StringIO(text_file), sep='|', header=None, skiprows=5)

# Remove '-' lines
df = df[~df[0].str.contains('-')]

# Reset the index, discarding the old one
df = df.reset_index(drop=True)

# Combine the top (server/date) line with the table rows
df = pd.concat([top, df], ignore_index=True)

df

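Do whatever else you need to clean up the data, then write it to Excel:

# Write to an Excel file (recent pandas versions no longer write legacy .xls)
df.to_excel('/path/to/file.xlsx')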
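As one example of the clean-up step, the cells read with sep='|' still carry the padding spaces from the text file. The sketch below strips them before writing. The helper name `clean_cells` and the small `raw` frame are hypothetical and only stand in for the `df` built above.

```python
import pandas as pd


def clean_cells(df):
    """Return a copy of df with padding stripped from every string cell."""
    cleaned = df.copy()
    for col in cleaned.columns:
        # Leave non-string values (numbers, NaN) untouched
        cleaned[col] = cleaned[col].map(
            lambda v: v.strip() if isinstance(v, str) else v
        )
    return cleaned


# Hypothetical stand-in for the DataFrame parsed from the text file
raw = pd.DataFrame([["AA       ", "dd      "], ["         ", "  t3hd  "]])
tidy = clean_cells(raw)
```

Calling `clean_cells(df)` right before `df.to_excel(...)` would keep the padding out of the spreadsheet.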

Convert a text file to an Excel sheet
