I have a text file in the following format (it uses --- and | as separators so that it looks like a table):
st
---------------------------------------------------------------------------------------------------------
Server : hh site: Date : 2012-03-10 Run Time :00.00.00
---------------------------------------------------------------------------------------------------------
AA |dd |condition |another |condition |Ref.
yy |sa33 |true |OK: 4tu |true |yt.4.5
| | | | |.3
---------|-----------------------|-------------------|---------------------------|-----------------|-----
BB |tr dd |2 |dhfdk | |yt.5.1
|verson | | t3hd | true |.1
|and above) | | | |
---------|-----------------------|-------------------|---------------------------|-----------------|-----
All of the cell contents are values I need to keep. Thanks in advance.
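I don't have the programming skills to read the file and parse it. How can I remove the ---- and | characters and import the data into Excel as rows and columns?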
Answer 0 (score: 2)
As an alternative to using Pandas, you could parse the file yourself and use a Python Excel library such as xlsxwriter to create the .xlsx file:
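import csv
from itertools import islice

import xlsxwriter

wb = xlsxwriter.Workbook("output.xlsx")
ws = wb.add_worksheet()
# Wrap text and align to the top so multi-line cells display correctly
cell_format = wb.add_format()
cell_format.set_text_wrap()
cell_format.set_align('top')

# Python 3: open in text mode with newline='' as the csv module expects
with open('input.txt', newline='') as f_input:
    csv_input = csv.reader(f_input, delimiter='|')
    cells = []
    row_output = 1
    # Consume the first 4 lines; the 3rd one is the "Server : ..." title line
    header = [row.strip() for row in islice(f_input, 0, 4)][2]
    ws.merge_range('A1:G1', header)
    for row_input in csv_input:
        if not row_input:
            continue  # skip blank lines
        if row_input[0].startswith('---'):
            # A separator closes a logical row: write each column's lines into one cell
            for col, cell in enumerate(zip(*cells)):
                ws.write(row_output, col, '\n'.join(cell), cell_format)
            row_output += 1
            cells = []
        else:
            cells.append(row_input)
wb.close()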
This creates an Excel file with the same layout as your data, i.e. each cell contains multiple lines.
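The grouping idea behind that loop can also be written as a small library-free function, which makes it easy to check without creating a workbook. This is a minimal sketch: the name `group_logical_rows` and the three-column sample are hypothetical, and unlike the answer's code it also strips whitespace, drops empty fragments, and keeps a final block that has no closing separator line.

```python
import csv
import io


def group_logical_rows(lines):
    """Hypothetical helper: group '|'-delimited lines into logical rows.

    A line whose first field starts with '---' closes the current block; each
    column's non-empty, stripped fragments are joined with newlines.
    """
    def join_block(block):
        # zip(*block) transposes the physical lines into columns
        return ['\n'.join(f.strip() for f in col if f.strip()) for col in zip(*block)]

    rows, block = [], []
    for fields in csv.reader(lines, delimiter='|'):
        if not fields:
            continue  # skip blank lines
        if fields[0].startswith('---'):
            if block:
                rows.append(join_block(block))
            block = []
        else:
            block.append(fields)
    if block:  # a trailing block without a closing separator
        rows.append(join_block(block))
    return rows


# Hypothetical three-column sample in the question's layout
sample = """BB |tr dd |2
   |verson |
   |and above) |
---|--------|---
"""
for row in group_logical_rows(io.StringIO(sample)):
    print(row)
```

Each returned list is one logical row, so it could be passed row by row to `ws.write_row()` or `csv.writer`.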
Answer 1 (score: 1)
The pandas library should do everything you need!
Code in an IPython session:
import pandas as pd
from io import StringIO  # Python 3 replacement for cStringIO
text_file = '''
st
---------------------------------------------------------------------------------------------------------
Server : hh site: Date : 2012-03-10 Run Time :00.00.00
---------------------------------------------------------------------------------------------------------
AA |dd |condition |another |condition |Ref.
yy |sa33 |true |OK: 4tu |true |yt.4.5
| | | | |.3
---------|-----------------------|-------------------|---------------------------|-----------------|-----
BB |tr dd |2 |dhfdk | |yt.5.1
|verson | | t3hd | true |.1
|and above) | | | |
---------|-----------------------|-------------------|---------------------------|-----------------|-----
'''
# Read in tabular data, skipping the first header rows
# StringIO(text_file) is for example only
# Normally, you would use pd.read_csv('/path/to/file.csv', ...)
top = pd.read_table(StringIO(text_file), sep=r'\s{2,}', engine='python', header=None, skiprows=3, nrows=1)
df = pd.read_table(StringIO(text_file), sep='|', header=None, skiprows=5)
# Remove '-' lines (na=False treats empty cells as "no match")
df = df[~df[0].str.contains('-', na=False)]
# Reset the index
df = df.reset_index(drop=True)
# Combine top line
df = pd.concat([top, df], ignore_index=True)
df
Do whatever else you need to clean up the data, then write it to Excel:
# Write to an Excel file (.xlsx needs openpyxl or xlsxwriter; pandas 2.x no longer writes .xls)
df.to_excel('/path/to/file.xlsx')
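The pandas code keeps one DataFrame row per physical line. As one possible cleanup step (a sketch, not part of the original answer), continuation lines can be merged back into logical rows: a cumulative sum over the separator lines numbers each block, and each column's fragments are joined within a block. The sample text and the group name `block` are hypothetical, and the sketch assumes every line has the same number of `|` fields.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: two logical rows, each spread over several physical
# lines and closed by a separator line of dashes.
text = """yy |sa33 |true
   |     |
---|-----|-----
BB |tr dd|2
   |verson|
---|-----|-----
"""

# dtype=str and keep_default_na=False keep every field as a plain string
raw = pd.read_csv(StringIO(text), sep='|', header=None, dtype=str, keep_default_na=False)

# Separator lines start with '---'; counting them so far gives a block id
is_sep = raw[0].str.startswith('---')
block = is_sep.cumsum().rename('block')

# Drop separators, strip whitespace, then join non-empty fragments per column
body = raw[~is_sep].apply(lambda col: col.str.strip())
merged = body.groupby(block[~is_sep]).agg(lambda s: '\n'.join(v for v in s if v))
merged = merged.reset_index(drop=True)
print(merged)
```

The resulting `merged` frame can then be written with `to_excel()` as above.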