Question

I have a really nasty file format that mixes text and tables, and I need to read the tables out separately.

The data looks like this:

table 1 text text text

log(x) a b c d e
1 2 3 4 5
2 3 4 5 6
7 8 9 0 1

table 2 text text text

log(x) a b c d e
1 2 3 4 5
2 3 4 5 6
7 8 9 0 1

etc

So it is a table title, then a table header, then the table itself. I want just the tables, i.e. this part:

1 2 3 4 5
2 3 4 5 6
7 8 9 0 1

Any ideas on how to do this in one go?

Answer 1

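If the table rows are the only lines that start with a digit, you can use an inner loop to build each table, breaking out of it when you reach the end of that table: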

from itertools import count


cn = count(1)
tables = {}
with open("in.txt") as f:
    for line in f:
        if line[0].isdigit():
            key = next(cn)
            tables[key] = [line.rstrip()]
            for line in f:
                if line[0].isdigit():
                    tables[key].append(line.rstrip())
                else:
                    break
print(tables)

{1: ['1 2 3 4 5', '2 3 4 5 6', '7 8 9 0 1'], 2: ['1 2 3 4 5', '2 3 4 5 6', '7 8 9 0 1']}

Or use itertools.groupby and build a list of lists for each table:

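from itertools import count, groupby

cn = count(1)  # table counter, restarted so this snippet runs on its own
tables = {}
with open("in.txt") as f:
    # group consecutive lines by whether they start with a digit
    for k, v in groupby(f, lambda x: x[0].isdigit()):
        if k:
            key = next(cn)
            tables[key] = [x.split() for x in v]
print(tables)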

{1: [['1', '2', '3', '4', '5'], ['2', '3', '4', '5', '6'], ['7', '8', '9', '0', '1']], 2: [['1', '2', '3', '4', '5'], ['2', '3', '4', '5', '6'], ['7', '8', '9', '0', '1']]}

What format you want the output in is up to you; if you want ints etc., use map(int,...
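As a minimal sketch of the map(int, ...) suggestion, assuming every table cell is an integer: the sample data is inlined through io.StringIO as a stand-in for in.txt (an assumption, so that the snippet runs on its own), and each row is converted while the groups are collected.

```python
from io import StringIO
from itertools import count, groupby

# Inline sample standing in for in.txt (assumption for illustration)
sample = StringIO(
    "table 1 text text text\n"
    "\n"
    "log(x) a b c d e\n"
    "1 2 3 4 5\n"
    "2 3 4 5 6\n"
    "7 8 9 0 1\n"
    "\n"
    "table 2 text text text\n"
    "\n"
    "log(x) a b c d e\n"
    "1 2 3 4 5\n"
    "2 3 4 5 6\n"
    "7 8 9 0 1\n"
)

cn = count(1)
tables = {}
# Consecutive lines that start with a digit form one table
for is_table, group in groupby(sample, lambda x: x[0].isdigit()):
    if is_table:
        # map(int, ...) turns each whitespace-separated cell into an int
        tables[next(cn)] = [list(map(int, row.split())) for row in group]

print(tables[1])
```

If the cells can be non-integer, swapping int for float in the map call should work the same way.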

Answer 2

Have you tried reading the file and processing the information as shown in the following example?

# read the whole file and split it into lines
with open('testfile.txt', 'r') as file:
    lines = file.read().split('\n')

tabs = []

for lineindex in range(len(lines)):
    if 'a b c d e' in lines[lineindex]:
        # this is just an idea: take the three rows that follow the header
        tabs.append(lines[lineindex+1 : lineindex+4])

print(tabs)

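The output, tabs, is a nested list: it contains one list per table, filled with the table rows from the text file. Make sure Python can find the path to the file.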

You could also look for the keyword 'table' in the loop and then build a dictionary that looks like this:

dictionary={
    'table 1': [data],
    'table 2': [data],
    'table 3': ...
    }
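One way that dictionary idea might be sketched is shown here. The helper name read_tables and the inline sample (a stand-in for testfile.txt) are assumptions made for illustration, and the sketch assumes that table rows are the only lines starting with a digit.

```python
from io import StringIO

# Inline sample standing in for testfile.txt (assumption for illustration)
sample = StringIO(
    "table 1 text text text\n"
    "\n"
    "log(x) a b c d e\n"
    "1 2 3 4 5\n"
    "2 3 4 5 6\n"
    "7 8 9 0 1\n"
    "\n"
    "table 2 text text text\n"
    "\n"
    "log(x) a b c d e\n"
    "1 2 3 4 5\n"
    "2 3 4 5 6\n"
    "7 8 9 0 1\n"
)

def read_tables(lines):
    """Map 'table N' to its rows (hypothetical helper, not from the answer)."""
    tables = {}
    current = None
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # blank line
        if tokens[0] == 'table':
            # A title line starts a new table, keyed by e.g. 'table 1'
            current = ' '.join(tokens[:2])
            tables[current] = []
        elif current is not None and tokens[0][0].isdigit():
            # Data rows start with a digit; the header line is skipped
            tables[current].append(tokens)
    return tables

dictionary = read_tables(sample)
print(dictionary['table 1'])
```

Keying on the title rather than on a fixed row count means tables of different lengths are handled without changing the slice bounds.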

Answer 3

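A general solution for this file pattern (regardless of the type of the values in the tables) is to drop the unwanted lines (table name, header, blank lines) and keep the rest.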

list_tables = []
rows = []
table_name = None
skip_lines = 0
with open('file.txt', 'r') as f:
    for line in f:
        # We skip over the two unnecessary lines just above the table's entries
        if skip_lines != 0:
            skip_lines -= 1
            continue

        # A blank line ends the current table: store it if it has any rows
        if line.strip() == '':
            if rows:
                list_tables.append({'table_name': table_name,
                                    'values': rows})
                rows = []
            continue

        # Else, we parse our table
        tokens = line.strip().split()
        if tokens[0] == 'table':
            table_name = int(tokens[1])
            skip_lines = 2
            continue

        rows.append(tokens)

# Store the last table if the file does not end with a blank line
if rows:
    list_tables.append({'table_name': table_name, 'values': rows})

This gives you a list of dictionaries, each containing a table_name entry and a list of values.

Python, reading 100+ tables from a txt file
