Question

I have a text file like the following:

#relation 'train'
#attri 'x' real
#attri 'y' integer
#attri 'z' binary (0/1)
#attri 'a' real
#attri 'b' integer
#attri 'class' binary(good/bad)
#data
1.2, 5, 0, 2.3, 4, good
1.3, 6, 1, 1.8, 5, bad
1.6, 7, 0, 1.9, 6, good
2.1, 8, 1, 2.1, 8, good

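I would like to know how to put the header names above their corresponding columns using plain Python (no pandas DataFrame, since I already know how to do this with pandas). This is what I have so far: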

import re

columns = []

with open('test.txt', 'r') as f:
    lines=f.readlines()
    for line in lines:
        l = line.strip()

        if l.startswith('#attri'):
            columns.append(line.split()[1].strip("'"))

        if not l.startswith("#"):
            print(l)
print(columns)

I would appreciate help doing this without pandas. I want the output to look like this:

 x   y  z   a   b   class
1.2, 5, 0, 2.3, 4, good
1.3, 6, 1, 1.8, 5, bad
1.6, 7, 0, 1.9, 6, good
2.1, 8, 1, 2.1, 8, good

Answer 1

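You could try this approach: split the data lines into parts as well (not just the header lines), then print everything with a fixed-width format spec (here I use {:5s}, for example).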

columns_header = []
data_rows = []

with open('test.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if len(line) > 0:
            if line.startswith('#attri'):
                # split directly by ', not by space and then removing '
                columns_header.append(line.split("'")[1])

            if not line.startswith("#"):
                # split into parts
                data_rows.append(line.split(','))

# add at first position
data_rows.insert(0, columns_header)

for parts in data_rows:
    print(
        ' '.join(
            '{:5s}'.format(s.strip())
            for s in parts))

This prints:

x     y     z     a     b     class
1.2   5     0     2.3   4     good 
1.3   6     1     1.8   5     bad  
1.6   7     0     1.9   6     good 
2.1   8     1     2.1   8     good
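
A fixed width of 5 happens to fit this data, but the columns would drift out of alignment if any value or header were longer than five characters. As a minimal sketch of one way around that, assuming the headers and rows have already been parsed as above and that every row has the same number of cells, the width of each column can be taken from its longest cell. The helper name `format_table` and its `gap` parameter are made up for illustration:

```python
def format_table(header, rows, gap=2):
    """Return the table as a list of lines, each column padded to its widest cell."""
    table = [header] + rows
    # zip(*table) transposes rows into columns so each column can be measured.
    # Assumption: all rows have the same length (zip stops at the shortest one).
    widths = [max(len(cell) for cell in col) for col in zip(*table)]
    sep = ' ' * gap
    return [
        sep.join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    ]


header = ['x', 'y', 'z', 'a', 'b', 'class']
rows = [
    ['1.2', '5', '0', '2.3', '4', 'good'],
    ['1.3', '6', '1', '1.8', '5', 'bad'],
]
for line in format_table(header, rows):
    print(line)
```

With the real file, `header` and `rows` would be `columns_header` and the stripped parts collected in `data_rows` before the header is inserted.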

Answer 2

How about this:

columns = []
data = []
with open('test.txt', 'r') as f:
    lines = f.readlines()
    for line in lines:
        l = line.strip()

        if l.startswith('#attri'):
            columns.append(line.split()[1].strip("'"))

        if not l.startswith("#"):
            data.append(l)
print('   '.join(columns))
for entry in data:
    print(entry)

Answer 3

This looks fairly concise:

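with open('test.txt') as txt:
    # column names: second token of each '#attri' line, quotes stripped
    headers = [l.split()[1][1:-1] for l in txt if l.startswith('#attri ')]
    txt.seek(0)  # rewind to read the data lines
    # keep non-empty data lines, dropping the trailing newline
    data = [l.rstrip() for l in txt if l.strip() and not l.startswith('#')]
    print('\t'.join(headers))
    for l in data:
        print(l.replace(' ', '\t'))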

which gives:

x       y   z   a       b   class 
1.2,    5,  0,  2.3,    4,  good
1.3,    6,  1,  1.8,    5,  bad
1.6,    7,  0,  1.9,    6,  good
2.1,    8,  1,  2.1,    8,  good
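
Reading the file twice with `seek(0)` is not strictly necessary. As a rough sketch, assuming the same layout as the file in the question, a single pass can collect both the headers and the data rows. `parse_lines` is a hypothetical helper name; it accepts any iterable of lines, so it can be tried on a list without touching a file:

```python
def parse_lines(lines):
    """Split '#attri' header lines and comma-separated data lines in one pass."""
    headers, rows = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('#attri'):
            # The column name is the text between the first pair of quotes.
            headers.append(line.split("'")[1])
        elif not line.startswith('#'):
            rows.append([cell.strip() for cell in line.split(',')])
    return headers, rows


# Shortened sample in the same format as the question's file.
sample = [
    "#relation 'train'",
    "#attri 'x' real",
    "#attri 'class' binary(good/bad)",
    "#data",
    "1.2, good",
    "",
    "1.3, bad",
]
headers, rows = parse_lines(sample)
print('\t'.join(headers))
for row in rows:
    print('\t'.join(row))
```

For the real file, an open file object can be passed directly, for example `headers, rows = parse_lines(txt)` inside the `with` block.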

Putting header names above the corresponding columns in Python (without pandas)
