I have a text file like the following:
#relation 'train'
#attri 'x' real
#attri 'y' integer
#attri 'z' binary (0/1)
#attri 'a' real
#attri 'b' integer
#attri 'class' binary(good/bad)
#data
1.2, 5, 0, 2.3, 4, good
1.3, 6, 1, 1.8, 5, bad
1.6, 7, 0, 1.9, 6, good
2.1, 8, 1, 2.1, 8, good
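I would like to know how to put the header names above their corresponding columns using Python (without a pandas DataFrame; a plain Python version, since I already know how to do this with pandas). This is what I have so far: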
import re
columns = []
with open('test.txt', 'r') as f:
    lines=f.readlines()
for line in lines:
    l = line.strip()
    if l.startswith('#attri'):
        columns.append(line.split()[1].strip("'"))
    if not l.startswith("#"):
        print(l)
print(columns)
Thanks for helping me do this without pandas. I would like the output to look like this:
x y z a b class
1.2, 5, 0, 2.3, 4, good
1.3, 6, 1, 1.8, 5, bad
1.6, 7, 0, 1.9, 6, good
2.1, 8, 1, 2.1, 8, good
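Answer 0 (score: 1)
You can try this approach: split the data rows into parts as well (not just the header lines), then print everything with a fixed-width formatter (here I use {:5s}).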
columns_header = []
data_rows = []
with open('test.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if len(line) > 0:
            if line.startswith('#attri'):
                # split directly by ', not by space and then removing '
                columns_header.append(line.split("'")[1])
            if not line.startswith("#"):
                # split into parts
                data_rows.append(line.split(','))
# add at first position
data_rows.insert(0, columns_header)
for parts in data_rows:
    print(
        ' '.join(
            '{:5s}'.format(s.strip())
            for s in parts))
This prints:
x     y     z     a     b     class
1.2   5     0     2.3   4     good
1.3   6     1     1.8   5     bad
1.6   7     0     1.9   6     good
2.1   8     1     2.1   8     good
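The fixed width in `{:5s}` fits this file because no value is longer than five characters. As a minimal sketch, assuming you would rather derive each column's width from its longest entry, the same idea can be written as below. The names `parse_table` and `format_table` are hypothetical helpers introduced for illustration, and the sample text is embedded as a string so the snippet runs without `test.txt`.

```python
# Sample input copied from the question; in practice the lines would come from the file.
SAMPLE = """\
#relation 'train'
#attri 'x' real
#attri 'y' integer
#attri 'z' binary (0/1)
#attri 'a' real
#attri 'b' integer
#attri 'class' binary(good/bad)
#data
1.2, 5, 0, 2.3, 4, good
1.3, 6, 1, 1.8, 5, bad
1.6, 7, 0, 1.9, 6, good
2.1, 8, 1, 2.1, 8, good
"""


def parse_table(lines):
    """Return (header, rows) from lines in the question's format (hypothetical helper)."""
    header, rows = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('#attri'):
            # the column name is the text between the first pair of single quotes
            header.append(line.split("'")[1])
        elif not line.startswith('#'):
            rows.append([cell.strip() for cell in line.split(',')])
    return header, rows


def format_table(header, rows):
    """Pad every cell to the width of the longest entry in its column."""
    table = [header] + rows
    # zip(*table) transposes rows into columns
    widths = [max(len(cell) for cell in column) for column in zip(*table)]
    return [
        ' '.join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    ]


header, rows = parse_table(SAMPLE.splitlines())
for line in format_table(header, rows):
    print(line)
```

To read from the file instead, pass the open file object to `parse_table`, since it only needs an iterable of lines.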
Answer 1 (score: 1)
How about:
columns = []
data = []
with open('test.txt', 'r') as f:
    lines = f.readlines()
for line in lines:
    l = line.strip()
    if l.startswith('#attri'):
        columns.append(line.split()[1].strip("'"))
    if not l.startswith("#"):
        data.append(l)
print(' '.join(columns))
for entry in data:
    print(entry)
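If the goal is to associate each value with its header name rather than only print the two side by side, one option that still avoids pandas is the standard library's `csv` module. This is a sketch under the assumption that the data rows are plain comma-separated values; the sample text is embedded as a string so the snippet is self-contained, and all values stay strings.

```python
import csv

# Sample input copied from the question; in practice the lines would come from the file.
SAMPLE = """\
#relation 'train'
#attri 'x' real
#attri 'y' integer
#attri 'z' binary (0/1)
#attri 'a' real
#attri 'b' integer
#attri 'class' binary(good/bad)
#data
1.2, 5, 0, 2.3, 4, good
1.3, 6, 1, 1.8, 5, bad
1.6, 7, 0, 1.9, 6, good
2.1, 8, 1, 2.1, 8, good
"""

lines = SAMPLE.splitlines()

# Column names: the quoted token on each '#attri' line.
header = [line.split("'")[1] for line in lines if line.startswith('#attri')]

# Data rows: every non-empty line that is not a '#' directive.
data_lines = [line for line in lines if line.strip() and not line.startswith('#')]

# skipinitialspace=True drops the space that follows each comma.
records = list(csv.DictReader(data_lines, fieldnames=header, skipinitialspace=True))

for record in records:
    print(record)
```

Each record is a mapping from column name to value, so a single column can be read with something like `[r['class'] for r in records]`.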
Answer 2 (score: 1)
This looks fairly concise:
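with open('test.txt') as txt:
    # column name is the second token of each '#attri' line, minus its quotes
    headers = [l.split()[1][1:-1] for l in txt if '#attri ' in l]
    # rewind so the file can be read a second time for the data rows
    txt.seek(0)
    # drop the trailing newline so print() does not double-space the rows
    data = [l.rstrip('\n') for l in txt if not l.startswith('#')]
print('\t'.join(headers),'\n')
for l in data:
    print(l.replace(' ', '\t'))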
which gives:
x y z a b class
1.2, 5, 0, 2.3, 4, good
1.3, 6, 1, 1.8, 5, bad
1.6, 7, 0, 1.9, 6, good
2.1, 8, 1, 2.1, 8, good