Suppose the data in the file looks like this:
1 4 5
2 3 4
4 7 1

1 1 1
2 1 2
3 3 3
4 1 4

2 2 2
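I always want to read the part of the data that lies between blank lines. For example, I want the columns between the first and the second blank line, so v1 = [1,2,3,4], v2 = [1,1,3,1], and so on. The first thing I did was find the indices of the blank lines: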
filetmp = open('data.txt')
indices = []
for i, line in enumerate(filetmp):
    # Keep only alphanumerics and whitespace, then check whether anything is left.
    tmp = ''.join(c for c in line if c.isalnum() or c.isspace())
    print(tmp)
    if not tmp.strip():
        indices.append(i)
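Now indices does contain the correct indices, i.e. the positions of the blank lines. The next step is to extract the wanted section given those blank-line indices, so that I can fill v1, v2, etc. Should I first call filetmp.readlines()? Or is there a more direct way to read a specific section when working with columns of data?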
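For reference, the readlines() route the question asks about can be sketched as follows. This is a minimal sketch, assuming blank lines are empty or whitespace-only and that indices holds their 0-based line numbers; the helper name columns_between and the inline sample text are illustrative assumptions, not part of the original post.

```python
import io

# Sample data from the question with its blank-line separators (assumed layout);
# with a real file you would use open('data.txt').readlines() instead.
TEXT = "1 4 5\n2 3 4\n4 7 1\n\n1 1 1\n2 1 2\n3 3 3\n4 1 4\n\n2 2 2\n"


def columns_between(lines, indices, k):
    """Hypothetical helper: columns of the rows between blank lines indices[k] and indices[k + 1]."""
    rows = [[int(x) for x in line.split()]
            for line in lines[indices[k] + 1:indices[k + 1]]]
    # zip(*rows) transposes the list of rows into a sequence of columns.
    return [list(col) for col in zip(*rows)]


lines = io.StringIO(TEXT).readlines()
# A line counts as blank if it holds only whitespace.
indices = [i for i, line in enumerate(lines) if not line.strip()]
v1, v2, v3 = columns_between(lines, indices, 0)
print(v1, v2)
```

The last block has no closing blank line, so to reach it you would first append len(lines) to indices.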
Answer 0 (score: 1)
I would do it like this:
with open('data.txt') as f:
    data = f.read()
v = []
# Split the string into blocks by looking for double line terminators ('\n\n').
for i, block in enumerate(data.split('\n\n')):
    # Split each block into lines by looking for line terminators ('\n').
    lines = block.split('\n')
    v.append([])
    for line in lines:
        if line == "":
            continue
        v[i].append([int(x) for x in line.split()])
# Take the second block (index 1) and transpose it, so v1[0] is its first column.
v1 = [list(col) for col in zip(*v[1])]
Of course, you could process only the second block instead of iterating over all of them.
As a function:
def get_block_from_file(file_path, block_number):
    """Return the columns of the block_number-th (1-based) blank-line-separated block."""
    with open(file_path) as f:
        data = f.read()
    blocks = data.split('\n\n')
    if not 1 <= block_number <= len(blocks):
        raise ValueError('Not enough blocks')
    block = blocks[block_number - 1]
    v = []
    for line in block.split('\n'):
        if line == "":
            continue
        v.append([int(x) for x in line.split()])
    # Transpose the rows into columns.
    return [list(col) for col in zip(*v)]
print(get_block_from_file('data.txt', 2))
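If you only need one block, as noted above, you can also avoid reading the whole file into memory. The sketch below swaps in a different technique, itertools.groupby over the file's lines, rather than the answer's split('\n\n'). The helper name read_block and the inline sample text are assumptions for illustration; with a real file you would pass an open file object instead of io.StringIO.

```python
import io
from itertools import groupby

# Sample data from the question with its blank-line separators (assumed layout).
TEXT = "1 4 5\n2 3 4\n4 7 1\n\n1 1 1\n2 1 2\n3 3 3\n4 1 4\n\n2 2 2\n"


def read_block(f, block_number):
    """Hypothetical helper: columns of the block_number-th (1-based) block of f."""
    # groupby cuts the line stream into runs of blank and non-blank lines;
    # only the non-blank runs are data blocks.
    runs = groupby(f, key=lambda line: not line.strip())
    blocks = (rows for is_blank, rows in runs if not is_blank)
    for n, rows in enumerate(blocks, start=1):
        if n == block_number:
            # Consume this block before groupby advances, then stop reading.
            table = [[int(x) for x in line.split()] for line in rows]
            return [list(col) for col in zip(*table)]
    raise ValueError('Not enough blocks')


v1, v2, v3 = read_block(io.StringIO(TEXT), 2)
print(v1, v2)
```

One design difference from split('\n\n'): whitespace-only lines and '\r\n' line endings count as blank, and several consecutive blank lines act as a single separator.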
Answer 1 (score: 0)
Please try this. It uses pandas and works for your current dataset. If you have more blank lines (more than two), you may need a loop to find l_lower_index and l_upper_index.
import pandas as pd
# With skip_blank_lines=False, each blank line is read as an all-NaN row.
l_df = pd.read_table('expt2data.txt', sep=' ', header=None, names=['Col_1', 'Col_2', 'Col_3'], skip_blank_lines=False)
blank_rows = l_df[l_df['Col_1'].isnull()].index.values
l_lower_index = blank_rows[0]
l_upper_index = blank_rows[1]
# .ix was removed in pandas 1.0; .loc slices by label and includes both endpoints.
v1 = l_df.loc[l_lower_index + 1:l_upper_index - 1, 'Col_1'].values
v2 = l_df.loc[l_lower_index + 1:l_upper_index - 1, 'Col_2'].values
print(v1)
print(v2)
Output
[ 1. 2. 3. 4.]
[ 1. 1. 3. 1.]
expt2data.txt
1 4 5
2 3 4
4 7 1

1 1 1
2 1 2
3 3 3
4 1 4

2 2 2
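The loop mentioned in this answer for files with more blank lines can be avoided in pandas: a cumulative count of the all-NaN blank rows gives every row the number of its block. This is a minimal sketch, assuming three space-separated columns as in expt2data.txt; the helper name block_columns is hypothetical, and io.StringIO stands in for the file.

```python
import io

import pandas as pd

# Sample data from the question with its blank-line separators (assumed layout).
TEXT = "1 4 5\n2 3 4\n4 7 1\n\n1 1 1\n2 1 2\n3 3 3\n4 1 4\n\n2 2 2\n"


def block_columns(source, block_number):
    """Hypothetical helper: columns of the block_number-th (1-based) block as lists."""
    df = pd.read_csv(source, sep=' ', header=None,
                     names=['Col_1', 'Col_2', 'Col_3'], skip_blank_lines=False)
    is_blank = df['Col_1'].isnull()
    # Every blank row bumps the running count, so rows of block k get id k - 1.
    block_id = is_blank.cumsum()
    block = df[(block_id == block_number - 1) & ~is_blank]
    return [block[col].tolist() for col in df.columns]


print(block_columns(io.StringIO(TEXT), 2))
```

Because the blank rows contain NaN, the values come back as floats, as in the output above. Note that each blank line bumps the counter, so two consecutive blank lines would create an empty block in the numbering.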