Reading specific tuples from a file in Python

Time: 2014-11-05 12:06:32

Tags: python file

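Using seek and tell does not work for me, because tell returns the current position in bytes; to continue I need the line number, not the position of the file pointer.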
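For reference, a line number can be tracked without seek or tell by counting lines while iterating. The following is only a minimal sketch: the helper name numbered_lines is hypothetical, and an in-memory io.StringIO stands in for glass.csv.

```python
import io


def numbered_lines(handle, start=1):
    """Yield (line_number, line_without_newline) pairs for a text file handle."""
    for line_number, line in enumerate(handle, start=start):
        yield line_number, line.rstrip("\n")


# In-memory stand-in for glass.csv (two rows copied from the question).
sample = io.StringIO(
    "65,1.52172,13.48,3.74,0.90,72.01,0.18,9.61,0.00,0.07,1\n"
    "71,1.51574,14.86,3.67,1.74,71.87,0.16,7.36,0.00,0.12,2\n"
)

for line_number, line in numbered_lines(sample):
    label = line.split(",")[-1]  # the last field is the class number
    print(line_number, label)
```

The same generator should work on a real file opened with open('glass.csv'), since a text file object yields its lines when iterated.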

I have a file, glass.csv, and I need to cluster the dataset. Each line in the file carries a class number (1, 2, 3, ...) as its last field, as shown below:

65,1.52172,13.48,3.74,0.90,72.01,0.18,9.61,0.00,0.07,1
66,1.52099,13.69,3.59,1.12,71.96,0.09,9.40,0.00,0.00,1
67,1.52152,13.05,3.65,0.87,72.22,0.19,9.85,0.00,0.17,1
68,1.52152,13.05,3.65,0.87,72.32,0.19,9.85,0.00,0.17,1
69,1.52152,13.12,3.58,0.90,72.20,0.23,9.82,0.00,0.16,1
70,1.52300,13.31,3.58,0.82,71.99,0.12,10.17,0.00,0.03,1
71,1.51574,14.86,3.67,1.74,71.87,0.16,7.36,0.00,0.12,2
72,1.51848,13.64,3.87,1.27,71.96,0.54,8.32,0.00,0.32,2
73,1.51593,13.09,3.59,1.52,73.10,0.67,7.83,0.00,0.00,2
74,1.51631,13.34,3.57,1.57,72.87,0.61,7.89,0.00,0.00,2
142,1.51851,13.20,3.63,1.07,72.83,0.57,8.41,0.09,0.17,2
143,1.51662,12.85,3.51,1.44,73.01,0.68,8.23,0.06,0.25,2
144,1.51709,13.00,3.47,1.79,72.72,0.66,8.18,0.00,0.00,2
145,1.51660,12.99,3.18,1.23,72.97,0.58,8.81,0.00,0.24,2
146,1.51839,12.85,3.67,1.24,72.57,0.62,8.68,0.00,0.35,2
147,1.51769,13.65,3.66,1.11,72.77,0.11,8.60,0.00,0.00,3
148,1.51610,13.33,3.53,1.34,72.67,0.56,8.33,0.00,0.00,3
149,1.51670,13.24,3.57,1.38,72.70,0.56,8.44,0.00,0.10,3
150,1.51643,12.16,3.52,1.35,72.89,0.57,8.53,0.00,0.00,3

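I need to take some of the tuples whose last number is 1 and save them in one file (train.txt), and save the rest in another file (test.txt). Likewise, I need to take some of the rows whose last number is 2 and append them to the first file, train.txt, with the remainder going to test.txt.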

I cannot get the second input to work, although the first result is appended.
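One way to sidestep the appending problem is to decide train or test for every row in a single pass and write each output file once. The sketch below assumes that "some" rows means the first per_label rows of each class; that parameter and both function names are hypothetical, since the question does not say how many rows are wanted.

```python
from collections import defaultdict


def split_train_test(lines, per_label=2):
    """Send the first `per_label` rows of each class to train, the rest to test."""
    seen = defaultdict(int)  # number of rows seen so far for each label
    train, test = [], []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        label = line.split(",")[-1]  # the last field is the class number
        if seen[label] < per_label:
            train.append(line)
        else:
            test.append(line)
        seen[label] += 1
    return train, test


def write_lines(path, lines):
    """Write the given rows to `path`, one per line, replacing any old content."""
    with open(path, "w") as handle:
        handle.writelines(line + "\n" for line in lines)
```

To use it, open glass.csv, pass the file object to split_train_test, then call write_lines('train.txt', train) and write_lines('test.txt', test). Because each file is opened once, no append mode is needed. If the files are instead reopened for each class, they would have to be opened with mode 'a', because mode 'w' truncates whatever was written before.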

2 answers:

Answer 0: (score: 0)

Iterating over a text file reads it line by line by default. You can do something like this:

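# The output files must be opened for writing ('w'); the default mode is read-only.
with open('input.csv', 'r') as f, open('output_1.csv', 'w') as output_1, open('output_2.csv', 'w') as output_2:
    for line in f:
        # Split on commas; the last field is the class number.
        line_fields = line.strip().split(',')
        if line_fields[-1] == '1':
            output_1.write(line)
            continue
        if line_fields[-1] == '2':
            output_2.write(line)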

Or you can use the csv module, which is easier: https://docs.python.org/2/library/csv.html
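A rough sketch of the csv-module variant follows, assuming Python 3. The helper split_by_label and the writers mapping are hypothetical names, and io.StringIO objects stand in for the real input and output files.

```python
import csv
import io


def split_by_label(reader, writers):
    """Copy each CSV row to the writer registered for its last field."""
    for row in reader:
        if not row:  # csv.reader yields an empty list for blank lines
            continue
        writer = writers.get(row[-1])
        if writer is not None:  # rows with other labels are ignored
            writer.writerow(row)


# In-memory stand-ins for the input file and the two output files.
source = io.StringIO(
    "70,1.52300,13.31,3.58,0.82,71.99,0.12,10.17,0.00,0.03,1\n"
    "71,1.51574,14.86,3.67,1.74,71.87,0.16,7.36,0.00,0.12,2\n"
)
out_1, out_2 = io.StringIO(), io.StringIO()
writers = {
    "1": csv.writer(out_1, lineterminator="\n"),
    "2": csv.writer(out_2, lineterminator="\n"),
}

split_by_label(csv.reader(source), writers)
print(out_1.getvalue(), end="")
```

With real files in Python 3, the csv documentation recommends opening them with newline='' so the module can handle line endings itself.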

Answer 1: (score: 0)

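The easiest way, assuming you have a large file and cannot simply load the whole thing, is to sort the rows into one file per class. If the input file is small(ish), just load it as a comma-separated file with the csv module.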
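For the large-file case, one possible reading of "one file per class" is to stream the input once and open an output file the first time each label appears. This is only a sketch under that assumption; the function name split_file_by_label and the class_<label>.csv naming scheme are hypothetical.

```python
import os
from contextlib import ExitStack


def split_file_by_label(in_path, out_dir):
    """Stream in_path once, writing each row to <out_dir>/class_<label>.csv."""
    outputs = {}  # label -> open output file handle
    with ExitStack() as stack, open(in_path, "r") as infile:
        for line in infile:
            if not line.strip():  # skip blank lines
                continue
            label = line.strip().split(",")[-1]
            if label not in outputs:
                path = os.path.join(out_dir, "class_%s.csv" % label)
                # ExitStack closes every output file when the with block ends.
                outputs[label] = stack.enter_context(open(path, "w"))
            outputs[label].write(line)
    return sorted(outputs)  # the labels that were found
```

Only one line is held in memory at a time, so the input size does not matter. The number of simultaneously open files equals the number of distinct labels, which is small for this dataset.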

As a quick and dirty approach (assuming the file is small):

data = []
with open('glass.csv', 'r') as infile:
    for line in infile:
        linedata = [float(val) for val in line.strip().split(',')]
        data.append(linedata)

adata = sorted(data, key=lambda items: items[-1])
## Then open both your output files and write them in the required fields.
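The remaining step could look something like the sketch below, which groups the sorted rows by their last field with itertools.groupby. The helper name group_by_label is hypothetical, and the sample rows are shortened for illustration. Note that the labels are floats (1.0, 2.0) because the rows were converted with float.

```python
from itertools import groupby


def last_field(items):
    """Return the class number stored in the last position of a row."""
    return items[-1]


def group_by_label(rows):
    """Sort rows by their last field and collect them into a dict per label."""
    ordered = sorted(rows, key=last_field)  # groupby needs sorted input
    return {label: list(group) for label, group in groupby(ordered, key=last_field)}


# Shortened sample rows in the same shape as `data` (lists of floats).
rows = [[71.0, 1.51574, 2.0], [65.0, 1.52172, 1.0], [66.0, 1.52099, 1.0]]
groups = group_by_label(rows)
print(sorted(groups))
```

From there, each groups[label] list can be sliced into a training part and a test part and written out with ','.join(str(v) for v in row).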