In Python 2.7.11, why can't I delete the file-open line?

Time: 2016-03-08 03:32:01

Tags: python numpy

The .txt file that holds the data looks like this (source: "datingTestSet2.txt", from Chapter 2 of the linked material):

40920   8.326976    0.953952    largeDoses
14488   7.153469    1.673904    smallDoses
26052   1.441871    0.805124    didntLike
75136   13.147394   0.428964    didntLike
38344   1.669788    0.134296    didntLike
...   

Code:

from numpy import *
import operator
from os import listdir

def file2matrix(filename):
    fr = open(filename)
    # arr = fr.readlines() # Code1!!!!!!!!!!!!!!!!!!!
    numberOfLines = len(fr.readlines())        #get the number of lines in the file
    returnMat = zeros((numberOfLines,3))       #prepare matrix to return   
    classLabelVector = []                      #prepare labels return   
    fr = open(filename)  # Code2!!!!!!!!!!!!!!!!!!!!!
    index = 0
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector

datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')

The result of this function is:

      datingDataMat                 datingLabels
40920   8.326976    0.953952           3
14488   7.153469    1.673904           2
26052   1.441871    0.805124           1
75136   13.147394   0.428964           1
38344   1.669788    0.134296           1
72993   10.141740   1.032955           1
35948   6.830792    1.213192           3
42666   13.276369   0.543880           3
67497   8.631577    0.749278           1
35483   12.273169   1.508053           3
50242   3.723498    0.831917           1
...     ...         ...               ...

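My questions are:

  1. When I delete Code2 (the fr = open(filename) just above index = 0), the function returns an all-zero matrix and an all-zero vector. Why can't I delete Code2? Doesn't the first fr = open(filename) already do the job?

  2. When I only add Code1 (arr = fr.readlines()), I get an error. Why?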

    returnMat[index,:] = listFromLine[0:3]
    
    IndexError: index 0 is out of bounds for axis 0 with size 0
    

3 Answers:

Answer 0: (score: 2)

1) You cannot delete the Code2 line because of this line:

numberOfLines = len(fr.readlines())        #get the number of lines in the file

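That line reads all the way to the end of the file. Opening the file again puts you back at the beginning. Without Code2, the second fr.readlines() returns an empty list, the loop body never runs, and returnMat keeps the zeros it was initialized with.

2) The cause is the same: calling readlines() reads every line and moves the file cursor to the end of the file. With Code1 in place, the next fr.readlines() has nothing left to read, so numberOfLines is 0 and returnMat has zero rows. The loop after Code2 then tries to write to row 0 of that empty matrix, which raises the IndexError.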
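
The cursor behavior can be checked directly. The following is a minimal sketch that assumes a hypothetical three-line temporary file in place of datingTestSet2.txt; the variable names first, second, and third are illustrative only.

```python
import os
import tempfile

# Hypothetical stand-in for the data file: three tab-separated rows.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("1\t2.0\t3.0\t3\n4\t5.0\t6.0\t2\n7\t8.0\t9.0\t1\n")

fr = open(path)
first = fr.readlines()   # consumes the whole file; cursor is now at the end
second = fr.readlines()  # nothing left to read, so this is an empty list
fr.close()

fr = open(path)          # reopening (what Code2 does) starts from the beginning
third = fr.readlines()
fr.close()
os.remove(path)

print(len(first), len(second), len(third))
```

Here first and third hold all three lines, while second is empty. A line count taken from a second read like this would be 0, which is how the matrix ends up with no rows.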

Answer 1: (score: 1)

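You are at the end of the file, so your second attempt to read its contents returns nothing. You need to go back to the beginning of the file. Use: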

fr.seek(0)

instead of your:

fr = open(filename)  # Code2!!!!!!!!!!!!!!!!!!!!!
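
As a rough illustration of the seek(0) approach, here is a sketch on a hypothetical two-line temporary file (not the original data set). It counts the lines, rewinds the same file object, and reads the lines again without reopening the file.

```python
import os
import tempfile

# Hypothetical two-line, tab-separated file used only for this demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("a\tb\nc\td\n")

with open(path) as fr:
    number_of_lines = len(fr.readlines())  # cursor ends up at end of file
    fr.seek(0)                             # rewind to the start of the file
    rows = [line.strip().split("\t") for line in fr.readlines()]
os.remove(path)

print(number_of_lines, rows)
```

Rewinding keeps a single open file handle, whereas reopening creates a second one and leaves the first unclosed until it is garbage collected.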

Answer 2: (score: 0)

You only need to call readlines once.

from numpy import zeros

def file2matrix(filename):
    fr = open(filename)
    lines = fr.readlines()    
    fr.close()    
    numberOfLines = len(lines)        #get the number of lines in the file
    returnMat = zeros((numberOfLines,3))       #prepare matrix to return   
    classLabelVector = []                      #prepare labels return   
    index = 0
    for line in lines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        # careful here: returnMat is initialized as floats,
        # while listFromLine is a list of strings (numpy converts them on assignment)
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector

May I suggest some other changes:

import numpy as np

def file2matrix(filename):
    with open(filename) as f:
        lines = f.readlines()
    returnList = []
    classLabelList = []
    for line in lines:
        listFromLine = line.strip().split('\t')
        returnList.append(listFromLine[0:3])
        classLabelList.append(int(listFromLine[-1]))
    returnMat = np.array(returnList, dtype=float)
    return returnMat, classLabelList

or even

def file2matrix(filename):
    with open(filename) as f:
        lines = f.readlines()
    ll = [line.strip().split('\t') for line in lines]
    returnMat = np.array([l[0:3] for l in ll], dtype=float)
    classLabelList = [int(l[-1]) for l in ll]
    # classLabelVec = np.array([l[-1] for l in ll], dtype=int)
    return returnMat, classLabelList