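Python - Parsing columns and rows

Time: 2011-02-18 16:24:01

Tags: python parsing

I'm having trouble parsing the contents of a text file into a 2D array/list. I can't use the built-in libraries, so I took a different approach. Here is my text file, followed by my code: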

1,0,4,3,6,7,4,8,3,2,1,0
2,3,6,3,2,1,7,4,3,1,1,0
5,2,1,3,4,6,4,8,9,5,2,1

def twoDArray():
    network = [[]]       

    filename = open('twoDArray.txt', 'r')
    for line in filename.readlines():
        col = line.split(line, ',')
        row = line.split(',')

    network.append(col,row)        

    print "Network = "
    print network        

if __name__ == "__main__":
    twoDArray()

I run this code but get this error:

Traceback (most recent call last):
  File "2dArray.py", line 22, in <module>
    twoDArray()
  File "2dArray.py", line 8, in twoDArray
    col = line.split(line, ',')
TypeError: an integer is required

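I'm using a comma to separate both rows and columns because I'm not sure how to distinguish between the two. I'm confused about why it tells me an integer is required when the file is made up of integers.

7 answers:

Answer 0 (score: 4)

Well, I can explain the error. You are using str.split(), whose usage pattern is:

str.split(separator,maxsplit)

You are calling str.split(string, separator), which is not a valid call to split. Here is a direct link to the Python documentation: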

http://docs.python.org/library/stdtypes.html#str.split
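
As a minimal sketch of the difference (written for Python 3, where the error message is worded differently from the Python 2 traceback in the question): the string being split is the object the method is called on, so the first argument is the separator and the second, if given, must be an integer maxsplit. The sample line is taken from the question's file.

```python
line = "1,0,4,3,6,7,4,8,3,2,1,0\n"

# Correct call: the separator is the only argument needed.
fields = line.rstrip("\n").split(",")

# The call from the question passes ',' where the integer maxsplit belongs,
# so Python raises TypeError before any splitting happens.
try:
    line.split(line, ",")
    raised = False
except TypeError:
    raised = True

print(fields)
print("TypeError raised:", raised)
```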

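Answer 1 (score: 2)

To answer your question directly, the problem is with the following line:

col = line.split(line, ',')

If you look at the documentation for str.split, you'll find the following description: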

str.split([sep[, maxsplit]])
     

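Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified, then there is no limit on the number of splits (all possible splits are made).

This is not what you want. You are not trying to specify the number of splits to make.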
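
To make the role of maxsplit concrete, here is a small sketch using a shortened version of the first line of the question's file. The variable names are only for illustration.

```python
line = "1,0,4,3,6,7"

# No maxsplit: every comma is a split point.
all_fields = line.split(",")

# maxsplit=2: only the first two commas are used,
# and the remainder of the string stays in the last item.
limited = line.split(",", 2)

print(all_fields)
print(limited)
```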


Consider replacing your for loop and network.append with the following:

for line in filename.readlines():
    # line is a string representing the values for this row
    row = line.split(',')
    # row is the list of number strings for this row, such as ['1', '0', '4', ...]
    cols = [int(x) for x in row]
    # cols is the list of numbers for this row, such as [1, 0, 4, ...]
    network.append(cols)
    # Put the converted row into network, such that network is [[1, 0, 4, ...], [...], ...]

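Answer 2 (score: 2)

"""I can't use the built-in libraries""" - Do you really mean "can't", as in you tried to use the csv module and failed? If so, say so. Or do you mean "may not", as in you are forbidden from using built-in modules because this is homework? If so, say so.

Here is an answer that works. It doesn't leave a newline attached to the end of the last item in each line. It converts the numbers to int so that you can use them for whatever purpose. It fixes other errors that nobody else has mentioned.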

def twoDArray():
    network = []       
    # filename = open('twoDArray.txt', 'r')
    # "filename" is a very weird name for a file HANDLE
    f = open('twoDArray.txt', 'r')
    # for line in filename.readlines():
    # readlines reads the whole file into memory at once.
    # That is quite unnecessary.
    for line in f: # just iterate over the file handle
        line = line.rstrip('\n') # remove the newline, if any
        # col = line.split(line, ',')
        # wrong args, as others have said.
        # In any case, only 1 split call is necessary 
        row = line.split(',')
        # now convert string to integer
        irow = [int(item) for item in row]
        # network.append(col,row)        
        # list.append expects only ONE arg
        # indentation was wrong; you need to do this once per line
        network.append(irow)

    print "Network = "
    print network        

if __name__ == "__main__":
    twoDArray()
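
The code above is Python 2 and leaves the file handle open. As a hedged sketch of the same steps for Python 3, the version below separates parsing from file handling and uses a with block so the file gets closed. The names parse_network and read_network are hypothetical, and skipping blank lines is an added assumption that the original answer does not make.

```python
def parse_network(lines):
    """Turn an iterable of comma-separated lines into a list of int lists."""
    network = []
    for line in lines:
        line = line.strip()   # drop the newline and surrounding whitespace
        if not line:          # assumption: blank lines are ignored
            continue
        network.append([int(item) for item in line.split(",")])
    return network


def read_network(path):
    """Open the file at path and parse it; the file is closed on exit."""
    with open(path, "r") as f:
        return parse_network(f)
```

With the question's file in the working directory, print(read_network('twoDArray.txt')) would print the nested list of integers.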

Answer 3 (score: 0)

... OMG

network = []
filename = open('twoDArray.txt', 'r')
for line in filename.readlines():
    # convert each item to int; int() ignores the trailing newline
    network.append([int(x) for x in line.split(',')])

You get

[
[1,0,4,3,6,7,4,8,3,2,1,0],
[2,3,6,3,2,1,7,4,3,1,1,0],
[5,2,1,3,4,6,4,8,9,5,2,1]
]

Or do you need some other structure as output? Please add the output you need.

Answer 4 (score: 0)

class TwoDArray(object):
    @classmethod
    def fromFile(cls, fname, *args, **kwargs):
        splitOn = kwargs.pop('splitOn', None)
        mode    = kwargs.pop('mode',    'r')
        with open(fname, mode) as inf:
            return cls([line.strip('\r\n').split(splitOn) for line in inf], *args, **kwargs)

    def __init__(self, data=[[]], *args, **kwargs):
        dataType = kwargs.pop('dataType', lambda x:x)
        super(TwoDArray,self).__init__()
        self.data = [[dataType(i) for i in line] for line in data]

    def __str__(self, fmt=str, endrow='\n', endcol='\t'):
        return endrow.join(
            endcol.join(fmt(i) for i in row) for row in self.data
        )

def main():
    network = TwoDArray.fromFile('twoDArray.txt', splitOn=',', dataType=int)

    print("Network =")
    print(network)

if __name__ == "__main__":
    main()

Answer 5 (score: 0)

The input format is simple, so the solution should be simple too:

network = [map(int, line.split(',')) for line in open(filename)]
print network

The csv module provides no advantage in this case:

import csv
print [map(int, row) for row in csv.reader(open(filename, 'rb'))]

If you need float instead of int:

print list(csv.reader(open(filename, 'rb'), quoting=csv.QUOTE_NONNUMERIC))

If you are working with numpy arrays:

import numpy
print numpy.loadtxt(filename, dtype='i', delimiter=',')

See Why NumPy instead of Python lists?

All examples produce arrays equal to:
[[1 0 4 3 6 7 4 8 3 2 1 0]
 [2 3 6 3 2 1 7 4 3 1 1 0]
 [5 2 1 3 4 6 4 8 9 5 2 1]]
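
The one-liners above are written for Python 2. Under Python 3, map() returns a lazy iterator and csv.reader expects a text-mode file, so a rough equivalent might look like the sketch below. It uses io.StringIO with the question's sample data as a stand-in for open(filename), which is an assumption made to keep the example self-contained.

```python
import csv
import io

# Stand-in for the contents of the question's file.
text = (
    "1,0,4,3,6,7,4,8,3,2,1,0\n"
    "2,3,6,3,2,1,7,4,3,1,1,0\n"
    "5,2,1,3,4,6,4,8,9,5,2,1\n"
)

# map() is lazy in Python 3, so wrap it in list() to get real rows.
network = [list(map(int, line.split(","))) for line in io.StringIO(text)]

# csv.reader works on text-mode input; for a real file,
# open it with newline='' instead of 'rb'.
network_csv = [[int(x) for x in row] for row in csv.reader(io.StringIO(text))]

print(network)
```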

Answer 6 (score: 0)

Read the data from the file. Here is one way:

f = open('twoDArray.txt', 'r')
buffer = f.read()
f.close()

Parse the data into a table:

table = [map(int, row.split(',')) for row in buffer.strip().split("\n")]
>>> print table
[[1, 0, 4, 3, 6, 7, 4, 8, 3, 2, 1, 0], [2, 3, 6, 3, 2, 1, 7, 4, 3, 1, 1, 0], [5, 2, 1, 3, 4, 6, 4, 8, 9, 5, 2, 1]]

Maybe you want the transpose:

transpose = zip(*table)
>>> print transpose
[(1, 2, 5), (0, 3, 2), (4, 6, 1), (3, 3, 3), (6, 2, 4), (7, 1, 6), (4, 7, 4), (8, 4, 8), (3, 3, 9), (2, 1, 5), (1, 1, 2), (0, 0, 1)]
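
The session above is Python 2, where zip returns a list. As a small sketch for Python 3, where zip returns an iterator, the result can be materialized explicitly. The three-column table is a shortened, illustrative version of the data.

```python
table = [
    [1, 0, 4],
    [2, 3, 6],
    [5, 2, 1],
]

# zip(*table) pairs up the i-th items of every row, i.e. the columns.
transpose = list(zip(*table))                         # columns as tuples
transpose_lists = [list(col) for col in zip(*table)]  # columns as lists

print(transpose)
print(transpose_lists)
```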