Question

I'm somewhat familiar with Python. I have a file containing information that I need to read in a very specific way. Here is an example...

1
6
0.714285714286
0    0    1.00000000000
0    1    0.61356352337
...
-1  -1    0.00000000000
0    0    5.13787636499
0    1    0.97147643932
...
-1  -1    0.00000000000
0    0    5.13787636499
0    1    0.97147643932
...
-1  -1    0.00000000000
0 0 0 0   5.13787636499
0 0 0 1   0.97147643932
....

So every file has this structure (tab-delimited).

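The first line must be read in as a variable, and so must the second and third lines.
Next come four blocks of data, separated by a -1 -1 0.0000000000 line. Each block is 'n' lines long. The first two numbers on a line give the position in the array at which the third number on that line is to be inserted. Only unique positions are listed (position 0 1 is the same as 1 0, but the file does not show that).
Note: the fourth block has four indices instead of two.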

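What I need

The first three lines read in as unique variables.
Each block of data read into an array, using the first 2 (or 4) columns of numbers as the array indices and the last column as the value to insert into the array.
Only unique array elements are listed, so I also need the mirrored positions filled with the correct value (the 0 1 value should also appear at 1 0).
The last block needs to be inserted into a 4-dimensional array.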

Answer 1

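I rewrote the code. It is now almost exactly what you need; it only needs some fine-tuning.

I decided to leave the old answer in place, since it may still help: the new version does quite a lot and may not always be easy to follow.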

def the_function(filename):
    """
    Return a tuple: a list of the three header values and a list of sparse
    arrays stored as dicts, e.g. ([1, 2, 0.5], [{(0, 0): 1, (0, 1): 2}, ...]).
    On failure, print the reason and return None, e.g.
    "failed on text.txt: invalid literal for int() with base 10: '0.0', line: 5"
    """

    # open the file and split every line into fields
    try:
        with open(filename, "r") as f:
            data_txt = [line.split() for line in f]
    # no such file (or it cannot be read)
    except IOError as e:
        print('fail on open ' + str(e))
        return

    # try to get the first 3 variables
    try:
        variables = [int(data_txt[0][0]), int(data_txt[1][0]), float(data_txt[2][0])]
    except (ValueError, IndexError) as e:
        print('failed on ' + filename + ': ' + str(e) + ', somewhere on lines 1-3')
        return

    # now get the arrays; lineno is the 1-based line number in the file
    arrays = [dict()]
    for lineno, item in enumerate(data_txt[3:], start=4):
        try:
            # for 2d array data
            if len(item) == 3:
                i, j = map(int, item[:2])
                val = float(item[-1])
                # check for the 'block separator'
                if (i, j, val) == (-1, -1, 0.0):
                    # start a new array
                    arrays.append(dict())
                else:
                    # update the last existing array
                    arrays[-1][(i, j)] = val
            # almost the same for 4d array data
            elif len(item) == 5:
                i, j, k, m = map(int, item[:4])
                val = float(item[-1])
                arrays[-1][(i, j, k, m)] = val
        # the value cannot be parsed, e.g. '0.00' for an int, or 'text'
        except ValueError as e:
            print('failed on ' + filename + ': ' + str(e) + ', line: ' + str(lineno))
            return
    return variables, arrays
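
The function above leaves each block as a sparse dict, so the mirrored positions asked for in the question still have to be filled in. A minimal sketch of that step for the two-index blocks follows. The name `to_symmetric_matrix` is hypothetical, and the code assumes 0-based indices and a square matrix whose size can be inferred from the largest index present.

```python
def to_symmetric_matrix(sparse):
    """Expand {(i, j): value} into a dense, symmetric list of lists."""
    if not sparse:
        return []
    # Assumption: the matrix is square and indices start at 0,
    # so the largest index seen determines the size.
    size = max(max(i, j) for i, j in sparse) + 1
    matrix = [[0.0] * size for _ in range(size)]
    for (i, j), value in sparse.items():
        matrix[i][j] = value
        matrix[j][i] = value  # fill the mirrored position as well
    return matrix


# Example with a small hand-made block (values are illustrative)
block = {(0, 0): 1.0, (0, 1): 0.61356352337, (1, 1): 2.5}
dense = to_symmetric_matrix(block)
```

This only applies to the dicts keyed by (i, j); the last dict returned by the function is keyed by four indices and needs separate handling.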

Answer 2

As far as I understand what you are asking for...

# read data from file into list
parsed=[]
with open(filename, "r") as f:
    for line in f:
        # # you can exclude the separator here with code like this (uncomment) (1)
        # # be careful: one zero more or one zero less and it won't work
        # if line.split() == ['-1', '-1', '0.00000000000']:
        #     continue
        parsed.append(line.split())

# a simpler version
with open(filename, "r") as f:
    # # you can exclude the separator here with code like this (uncomment, replace) (2)
    # parsed = [line.split() for line in f if line.split() != ['-1', '-1', '0.00000000000']]
    parsed = [line.split() for line in f]

# at this point 'parsed' is a list of lists of strings.
# [['1'],['6'],['0.714285714286'],['0', '0', '1.00000000000'],['0', '1', '0.61356352337'] .. ]

# ALT 1 -------------------------------
# we do know the len of each data block 

# get the first 3 lines:
head = parsed[:3]

# get the body:
body = parsed[3:-2]

# get the last 2 lines:
tail = parsed[-2:]

# now you can do anything you want with your data
# but remember to convert str to int or float

# first3 as unique:
unique0 = int(head[0][0])
unique1 = int(head[1][0])
unique2 = float(head[2][0])

# cast body:
# check each item of body has 3 inner items
is_correct = all(map(lambda item: len(item)==3, body))
# parse str and cast
if is_correct:
    for i, j, v in body:
        # # you can exclude the separator here (uncomment) (3)
        # # * 0. is the same as float(0)
        # if (int(i), int(j), float(v)) == (-1, -1, 0.):
        #     # here we skip the iteration for the line with '-1  -1    0.0...'
        #     # but you can put other code here to be executed
        #     # at the point where block-termination lines appear
        #     continue

        some_body_cast_function(int(i), int(j), float(v))
else:
    raise Exception('incorrect body')


# cast tail
# check each item of tail has 5 inner items
is_correct = all(map(lambda item: len(item)==5, tail))
# parse str and cast
if is_correct:
    for i, j, k, m, v in tail: # 'l' is a bad index name, because it looks like 1.
        some_tail_cast_function(int(i), int(j), int(k), int(m), float(v))
else:
    raise Exception('incorrect tail')

# ALT 2 -----------------------------------
# we do NOT know the len of each data block 

# maybe we have some array?
array = dict() # your array may be other type

v1,v2,v3 = parsed[:3]
unique0 = int(v1[0])
unique1 = int(v2[0])
unique2 = float(v3[0])

for item in parsed[3:]:
    if len(item) == 3:
        i,j,v = item
        i = int(i)
        j = int(j)
        v = float(v)

        # # you can exclude the separator here (uncomment) (4)
        # # * 0. is the same as float(0)
        # # logic is the same as in the 3rd variant
        # if (i,j,v) == (-1,-1,0.):
        #     continue

        # do your stuff
        # for example,
        array[(i,j)]=v
        array[(j,i)]=v

    elif len(item) ==5:
        i, j, k, m, v = item
        i = int(i)
        j = int(j)
        k = int(k)
        m = int(m)
        v = float(v)

        # do your stuff

    else:
        raise Exception('unsupported') # or, maybe just 'pass'
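
For the five-column branch, one possible way to "do your stuff" is to collect the values in a dict keyed by (i, j, k, m) and then expand it into a four-level nested list. This is only a sketch: `fill_4d` is a hypothetical helper, indices are assumed to be 0-based with the same length on every axis, and no symmetry is applied because the question does not say which permutations of the four indices are equivalent.

```python
def fill_4d(entries):
    """Expand {(i, j, k, m): value} into a 4-level nested list."""
    if not entries:
        return []
    # Assumption: 0-based indices and equal length along all four axes.
    size = max(max(key) for key in entries) + 1
    dense = [[[[0.0] * size for _ in range(size)]
              for _ in range(size)]
             for _ in range(size)]
    for (i, j, k, m), value in entries.items():
        dense[i][j][k][m] = value
    return dense


# Rows as they would look after line.split() on the last block
rows = [["0", "0", "0", "0", "5.13787636499"],
        ["0", "0", "0", "1", "0.97147643932"]]

entries = {}
for i, j, k, m, v in rows:
    entries[(int(i), int(j), int(k), int(m))] = float(v)

dense = fill_4d(entries)
```

Positions that never appear in the file stay at 0.0 here; whether that is the right default depends on the data.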

Answer 3

To read lines from a file iteratively, you can use something like this:

with open(filename, "r") as f:
  var1 = int(next(f))
  var2 = int(next(f))
  var3 = float(next(f))
  for line in f:
    pass  # do some stuff particular to the line we are on...

Just create some data structures outside the loop and fill them in inside the loop above. To split a string into its elements, you can use:

>>> "spam   ham".split()
['spam', 'ham']
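
Putting the two pieces together, a minimal sketch might look like the following. It uses `io.StringIO` with a shortened, made-up sample in place of a real file so that it runs on its own, and it assumes that any line starting with `-1 -1` is a block separator. The `blocks` list is just one possible data structure.

```python
import io

# Stand-in for open(filename): a shortened, tab-delimited sample
sample = io.StringIO(
    "1\n6\n0.714285714286\n"
    "0\t0\t1.0\n0\t1\t0.61356352337\n"
    "-1\t-1\t0.0\n"
    "0\t0\t5.13787636499\n"
)

# The first three lines are single values
var1 = int(next(sample))
var2 = int(next(sample))
var3 = float(next(sample))

# Data structure created outside the loop and filled inside it
blocks = [[]]
for line in sample:
    fields = line.split()
    if not fields:
        continue  # skip blank lines
    if fields[:2] == ["-1", "-1"]:
        blocks.append([])  # separator: start a new block
        continue
    # All fields but the last are indices; the last is the value
    *indices, value = fields
    blocks[-1].append((tuple(int(x) for x in indices), float(value)))
```

Because the indices are unpacked with `*indices`, the same loop handles both the two-index and the four-index lines.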

I also think you will want to look at the array data structure in the numpy library, and possibly at the SciPy library for analysis.

Reading file contents into an array
