Question

I have a file that looks like this:

some text
the grids are 
       3 x 3

more text

matrix marker 1 1
3 2 4
7 4 2
9 1 1

new matrix  2 4
9 4 1
1 3 4
4 3 1

new matrix  3 3
7 2 1
1 3 4
2 3 2

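...and the file continues, with several more 3x3 matrices appearing in the same way. Each matrix is preceded by a line of text carrying a unique ID, but the IDs are not particularly important to me. I want to create an array of these matrices. Can I use loadtxt to do this?

Here is my best attempt. The 6 in this code could be replaced by an iteration variable that starts at 6 and is incremented by the number of rows in a matrix. I thought skiprows would accept a list, but apparently it only accepts an integer.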

np.loadtxt(fl, skiprows = [x for x in range(nlines) if x not in (np.array([1,2,3])+ 6)])

TypeError                                 Traceback (most recent call last)
<ipython-input-23-7d82fb7ef14a> in <module>()
----> 1 np.loadtxt(fl, skiprows = [x for x in range(nlines) if x not in (np.array([1,2,3])+ 6)])

/usr/local/lib/python2.7/site-packages/numpy/lib/npyio.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
    932 
    933         # Skip the first `skiprows` lines
--> 934         for i in range(skiprows):
    935             next(fh)
    936
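
As the traceback shows, skiprows is used as a single integer count of leading lines to skip. One possible workaround in the spirit of the attempt above is to pick out the wanted lines by index and pass them to loadtxt as a list of strings, which loadtxt accepts in place of a file. A minimal sketch, assuming the start of the file is held in a hypothetical string named sample and that the first matrix row sits at line index 7:

```python
import numpy as np

# Hypothetical in-memory stand-in for the beginning of the file
sample = """some text
the grids are
       3 x 3

more text

matrix marker 1 1
3 2 4
7 4 2
9 1 1
"""

lines = sample.splitlines()
first_row = 7  # index of the first matrix row in this sample (an assumption)
wanted = lines[first_row:first_row + 3]

# loadtxt accepts a list of strings as well as a file name or file object
block = np.loadtxt(wanted)
print(block.shape)  # (3, 3)
```

Stepping first_row forward by the spacing between matrices would pick up the later blocks, but that only works if the spacing is perfectly regular.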

Answer 1

Maybe I have misunderstood, but if you can match the line that precedes each 3x3 matrix, then you can create a generator to feed to loadtxt:

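import numpy as np
from itertools import islice

def get_matrices(fs):
    # A for loop ends cleanly at end of file; a bare next(fs) inside a
    # generator raises RuntimeError at EOF on Python 3.7+ (PEP 479).
    for line in fs:
        if 'matrix' in line: # or whatever matches the line before a matrix
            # yield the three rows that follow the marker line
            for row in islice(fs, 3):
                yield row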


with open('matrices.dat') as fs:
    g = get_matrices(fs)
    M = np.loadtxt(g)

M = M.reshape((M.size//9, 3, 3))
print(M)

If you feed it:

some text
the grids are 
       3 x 3

more text

matrix marker 1 1
3 2 4
7 4 2
9 1 1

new matrix  2 4
9 4 1
1 3 4
4 3 1

new matrix  3 3
7 2 1
1 3 4
2 3 2

new matrix  7 6
1 0 1
2 0 3
0 1 2

you get an array of matrices:

[[[ 3.  2.  4.]
  [ 7.  4.  2.]
  [ 9.  1.  1.]]

 [[ 9.  4.  1.]
  [ 1.  3.  4.]
  [ 4.  3.  1.]]

 [[ 7.  2.  1.]
  [ 1.  3.  4.]
  [ 2.  3.  2.]]

 [[ 1.  0.  1.]
  [ 2.  0.  3.]
  [ 0.  1.  2.]]]

Alternatively, if you just want to yield every line that looks like a row of a 3x3 integer matrix, match a regular expression:

import re

def get_matrices(fs):
    # keep only lines that start with three whitespace-separated integers
    for line in fs:
        if re.match(r'\d+\s+\d+\s+\d+', line):
            yield line
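
The regex variant can be fed to loadtxt in the same way as the first generator. A minimal self-contained sketch, assuming an in-memory sample instead of matrices.dat; the names ROW, get_matrix_rows and sample are illustrative, and the pattern here is anchored at the end of the line, which is slightly stricter than the one above:

```python
import io
import re

import numpy as np

# Three integers and nothing else on the line (stricter, illustrative pattern)
ROW = re.compile(r'\d+\s+\d+\s+\d+\s*$')

def get_matrix_rows(fs):
    # yield only the lines that look like a row of three integers
    for line in fs:
        if ROW.match(line):
            yield line

sample = io.StringIO("""the grids are
       3 x 3

matrix marker 1 1
3 2 4
7 4 2
9 1 1

new matrix  2 4
9 4 1
1 3 4
4 3 1
""")

# -1 lets NumPy work out how many 3x3 blocks were read
M = np.loadtxt(get_matrix_rows(sample)).reshape(-1, 3, 3)
print(M.shape)  # (2, 3, 3)
```

Note that this approach relies on no other line in the file consisting of exactly three integers.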

Answer 2

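You need to change your processing workflow so that it works in steps: first extract the substring corresponding to the matrix you want, then call numpy.loadtxt on it. A good way to do this is:

Use re to find the start and end of the matrix.
Load the matrix within that range.
Reset your range and continue.

Your matrix markers look varied, so you could use regular expressions like these: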

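import re

# any leading text, the word "matrix", any further text, then two integers
# at the end of the line (this also matches "matrix marker 1 1")
start = re.compile(r".*matrix.*?[ \t](\d+)[ \t]+(\d+)[ \t]*\n")
end = re.compile(r"\n\n")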
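
It can help to try a marker pattern on a few header lines before running it over the whole file. The sketch below uses a hypothetical, deliberately loose pattern named marker (an assumption for illustration, not necessarily the pattern above) that accepts any text around the word matrix and captures the two trailing numbers:

```python
import re

# Hypothetical loose pattern: any text, the word "matrix", any text,
# then two whitespace-separated integers at the end of the line
marker = re.compile(r".*matrix.*?[ \t](\d+)[ \t]+(\d+)[ \t]*$")

headers = ["matrix marker 1 1", "new matrix  2 4", "new matrix  3 3",
           "more text", "3 2 4"]

for h in headers:
    m = marker.match(h)
    # print the captured numbers for marker lines, None for everything else
    print(repr(h), "->", m.groups() if m else None)
```

Lines without the word matrix, such as plain text or matrix rows, should come back as None, so they are never mistaken for the start of a block.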

然后，您可以找到开始/结束对，然后加载每个矩阵的文本：

import io
import numpy as np

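# read our data
with open("/path/to/file.txt") as f:
    data = f.read()

def load_matrix(data, pos=0, **kwargs):
    # find start and end bounds, searching from position pos
    s = start.search(data, pos)
    if not s:
        # no matrix left over: return None and the unchanged position
        return None, pos
    e = end.search(data, s.end())
    e_index = e.end() if e else len(data)

    # load text
    buf = io.StringIO(data[s.end(): e_index])
    matrix = np.loadtxt(buf, **kwargs)    # other loadtxt arguments go here

    # return the matrix and the position where the next search should resume
    # (rebinding data inside the function would not be visible to the caller)
    return matrix, e_index

Explanation

In this case, the regex for the matrix start marker has capture groups (\d+) for the two numbers on the marker line, so you can recover them if you need them (for example, if they encode the M x N dimensions of the matrix). The pattern searches for the word "matrix" on the line, with arbitrary leading text and two numbers at the end separated by whitespace.

My end marker is "\n\n", that is, two consecutive newlines (if you have Windows line endings, you may need to account for "\r" as well).

Automating

Now that we have a way to find a single case, all you need to do is iterate and fill a list of matrices for as long as you keep getting matches.

matrices = []

# read our data
with open("/path/to/file.txt") as f:
    data = f.read()

pos = 0
while True:
    matrix, pos = load_matrix(data, pos)     # pass other loadtxt arguments as keywords
    if matrix is None:    # "not matrix" is ambiguous for a NumPy array
        break
    matrices.append(matrix)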
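
The question asks for a single array of matrices rather than a list. Below is a minimal sketch of the same find-and-load idea that ends with np.stack, assuming every matrix has the same shape. It is a variation, not the load_matrix helper above: it uses re.finditer over a hypothetical in-memory string named text, with a hypothetical pattern named header.

```python
import io
import re

import numpy as np

# Hypothetical in-memory stand-in for the file contents
text = """some text

matrix marker 1 1
3 2 4
7 4 2
9 1 1

new matrix  2 4
9 4 1
1 3 4
4 3 1
"""

# Hypothetical header pattern: a line containing "matrix" that ends in two integers
header = re.compile(r"^.*matrix.*?[ \t](\d+)[ \t]+(\d+)[ \t]*\n", re.MULTILINE)

matrices = []
for m in header.finditer(text):
    # the matrix body runs from the end of the header to the next blank line,
    # or to the end of the text if no blank line follows
    stop = text.find("\n\n", m.end())
    body = text[m.end():] if stop == -1 else text[m.end():stop]
    matrices.append(np.loadtxt(io.StringIO(body)))

stacked = np.stack(matrices)  # shape: (number of matrices, rows, columns)
print(stacked.shape)  # (2, 3, 3)
```

np.stack raises an error if the matrices differ in shape, which is a useful sanity check when every block is expected to be 3x3.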

np.loadtxt for a file containing many matrices
