Question

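I have a very simple text file that I want to read with numpy. I need to read the numbers from the lines that have more than two columns and do not start with "#".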

   12

 C     0.000000     0.000000     0.000000
 C     0.000000     0.000000     1.400000
 C     1.212436     0.000000     2.100000
 C     2.424871     0.000000     1.400000
 C     2.424871     0.000000     0.000000
 C     1.212436     0.000000    -0.700000
 H    -0.943102     0.000000     1.944500
 H     1.212436     0.000000     3.189000
 H     3.367973     0.000000     1.944500
 H     3.367973     0.000000    -0.544500
 H     1.212436     0.000000    -1.789000
 H    -0.943102     0.000000    -0.544500

I tried the following code:

import numpy as np
class mol:

    def __init__(self):
        self.masses = {'H': 1, 'D': 2, 'C': 12, 'O': 16}

    def read_xyz(self, filename):
        self.filename = filename
        with open(self.filename) as f:
            for line in f:
                if not line.startswith("#") and len(line.split())>3:
                    print np.loadtxt(line)

if __name__ == "__main__":
    test = mol()
    test.read_xyz('benz.xyz')

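But my code crashes, and if I print the line instead, there is an empty line between every two lines and I don't know why. Any help would be great!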

Answer 1

I suggest using a regular expression instead, for example:

import numpy as np
class mol:

    def __init__(self):
        self.masses = {'H': 1, 'D': 2, 'C': 12, 'O': 16}

    def read_xyz(self, filename):
        self.filename = filename
        # Skip the element symbol, then capture the three coordinate columns.
        regexp = r'\s+\w+' + r'\s+([-.0-9]+)' * 3 + r'\s*\n'
        data = np.fromregex(self.filename, regexp, dtype='f')
        print(data)

if __name__ == "__main__":
    test = mol()
    test.read_xyz('benz.xyz')

In this case, I get:

[[ 0.        0.        0.      ]
 [ 0.        0.        1.4     ]
 [ 1.212436  0.        2.1     ]
 [ 2.424871  0.        1.4     ]
 [ 2.424871  0.        0.      ]
 [ 1.212436  0.       -0.7     ]
 [-0.943102  0.        1.9445  ]
 [ 1.212436  0.        3.189   ]
 [ 3.367973  0.        1.9445  ]
 [ 3.367973  0.       -0.5445  ]
 [ 1.212436  0.       -1.789   ]
 [-0.943102  0.       -0.5445  ]]

If you want to keep the first column of characters, you will need to modify the regular expression.
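One possible way to keep the element symbols is sketched below. It is not the code from the answer above: it applies a modified pattern with the standard `re` module instead of `np.fromregex`, and the names `ATOM_LINE` and `parse_xyz` are made up for illustration. The sample text is a shortened stand-in for `benz.xyz`.

```python
import re
import numpy as np

# Hypothetical variant of the pattern: the element symbol is captured as well.
# [ \t] is used instead of \s so that a match cannot run across line breaks.
ATOM_LINE = re.compile(
    r'^[ \t]*(\w+)' + r'[ \t]+([-.0-9]+)' * 3 + r'[ \t]*$',
    re.MULTILINE,
)

def parse_xyz(text):
    """Return (symbols, coords) for every atom line found in text."""
    matches = ATOM_LINE.findall(text)
    symbols = [m[0] for m in matches]
    coords = np.array([[float(v) for v in m[1:]] for m in matches])
    return symbols, coords

# Shortened stand-in for the real file: atom count, blank line, comment, atoms.
sample = """   2

# comment line, ignored
 C     0.000000     0.000000     1.400000
 H    -0.943102     0.000000     1.944500
"""

symbols, coords = parse_xyz(sample)
print(symbols)
print(coords)
```

For a real file, the text could be read with `open(filename).read()` and passed to `parse_xyz`. Lines starting with "#" and lines with fewer than four fields do not match the pattern, so they are skipped without an explicit check.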
