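I have a very simple text file that I want to read with numpy. I need to read the numbers from lines that have more than 2 columns and do not start with "#". The file (benz.xyz) looks like this: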
12
C 0.000000 0.000000 0.000000
C 0.000000 0.000000 1.400000
C 1.212436 0.000000 2.100000
C 2.424871 0.000000 1.400000
C 2.424871 0.000000 0.000000
C 1.212436 0.000000 -0.700000
H -0.943102 0.000000 1.944500
H 1.212436 0.000000 3.189000
H 3.367973 0.000000 1.944500
H 3.367973 0.000000 -0.544500
H 1.212436 0.000000 -1.789000
H -0.943102 0.000000 -0.544500
I tried the following code:
import numpy as np

class mol:
    def __init__(self):
        self.masses = {'H': 1, 'D': 2, 'C': 12, 'O': 16}

    def read_xyz(self, filename):
        self.filename = filename
        with open(self.filename) as f:
            for line in f:
                if not line.startswith("#") and len(line.split()) > 3:
                    print np.loadtxt(line)

if __name__ == "__main__":
    test = mol()
    test.read_xyz('benz.xyz')
But my code crashes. Also, when I print each line, there is an empty line between every line, and I don't know why. Any help would be great!
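Two separate things explain the behaviour described above. `np.loadtxt` treats a string argument as a filename, so `np.loadtxt(line)` tries to open a file named after the line's contents; and even when given the line's text properly, the first column (`C`, `H`) cannot be parsed as a float. The blank lines come from each `line` still ending in `"\n"`, to which `print` adds its own newline (`line.rstrip("\n")`, or `print(line, end="")` in Python 3, avoids that). A minimal sketch that keeps the question's own filter and hands all surviving lines to `np.loadtxt` in one call follows; `load_xyz_coords` is a hypothetical name, and `usecols=(1, 2, 3)` assumes the symbol, x, y, z layout shown in the file above:

```python
import io

import numpy as np


def load_xyz_coords(source):
    """Hypothetical helper: return an (N, 3) array of x, y, z coordinates.

    `source` is a filename or an already-open text file object.
    """
    f = open(source) if isinstance(source, str) else source
    with f:
        # Same filter as the question: skip '#' lines and short lines
        # such as the atom count "12".
        kept = [line for line in f
                if not line.startswith("#") and len(line.split()) > 3]
    # np.loadtxt accepts a list of strings; usecols skips the symbol column,
    # and ndmin=2 keeps the result 2-D even if only one atom line survives.
    return np.loadtxt(kept, usecols=(1, 2, 3), ndmin=2)


# Usage on a small in-memory sample instead of benz.xyz:
sample = io.StringIO("12\n"
                     "# benzene\n"
                     "C 0.000000 0.000000 1.400000\n"
                     "H 1.212436 0.000000 3.189000\n")
coords = load_xyz_coords(sample)
print(coords.shape)  # (2, 3)
```

Parsing everything in a single `np.loadtxt` call also returns one array instead of printing a separate result per line.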
Answer (score: 0)
I suggest using a regular expression instead, for example:
import numpy as np

class mol:
    def __init__(self):
        self.masses = {'H': 1, 'D': 2, 'C': 12, 'O': 16}

    def read_xyz(self, filename):
        self.filename = filename
        regexp = r'\s+\w+' + r'\s+([-.0-9]+)' * 3 + r'\s*\n'
        data = np.fromregex(self.filename, regexp, dtype='f')
        print(data)

if __name__ == "__main__":
    test = mol()
    test.read_xyz('benz.xyz')
With this, I get:
[[ 0. 0. 0. ]
[ 0. 0. 1.4 ]
[ 1.212436 0. 2.1 ]
[ 2.424871 0. 1.4 ]
[ 2.424871 0. 0. ]
[ 1.212436 0. -0.7 ]
[-0.943102 0. 1.9445 ]
[ 1.212436 0. 3.189 ]
[ 3.367973 0. 1.9445 ]
[ 3.367973 0. -0.5445 ]
[ 1.212436 0. -1.789 ]
[-0.943102 0. -0.5445 ]]
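To see what this pattern actually matches, it can be run through Python's `re.findall`, which (like `np.fromregex` internally) collects the groups of every non-overlapping match. The two sample strings below are made up for illustration. Because the pattern starts with `\s+` and ends with `\s*\n`, each match consumes the newline at the end of its line; if the next line then starts directly with the element symbol, no whitespace is left for the leading `\s+`, and that line is skipped. The pattern therefore appears to rely on the atom lines being indented:

```python
import re

# The pattern built in the answer's read_xyz method.
pattern = r'\s+\w+' + r'\s+([-.0-9]+)' * 3 + r'\s*\n'

# Made-up three-atom samples: atom lines indented vs. starting in column 0.
indented = "12\n C 0.0 0.0 0.0\n C 0.0 0.0 1.4\n C 1.2 0.0 2.1\n"
flush = "12\nC 0.0 0.0 0.0\nC 0.0 0.0 1.4\nC 1.2 0.0 2.1\n"

print(len(re.findall(pattern, indented)))  # 3: every atom line matches
print(len(re.findall(pattern, flush)))     # 2: the middle line is skipped
```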
If you want to keep the characters in the first column, you will need to modify the regular expression.
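One possible modification, sketched here under the assumption that each atom line has the form symbol, x, y, z: add a capturing group for the symbol and pass `np.fromregex` a structured dtype, so each match becomes a record with one text field and three float fields. `XYZ_PATTERN`, `XYZ_DTYPE`, `read_xyz_with_symbols` and the field names are hypothetical. Anchoring each match with `(?m)^` is a design choice so the pattern does not depend on whitespace before the symbol:

```python
import io

import numpy as np

# Hypothetical pattern: (?m) makes ^ and $ match at every line boundary,
# and [ \t] (rather than \s) keeps each match on a single line.
XYZ_PATTERN = (r'(?m)^[ \t]*([A-Za-z]+)'            # element symbol
               r'[ \t]+(\S+)[ \t]+(\S+)[ \t]+(\S+)'  # x, y, z
               r'[ \t\r]*$')
# Structured dtype: one short text field plus three float fields per atom.
XYZ_DTYPE = [('symbol', 'U2'), ('x', 'f8'), ('y', 'f8'), ('z', 'f8')]


def read_xyz_with_symbols(source):
    """Return one structured record per atom line of `source`.

    `source` may be a filename or a file-like object such as io.StringIO.
    Lines like "12" or "# ..." do not start with a letter and are ignored.
    """
    return np.fromregex(source, XYZ_PATTERN, dtype=XYZ_DTYPE)


sample = io.StringIO("12\n"
                     "# benzene\n"
                     "C 0.000000 0.000000 1.400000\n"
                     "H 1.212436 0.000000 3.189000\n")
atoms = read_xyz_with_symbols(sample)
print(atoms['symbol'])  # ['C' 'H']
# Plain (N, 3) float array of coordinates, like the answer's output:
xyz = np.column_stack((atoms['x'], atoms['y'], atoms['z']))
```

Fields are then accessed by name, e.g. `atoms['symbol']` or `atoms['z']`, and `np.column_stack` turns the three float fields back into an ordinary 2-D array when only the coordinates are needed.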