I need to read a file and import it into Python. The problem is that the file is formatted inconsistently. This is what the file contains:
-0.2066687680781E-01 0.4329528510571E+00-0.9011796712875E+00
-0.4119676724076E-01 0.4006276726723E+00-0.9153143167496E+00
0.1022378727794E+00 0.2991854846478E+00-0.9487020373344E+00
0.2066854201257E-01 0.3005275726318E+00-0.9535492062569E+00
0.4130198806524E-01 0.3341401219368E+00-0.9416180849075E+00
0.6145291402936E-01 0.3000802397728E+00-0.9519324898720E+00
0.8211978524923E-01 0.3335199654102E+00-0.9391596317291E+00
0.6186530366540E-01 0.3671853244305E+00-0.9280881881714E+00
-0.2066862955689E-01 0.3678680062294E+00-0.9296482801437E+00
0.2066862955689E-01 0.3678680062294E+00-0.9296482801437E+00
0.0000000000000E+00 0.3344254791737E+00-0.9424222111702E+00
0.5163235664368E+00-0.3289847448468E-01-0.8557614684105E+00
0.5062980055809E+00-0.6575757265091E-01-0.8598478436470E+00
0.4863796830177E+00-0.3290597721934E-01-0.8731277585030E+00
0.4844416379929E+00-0.1312004029751E+00-0.8649293184280E+00
0.4652865529060E+00-0.9858986735344E-01-0.8796525001526E+00
0.4453650414944E+00-0.6581693142653E-01-0.8929267525673E+00
0.4761176705360E+00-0.6582681834698E-01-0.8769143819809E+00
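Most of the time the numbers are separated into three columns, but when a number is negative there is no space in front of it, and that causes an error when the file is loaded into Python. This is what I use to load the file: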
from numpy import *
import numpy as np
sphere = np.loadtxt("sphererad1.out")
This is the error I get:
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/npyio.py", line 827, in loadtxt
items = [conv(val) for (conv, val) in zip(converters, vals)]
ValueError: invalid literal for float(): 0.2899294197559E+00-0.1325698643923E+00
I cannot regenerate the data, so I have to figure out how to get it into Python as it is. I tried reading it in like this:
opn = open("sphererad1.out")
sphere = opn.readlines()
opn.close()
To test breaking a line into individual numbers, I tried this:
l = sphere[2000]
n = 18
[l[i:i+n] for i in range(0, len(l), n)]
and got
[' -0.24', '73256886005E+00-0.', '6656686961651E-01-', '0.9666430950165E+0', '0\n']
If the first number is negative there are 13 spaces on the left; if the first number is positive there are 14.
n = 1
[l[i:i+n] for i in range(0, len(l), n)]
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', '-', '0', '.', '2', '4', '7', '3', '2', '5', '6', '8', '8', '6', '0', '0', '5', 'E', '+', '0', '0', '-', '0', '.', '6', '6', '5', '6', '6', '8', '6', '9', '6', '1', '6', '5', '1', 'E', '-', '0', '1', '-', '0', '.', '9', '6', '6', '6', '4', '3', '0', '9', '5', '0', '1', '6', '5', 'E', '+', '0', '0', '\n']
How can I make it ignore the first block of spaces, then split the rest into three columns of numbers and build an array?
Answer 0 (score: 1)
Use a regular expression:
import re
for line in open("sphererad1.out"):
    print(list(map(float, re.findall(r' *(-?\d+\.\d*[eE][+-]\d+)', line))))
[-0.02066687680781, 0.4329528510571, -0.9011796712875]
[-0.04119676724076, 0.4006276726723, -0.9153143167496]
[0.1022378727794, 0.2991854846478, -0.9487020373344]
[0.02066854201257, 0.3005275726318, -0.9535492062569]
[0.04130198806524, 0.3341401219368, -0.9416180849075]
[0.06145291402936, 0.3000802397728, -0.951932489872]
[0.08211978524923, 0.3335199654102, -0.9391596317291]
[0.0618653036654, 0.3671853244305, -0.9280881881714]
[-0.02066862955689, 0.3678680062294, -0.9296482801437]
[0.02066862955689, 0.3678680062294, -0.9296482801437]
[0.0, 0.3344254791737, -0.9424222111702]
[0.5163235664368, -0.03289847448468, -0.8557614684105]
[0.5062980055809, -0.06575757265091, -0.859847843647]
[0.4863796830177, -0.03290597721934, -0.873127758503]
[0.4844416379929, -0.1312004029751, -0.864929318428]
[0.465286552906, -0.09858986735344, -0.8796525001526]
[0.4453650414944, -0.06581693142653, -0.8929267525673]
[0.476117670536, -0.06582681834698, -0.8769143819809]
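To end up with an array rather than printed lists, the matches can be collected and handed to NumPy. The following is a minimal sketch, assuming every line holds exactly three values in the exponent notation shown in the question; `parse_run_together_floats` is a hypothetical helper name, and the two sample lines are copied from the question with the leading spaces it describes.

```python
import re
import numpy as np

# One value such as -0.2066687680781E-01; the sign is optional.
NUMBER = re.compile(r"-?\d+\.\d*[eE][+-]\d+")

def parse_run_together_floats(lines):
    """Return a 2-D float array, one row per input line (hypothetical helper)."""
    rows = [[float(tok) for tok in NUMBER.findall(line)] for line in lines]
    return np.array(rows)

# Sample lines from the question (13 or 14 leading spaces assumed).
sample = [
    "             -0.2066687680781E-01 0.4329528510571E+00-0.9011796712875E+00\n",
    "              0.5163235664368E+00-0.3289847448468E-01-0.8557614684105E+00\n",
]
sphere = parse_run_together_floats(sample)
print(sphere.shape)
```

For the real file, an open file handle can be passed in place of `sample`, since the helper only iterates over lines.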
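Answer 1 (score: 1)
I would first use str.strip() to remove the whitespace at the start (and end) of each line, then split the line into fixed-width chunks, following the approach you already outlined in your question. Each value in your sample is 20 characters wide including its sign, so n should be 20 rather than 18. A line that starts with a positive number is one character shorter after stripping, so it is padded back to full width first:
n = 20  # width of one field, sign included
def parse_line(line):
    line = line.rjust(3 * n)  # restore the column a leading positive value loses to strip()
    return [line[i:i+n].strip() for i in range(0, len(line), n)]
def get_matrix(filename):
    with open(filename) as f:
        return [parse_line(line.strip()) for line in f]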
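Alternatively, you can adjust the line-parsing code to start at index 13 instead of index 0. However, this is a less robust solution, so I would still go with the first one.
def parse_line(line, n=20):
    line = line.rstrip()  # drop the newline so it does not produce an empty trailing chunk
    return [line[i:i+n].strip() for i in range(13, len(line), n)]
def get_matrix(filename):
    with open(filename) as f:
        return [parse_line(line) for line in f]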
Answer 2 (score: 1)
Use numpy.genfromtxt to parse fixed-width files. The delimiter argument can be set to a sequence of field widths, and autostrip removes the whitespace from the data.
numpy.genfromtxt(fname, delimiter=(33, 20, 20), autostrip=True)
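The widths (33, 20, 20) follow from the layout described in the question: 13 leading spaces plus a 20-character first value, then two more 20-character values. A small self-contained sketch of the same call on in-memory data, assuming that layout holds for every line; the sample lines are copied from the question and `io.StringIO` merely stands in for the real file.

```python
import io
import numpy as np

# Three lines from the question, with the leading spaces it describes.
sample = (
    "             -0.2066687680781E-01 0.4329528510571E+00-0.9011796712875E+00\n"
    "              0.1022378727794E+00 0.2991854846478E+00-0.9487020373344E+00\n"
    "              0.5163235664368E+00-0.3289847448468E-01-0.8557614684105E+00\n"
)

# Field widths: 13 spaces + 20 characters, then 20, then 20 (assumed layout).
widths = (33, 20, 20)
sphere = np.genfromtxt(io.StringIO(sample), delimiter=widths, autostrip=True)
print(sphere.shape)
```

Because the columns are cut by position, a missing space between two values does not matter here; the approach does depend on every line having the same total width.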
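Answer 3 (score: 0)
If negative numbers are the only problem, you can go through the file line by line and insert a space in front of every minus sign that is not part of an exponent: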
import numpy as np
import re
values = []
with open("sphererad1.out") as handle:
    for line in handle:
        # list() is needed on Python 3, where map() returns an iterator
        values.append(list(map(float, re.sub(r'(?<![eE])[-]', ' -', line).split())))
values = np.asarray(values)
Here I use a negative lookbehind assertion to avoid matching the minus sign in E-.
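The effect of the lookbehind is easiest to see on a single line. A minimal sketch, using one line copied from the question; the variable names are only illustrative.

```python
import re

line = "              0.5163235664368E+00-0.3289847448468E-01-0.8557614684105E+00\n"

# Put a space before every minus sign that is not preceded by E or e,
# so exponent signs such as E-01 are left untouched.
fixed = re.sub(r"(?<![eE])-", " -", line)

# After the substitution, whitespace splitting yields one token per value.
values = [float(tok) for tok in fixed.split()]
print(values)
```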
Answer 4 (score: 0)
Can't you just slice it?
for line in bad_file:
    print(float(line[13:33]), float(line[33:53]), float(line[53:73]))
Or get all the data in one go:
new_data = [
    [float(line[13:33]), float(line[33:53]), float(line[53:73])]
    for line in bad_file
]
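A variation on the slicing idea is to count columns from the right-hand end of the line, so the exact number of leading spaces no longer matters. This is only a sketch, assuming three values of 20 characters each (sign included) and nothing after them but trailing whitespace; `slice_fields`, `FIELD_WIDTH`, and `N_FIELDS` are hypothetical names.

```python
import numpy as np

FIELD_WIDTH = 20  # assumed width of one value, sign included
N_FIELDS = 3      # assumed number of values per line

def slice_fields(line):
    """Cut the last N_FIELDS fixed-width fields out of a line (hypothetical helper)."""
    body = line.rstrip()[-FIELD_WIDTH * N_FIELDS:]
    # float() tolerates the leading space left in front of a positive value.
    return [float(body[i:i + FIELD_WIDTH])
            for i in range(0, FIELD_WIDTH * N_FIELDS, FIELD_WIDTH)]

# Sample lines from the question (13 or 14 leading spaces assumed).
bad_file = [
    "             -0.2066687680781E-01 0.4329528510571E+00-0.9011796712875E+00\n",
    "              0.5163235664368E+00-0.3289847448468E-01-0.8557614684105E+00\n",
]
new_data = np.array([slice_fields(line) for line in bad_file])
print(new_data.shape)
```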