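I am somewhat familiar with np.fromregex. I read the tutorials and tried to use it to read a data file.
When I read the file with a simple Python list comprehension, I get the desired result:
[400, 401, 405, 408, 412, 414, 420, 423, 433]
However, np.fromregex returns the answer in a different format:
[(400,) (401,) (405,) (408,) (412,) (414,) (420,) (423,) (433,)]
How can I change the code so that the np.fromregex result matches the one from the plain Python loop?
Thanks.
P.S. I know this is a simple question, but I spent a lot of time looking for a solution, and it may help others save some time too.
Related links: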
https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromregex.html
np.fromregex with string as dtype
from __future__ import print_function, division, with_statement, unicode_literals
import numpy as np
import re
data = """
DMStack failed for: lsst_z1.0_400.fits
DMStack failed for: lsst_z1.0_401.fits
DMStack failed for: lsst_z1.0_405.fits
DMStack failed for: lsst_z1.0_408.fits
DMStack failed for: lsst_z1.0_412.fits
DMStack failed for: lsst_z1.0_414.fits
DMStack failed for: lsst_z1.0_420.fits
DMStack failed for: lsst_z1.0_423.fits
DMStack failed for: lsst_z1.0_433.fits
"""
ifile = 'a.txt'
with open(ifile, 'w') as fo:
    fo.write(data.lstrip())
# regex
regexp = r".*_(\d+?)\.fits"  # escape the dot so it matches a literal "."
# This works fine
ans = [int(re.findall(regexp, line)[0]) for line in open(ifile)]
print(ans)
# using fromregex
dt = [('num', np.int32)]
x = np.fromregex(ifile, regexp, dt)
print(x)
Update
When I use the __future__ imports, the code above fails. The error log is:
Traceback (most recent call last):
  File "a.py", line 31, in <module>
    x = np.fromregex(ifile, regexp, dt)
  File "/Users/poudel/miniconda2/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1452, in fromregex
    dtype = np.dtype(dtype)
TypeError: data type not understood
$ which python
python is /Users/poudel/miniconda2/bin/python
$ python -c "import numpy; print(numpy.__version__)"
1.14.0
Answer 0 (score: 4)
Just select the field by name and you get what you want:
dt = [('num', np.int32)]
x = np.fromregex(ifile, regexp, dt)
print(x['num'])
#[400 401 405 408 412 414 420 423 433]
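For context, np.fromregex returns a structured array, which is why each element prints as a one-item tuple. Indexing by field name gives an ordinary integer array, and .tolist() turns that into a plain Python list like the one from the list comprehension. A minimal sketch, assuming a throwaway temporary file and a shortened sample (both are illustrative and not part of the question):

```python
import os
import tempfile

import numpy as np

# Shortened sample in the same format as the question's log file.
sample = (
    "DMStack failed for: lsst_z1.0_400.fits\n"
    "DMStack failed for: lsst_z1.0_401.fits\n"
    "DMStack failed for: lsst_z1.0_405.fits\n"
)

# Temporary file (hypothetical path, used only for this sketch).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as fo:
    fo.write(sample)

dt = [("num", np.int32)]
x = np.fromregex(path, r".*_(\d+?)\.fits", dt)
os.remove(path)

nums = x["num"]          # plain int32 array, no tuples
as_list = nums.tolist()  # ordinary Python list of ints
print(as_list)
```

If a NumPy array is enough, x["num"] can be used directly; .tolist() is only needed when an actual Python list is required.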
Answer 1 (score: 0)
import numpy as np
# import cStringIO  # Python 2 only; used solely by the commented-out variant below
import re
data = """
DMStack failed for: lsst_z1.0_400.fits
DMStack failed for: lsst_z1.0_401.fits
DMStack failed for: lsst_z1.0_405.fits
DMStack failed for: lsst_z1.0_408.fits
DMStack failed for: lsst_z1.0_412.fits
DMStack failed for: lsst_z1.0_414.fits
DMStack failed for: lsst_z1.0_420.fits
DMStack failed for: lsst_z1.0_423.fits
DMStack failed for: lsst_z1.0_433.fits
"""
# ifile = cStringIO.StringIO()
# ifile.write(data)
ifile = 'a.txt'
with open(ifile, 'w') as fo:
    fo.write(data.lstrip())
# regex
regexp = r".*_(\d+?)\.fits"  # escape the dot so it matches a literal "."
# This works fine
ans = [int(re.findall(regexp, line)[0]) for line in open(ifile)]
print(ans)
# using fromregex
dt = [('num', np.int32)]
x = np.fromregex(ifile, regexp, dt)
y = []
for i in x:
    # each record is a one-field tuple; take its only element as a plain int
    y.append(int(i[0]))
print(y)
"""
[400, 401, 405, 408, 412, 414, 420, 423, 433]
[400, 401, 405, 408, 412, 414, 420, 423, 433]
"""
I don't know how to do this without a loop.
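One possible loop-free variant, sketched under the assumption that x is a single-field structured array like the one np.fromregex returns here. To stay self-contained, the array below is built by hand as a stand-in rather than read from a.txt:

```python
import numpy as np

dt = [("num", np.int32)]
# Stand-in for the array returned by np.fromregex in the code above.
x = np.array([(400,), (401,), (405,)], dtype=dt)

y = x["num"].tolist()    # field access plus tolist(), no explicit loop
flat = x.view(np.int32)  # reinterpret the single-field records as int32
print(y)
print(flat)
```

The view trick only makes sense when the record has exactly one field of the target type; field access by name is the more general choice.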
Answer 2 (score: 0)
Thanks to @zipa and @hpaulj, this code finally works on Python 2 with the __future__ imports. It also works on Python 3.
Instead of dt = [('num', np.int32)]
we need to use dt = [(str('num'), np.int32)].
#!python
# -*- coding: utf-8 -*-#
#
# Imports
from __future__ import print_function, division, with_statement, unicode_literals
import numpy as np
import re
data = """
DMStack failed for: lsst_z1.0_400.fits
DMStack failed for: lsst_z1.0_401.fits
DMStack failed for: lsst_z1.0_405.fits
DMStack failed for: lsst_z1.0_408.fits
DMStack failed for: lsst_z1.0_412.fits
DMStack failed for: lsst_z1.0_414.fits
DMStack failed for: lsst_z1.0_420.fits
DMStack failed for: lsst_z1.0_423.fits
DMStack failed for: lsst_z1.0_433.fits
"""
ifile = 'a.txt'
with open(ifile, 'w') as fo:
    fo.write(data.lstrip())
# regex
regexp = r".*_(\d+?)\.fits"  # escape the dot so it matches a literal "."
dt = [(str('num'), np.int32)]
x = np.fromregex(ifile, regexp, dt)
print(x['num'])
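The likely reason the str() wrapper helps: with unicode_literals on Python 2, 'num' becomes a unicode object, and np.dtype in that NumPy version appears to reject unicode field names with "data type not understood". str('num') forces the native string type on Python 2 and is a no-op on Python 3. A small sketch of the portable form (the variable name is illustrative):

```python
import numpy as np

name = str("num")  # native str on both Python 2 and Python 3
dt = np.dtype([(name, np.int32)])

print(type(name).__name__)
print(dt.names)
```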