Question

I have a file with dates in the format "*? year mo da ho mi se.condsdec" (where "?" is a one-character wildcard), i.e.:

*A 2014 12 31 23 59 59.123456

I want to extract it as a string (and eventually convert it to a datetime string).

I can extract the date as a set of ints/floats using a regex pattern:

time_pattern=r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'

But not as a string. How can I make this work with strings?

I am using Python 3.4.3 with numpy 1.9.3.

import numpy as np
time_pattern=r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'
t_dtype=[('year',np.int16),('month',np.int8),('day',np.int8),\
('hour',np.int8),('min',np.int8),('sec',np.float64)]
out=np.fromregex('filename',time_pattern,t_dtype)
print(out)
#returns [(2013, 11, 26, 0, 0, 10.0) (2013, 11, 26, 0, 0, 20.0)
# (2013, 11, 26, 0, 0, 30.0)]


basic_t=r'$\*.{2}(.{28})'
t_dtype=[('date',str)]
out=np.fromregex('filename',basic_t,t_dtype)
#causes TypeError: 
#TypeError: Empty data-type

With the file filename:

*  2003 11 26 00 00 10.00000000  
some text or interesting data                      
*  2003 11 26 00 00 20.00000000
more text
even more text                         
*  2003 11 26 00 00 30.00000000    
etc.

Edit: note that the simple approach of searching line by line works:

import re

with open(file) as f:
    for line in f:
        m = re.search(basic_t, line)

but I would like the output as a numpy array, and I want to keep the run time to a minimum.

Further edit: changing the dtype to 'S' or np.str removes the error, but I still get an empty list as output.

Answer 1

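Your problem is that you are setting the dtype to int or float when you should be specifying it as np.str_. You also need to specify the length of each string, so: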

import numpy as np

time_pattern=r'\*.{2}(\d{4}) (\d{2}) (\d{2}) (\d{2}) (\d{2}) (\d{2}\.\d{8})'
# Each string field needs a length; 'sec' captures 11 characters (e.g. '10.00000000')
t_dtype=[('year',np.str_,4),('month',np.str_,2),('day',np.str_,2),\
('hour',np.str_,2),('min',np.str_,2),('sec',np.str_,11)]

out=np.fromregex('filename',time_pattern,t_dtype)
print(out)
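The single-string attempt from the question (basic_t) can probably be made to work the same way. The sketch below makes two changes. It drops the leading `$`, which anchors at the end of the input and so appears to be why the result came back empty. It also gives the string field an explicit length. The names `sample`, `whole_pattern` and `date_dtype` are made up for illustration, and an in-memory `io.StringIO` stands in for the real file.

```python
import io
import numpy as np

# Stand-in for the file from the question (assumption: same layout).
sample = io.StringIO(
    "*  2003 11 26 00 00 10.00000000  \n"
    "some text or interesting data\n"
    "*  2003 11 26 00 00 20.00000000\n"
    "more text\n"
    "*  2003 11 26 00 00 30.00000000    \n"
)

# One capture group covering the whole timestamp; note there is no leading '$'.
whole_pattern = r'\*.{2}(\d{4} \d{2} \d{2} \d{2} \d{2} \d{2}\.\d{8})'

# A string field needs an explicit length: the captured text is 28 characters.
date_dtype = [('date', np.str_, 28)]

dates = np.fromregex(sample, whole_pattern, date_dtype)
print(dates['date'])
```

Passing a file name instead of the `io.StringIO` object should behave the same way, since np.fromregex accepts either.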

If you look at the second example on the linked page, it shows how to handle strings.
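Since the stated goal is to end up with datetime strings, one possible follow-up step is to join the string fields into ISO 8601 text and let numpy parse it. This is only a sketch: the `out` array here is built by hand to mimic what np.fromregex would return with string fields, and the names `iso` and `stamps` are hypothetical.

```python
import numpy as np

# Hand-built stand-in for the structured array returned by np.fromregex
# (assumption: all fields are strings, with 'sec' holding 11 characters).
t_dtype = [('year', np.str_, 4), ('month', np.str_, 2), ('day', np.str_, 2),
           ('hour', np.str_, 2), ('min', np.str_, 2), ('sec', np.str_, 11)]
out = np.array([('2003', '11', '26', '00', '00', '10.00000000'),
                ('2003', '11', '26', '00', '00', '20.00000000')],
               dtype=t_dtype)

# Build ISO 8601 strings such as '2003-11-26T00:00:10.00000000'.
iso = ["{}-{}-{}T{}:{}:{}".format(*row) for row in out.tolist()]

# Nanosecond resolution is enough to hold the 8 fractional digits.
stamps = np.array(iso, dtype='datetime64[ns]')
print(stamps)
```

Keeping the fields as strings until this point avoids any float rounding of the seconds before the timestamps are parsed.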
