ValueError when trying to convert a string array to an array with mixed given dtypes

Time: 2013-09-25 11:56:33

Tags: python arrays numpy

I am trying to convert an array containing only strings into an array with both float values and strings. My code currently looks like this:

datatype1=np.dtype([
('LOCATION_THETA',np.float64),
('LOCATION_PHI',np.float64),
('ETHETA_MAGN',np.float64),
('ETHETA_PHASE',np.float64),
('EPHI_MAGN',np.float64),
('EPHI_PHASE',np.float64),
('DIRECTIVITY_VERT',np.float64),
('DIRECTIVITY_HORIZ',np.float64),
('DIRECTIVITY_TOTAL',np.float64),
('POLARISATION_AXIALR',np.float64),
('POLARISATION_ANGLE',np.float64),
('POLARISATION_DIRECTION','|S5')])

table2=np.array(table,dtype=datatype1)

table (an array of strings) looks like this:

[['0.00' '0.00' '5.751E-01' '-2.08' '9.532E-05' '-86.19' '1.7442' '-73.8670' '1.7442' '0.0002' '0.00' 'RIGHT']
 ['2.00' '0.00' '5.747E-01' '-2.11' '1.291E-04' '-82.47' '1.7390' '-71.2312' '1.7390' '0.0002' '0.00' 'RIGHT']
 ['4.00' '0.00' '5.738E-01' '-2.21' '1.632E-04' '-80.31' '1.7243' '-69.1973' '1.7243' '0.0003' '0.00' 'RIGHT']
 ['6.00' '0.00' '5.722E-01' '-2.38' '1.973E-04' '-78.94' '1.7001' '-67.5479' '1.7001' '0.0003' '0.00' 'RIGHT']
 ['8.00' '0.00' '5.699E-01' '-2.61' '2.314E-04' '-78.02' '1.6663' '-66.1644' '1.6663' '0.0004' '0.01' 'RIGHT']
...

However, when I run the script, I get the following error:

ValueError: could not convert string to float: RIGHT

It should not even attempt that conversion, because I asked for that column to be |S5, not float...

Thanks in advance for your help!

1 answer:

Answer 0 (score: 1)

What is happening here is that when you do:

ts = np.array(t, dtype=dt)

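the dtype is applied to every single element of table (called t here, with dt standing for datatype1), not to each row. That works for the first 11 elements of a row, but then it reaches 'RIGHT', which cannot be converted to a float. Without the 'RIGHT' column, this is what it does (and it is pretty messy!):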

>>> t[:2,:-1]
array([['0.00', '0.00', '5.751E-01', '-2.08', '9.532E-05', '-86.19', '1.7442', '-73.8670', '1.7442', '0.0002', '0.00'],
       ['2.00', '0.00', '5.747E-01', '-2.11', '1.291E-04', '-82.47', '1.7390', '-71.2312', '1.7390', '0.0002', '0.00']], 
      dtype='|S9')

>>> np.array(t[:2,:-1], dt)
array([[(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '0.00'),
        (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '0.00'),
        (0.5751, 0.5751, 0.5751, 0.5751, 0.5751, 0.5751, 0.5751, 0.5751, 0.5751, 0.5751, 0.5751, '5.751'),
        (-2.08, -2.08, -2.08, -2.08, -2.08, -2.08, -2.08, -2.08, -2.08, -2.08, -2.08, '-2.08'),
        (9.532e-05, 9.532e-05, 9.532e-05, 9.532e-05, 9.532e-05, 9.532e-05, 9.532e-05, 9.532e-05, 9.532e-05, 9.532e-05, 9.532e-05, '9.532'),
        (-86.19, -86.19, -86.19, -86.19, -86.19, -86.19, -86.19, -86.19, -86.19, -86.19, -86.19, '-86.1'),
        (1.7442, 1.7442, 1.7442, 1.7442, 1.7442, 1.7442, 1.7442, 1.7442, 1.7442, 1.7442, 1.7442, '1.744'),
        (-73.867, -73.867, -73.867, -73.867, -73.867, -73.867, -73.867, -73.867, -73.867, -73.867, -73.867, '-73.8'),
        (1.7442, 1.7442, 1.7442, 1.7442, 1.7442, 1.7442, 1.7442, 1.7442, 1.7442, 1.7442, 1.7442, '1.744'),
        (0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, '0.000'),
        (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '0.00')],
       [(2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, '2.00'),
        (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '0.00'),
        (0.5747, 0.5747, 0.5747, 0.5747, 0.5747, 0.5747, 0.5747, 0.5747, 0.5747, 0.5747, 0.5747, '5.747'),
        (-2.11, -2.11, -2.11, -2.11, -2.11, -2.11, -2.11, -2.11, -2.11, -2.11, -2.11, '-2.11'),
        (0.0001291, 0.0001291, 0.0001291, 0.0001291, 0.0001291, 0.0001291, 0.0001291, 0.0001291, 0.0001291, 0.0001291, 0.0001291, '1.291'),
        (-82.47, -82.47, -82.47, -82.47, -82.47, -82.47, -82.47, -82.47, -82.47, -82.47, -82.47, '-82.4'),
        (1.739, 1.739, 1.739, 1.739, 1.739, 1.739, 1.739, 1.739, 1.739, 1.739, 1.739, '1.739'),
        (-71.2312, -71.2312, -71.2312, -71.2312, -71.2312, -71.2312, -71.2312, -71.2312, -71.2312, -71.2312, -71.2312, '-71.2'),
        (1.739, 1.739, 1.739, 1.739, 1.739, 1.739, 1.739, 1.739, 1.739, 1.739, 1.739, '1.739'),
        (0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, 0.0002, '0.000'),
        (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '0.00')]], 
      dtype=[('LOCATION_THETA', '<f8'), ('LOCATION_PHI', '<f8'), ('ETHETA_MAGN', '<f8'), ('ETHETA_PHASE', '<f8'), ('EPHI_MAGN', '<f8'), ('EPHI_PHASE', '<f8'), ('DIRECTIVITY_VERT', '<f8'), ('DIRECTIVITY_HORIZ', '<f8'), ('DIRECTIVITY_TOTAL', '<f8'), ('POLARISATION_AXIALR', '<f8'), ('POLARISATION_ANGLE', '<f8'), ('POLARISATION_DIRECTION', 'S5')])

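So, as you can see, every individual element becomes its own little tuple (a 'record') of dtype datatype1: the value is parsed into all eleven float fields, and the same string, truncated to five characters, fills the final string field.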

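There are a few ways around this. The best is to create or import the array with the correct dtype from the start, so that it never has to be copied. For some conversions you can simply take a view, which reinterprets the existing data as though it had the new compound dtype, but that cannot turn strings into numbers, because parsing a string is more involved than pretending its bytes are a number.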
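To make both points concrete, here is a minimal sketch, assuming Python 3 and a reduced three-field dtype. The names small_dt, rows, records, numbers and pairs are made up for illustration, and 'U5' stands in for the '|S5' field of the question. It builds the structured array directly from one tuple per row, then shows the kind of conversion a view can handle: data that is already numeric.

```python
import numpy as np

# Hypothetical three-field dtype standing in for the twelve-field datatype1.
# 'U5' (unicode) replaces '|S5' so string comparisons work under Python 3.
small_dt = np.dtype([
    ('LOCATION_THETA', np.float64),
    ('ETHETA_MAGN', np.float64),
    ('POLARISATION_DIRECTION', 'U5'),
])

# Raw rows as they might come out of a text parser: every value is a string.
rows = [
    ['0.00', '5.751E-01', 'RIGHT'],
    ['2.00', '5.747E-01', 'RIGHT'],
]

# One tuple per row means one record per row. The numeric columns are
# converted explicitly and the last column is kept as text.
records = np.array(
    [tuple(float(v) for v in row[:-1]) + (row[-1],) for row in rows],
    dtype=small_dt,
)

# A view only relabels bytes that are already numeric: each row of two
# float64 values is reinterpreted as one two-field record, with no parsing.
numbers = np.array([[0.0, 0.5751], [2.0, 0.5747]])
pairs = numbers.view([('LOCATION_THETA', np.float64),
                      ('ETHETA_MAGN', np.float64)])
```

Here records has one entry per input row, while pairs keeps a trailing axis of length 1 because the view collapses each pair of floats into a single record.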

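In your case, you should use a recarray, which is slightly more sophisticated than a regular structured array, because then you can use the fromarrays function. It takes a list of columns, each of uniform type, rather than a list of rows, hence the transpose: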

>>> np.rec.fromarrays(t.T, dt)
rec.array([ (0.0, 0.0, 0.5751, -2.08, 9.532e-05, -86.19, 1.7442, -73.867, 1.7442, 0.0002, 0.0, 'RIGHT'),
       (2.0, 0.0, 0.5747, -2.11, 0.0001291, -82.47, 1.739, -71.2312, 1.739, 0.0002, 0.0, 'RIGHT'),
       (4.0, 0.0, 0.5738, -2.21, 0.0001632, -80.31, 1.7243, -69.1973, 1.7243, 0.0003, 0.0, 'RIGHT'),
       (6.0, 0.0, 0.5722, -2.38, 0.0001973, -78.94, 1.7001, -67.5479, 1.7001, 0.0003, 0.0, 'RIGHT'),
       (8.0, 0.0, 0.5699, -2.61, 0.0002314, -78.02, 1.6663, -66.1644, 1.6663, 0.0004, 0.01, 'RIGHT')], 
      dtype=[('LOCATION_THETA', '<f8'), ('LOCATION_PHI', '<f8'), ('ETHETA_MAGN', '<f8'), ('ETHETA_PHASE', '<f8'), ('EPHI_MAGN', '<f8'), ('EPHI_PHASE', '<f8'), ('DIRECTIVITY_VERT', '<f8'), ('DIRECTIVITY_HORIZ', '<f8'), ('DIRECTIVITY_TOTAL', '<f8'), ('POLARISATION_AXIALR', '<f8'), ('POLARISATION_ANGLE', '<f8'), ('POLARISATION_DIRECTION', 'S5')])

Lovely! But wait, the result is now a rec.array. If you want to keep it that way, that's fine. If you would rather have a regular structured array, do:

>>> np.asarray(np.rec.fromarrays(t.T, dt))
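For anyone who wants to try the whole thing without the original data file, the following is a self-contained sketch of the same approach. It assumes Python 3 and uses a cut-down, hypothetical three-field dtype (small_dt) and a two-row string array in place of the real table, with 'U5' instead of '|S5'.

```python
import numpy as np

# Hypothetical reduced dtype; the real datatype1 has twelve fields.
small_dt = np.dtype([
    ('LOCATION_THETA', np.float64),
    ('ETHETA_MAGN', np.float64),
    ('POLARISATION_DIRECTION', 'U5'),
])

# Stand-in for `table`: a 2-D array in which every element is a string.
t = np.array([
    ['0.00', '5.751E-01', 'RIGHT'],
    ['2.00', '5.747E-01', 'RIGHT'],
])

# fromarrays expects one array per field, so pass the columns (t.T).
# Each column is cast to the dtype of its field as it is assigned.
rec = np.rec.fromarrays(t.T, dtype=small_dt)

# A recarray allows attribute access to its fields.
theta = rec.LOCATION_THETA

# np.asarray drops the recarray subclass and returns a plain ndarray
# that is still indexed by field name.
plain = np.asarray(rec)
direction = plain['POLARISATION_DIRECTION']
```

Whether to keep the recarray or convert it is mostly a matter of taste: attribute access such as rec.LOCATION_THETA is convenient, while the plain array behaves like any other ndarray.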