Unable to unpack elements in a list

Date: 2013-08-28 21:29:50

Tags: python numpy unpack

Here is an MWE of my code:

import numpy as np

# Load data from file.
data = np.genfromtxt('data_input', dtype=None, unpack=True)

print data

Here is a sample of the data_input file:

01_500_aa_1000    990.0    990.0   112.5      0.2       72  0  0  1  0  0  0  0  0  0   0   0   0   1
02_500_aa_0950    990.0    990.0   112.5      0.2       77  0  0  1  0  0  0  0  0  0   0   0   0   1
03_500_aa_0600    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1
04_500_aa_0700    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1

The unpack argument doesn't seem to work, since it always prints:

[ ('01_500_aa_1000', 990.0, 990.0, 112.5, 0.2, 72, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('02_500_aa_0950', 990.0, 990.0, 112.5, 0.2, 77, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('03_500_aa_0600', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('04_500_aa_0700', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)]

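Can anyone reproduce this? What am I doing wrong?

4 Answers:

Answer 0: (score: 2)

You are seeing this because genfromtxt returns a numpy record array (a structured array), not a list. It only looks like a list when you print() it to the console: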

from cStringIO import StringIO
from numpy import genfromtxt
raw = """01_500_aa_1000    990.0    990.0   112.5      0.2       72  0  0  1  0  0  0  0  0  0   0   0   0   1
02_500_aa_0950    990.0    990.0   112.5      0.2       77  0  0  1  0  0  0  0  0  0   0   0   0   1
03_500_aa_0600    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1
04_500_aa_0700    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1"""
sio = StringIO(raw)
data = genfromtxt(sio, dtype=None, unpack=False)
print data
print
print data.dtype

gives:

[ ('01_500_aa_1000', 990.0, 990.0, 112.5, 0.2, 72, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('02_500_aa_0950', 990.0, 990.0, 112.5, 0.2, 77, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('03_500_aa_0600', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
 ('04_500_aa_0700', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)]

[('f0', 'S14'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<i8'), ('f6', '<i8'), ('f7', '<i8'), ('f8', '<i8'), ('f9', '<i8'), ('f10', '<i8'), ('f11', '<i8'), ('f12', '<i8'), ('f13', '<i8'), ('f14', '<i8'), ('f15', '<i8'), ('f16', '<i8'), ('f17', '<i8'), ('f18', '<i8')]
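A quick way to see why transposing has no effect here is to inspect the shape of the result. The following is a minimal sketch for Python 3 (the thread itself uses Python 2); `io.StringIO` and the `encoding="utf-8"` argument are assumptions made so the snippet runs on a recent NumPy, and the sample rows are copied from the question.

```python
import io
import numpy as np

# Same four sample rows as the data_input file in the question.
raw = """01_500_aa_1000    990.0    990.0   112.5      0.2       72  0  0  1  0  0  0  0  0  0   0   0   0   1
02_500_aa_0950    990.0    990.0   112.5      0.2       77  0  0  1  0  0  0  0  0  0   0   0   0   1
03_500_aa_0600    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1
04_500_aa_0700    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1"""

# Assumption: encoding="utf-8" so the first column is read as str on Python 3.
data = np.genfromtxt(io.StringIO(raw), dtype=None, encoding="utf-8")

# The array is one-dimensional: one record per row of the file.
print(data.shape)

# The columns live in the dtype as named fields, not as a second axis.
print(data.dtype.names)

# Transposing a 1-D array returns an array of the same shape.
print(data.T.shape == data.shape)
```

Because the columns are fields of the dtype rather than a second array axis, a transpose has nothing to swap.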

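unpack=True and unpack=False seem to return the same thing because the result is a recarray: it is one-dimensional, so there is nothing to transpose. I suggest you try pandas and forget about recarrays entirely. You can pass the recarray to pandas.DataFrame and you're done! For example,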

from pandas import DataFrame

df = DataFrame(data)
print df
print
print df.f0

yields:

               f0         f1         f2         f3         f4  f5  f6  f7  f8  \
0  01_500_aa_1000     990.00     990.00     112.50       0.20  72   0   0   1   
1  02_500_aa_0950     990.00     990.00     112.50       0.20  77   0   0   1   
2  03_500_aa_0600     990.00     990.00     112.50       0.18  84   0   0   1   
3  04_500_aa_0700     990.00     990.00     112.50       0.18  84   0   0   1   

   f9  f10  f11  f12  f13  f14  f15  f16  f17  f18  
0   0    0    0    0    0    0    0    0    0    1  
1   0    0    0    0    0    0    0    0    0    1  
2   0    0    0    0    0    0    0    0    0    1  
3   0    0    0    0    0    0    0    0    0    1  

0    01_500_aa_1000
1    02_500_aa_0950
2    03_500_aa_0600
3    04_500_aa_0700
Name: f0, dtype: object

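Answer 1: (score: 2)

As @Phillip Cloud mentioned, when you mix data types (strings and numbers) you get a recarray; the strings in column 0 are what cause this.

You can work around it by importing column 0 separately: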

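>>> # Note: range(1,18) selects columns 1-17 and skips the last column (index 18); use range(1,19) to read all 18 numeric columns.
>>> np.genfromtxt('data_input', usecols=range(1,18))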
array([[  9.90000000e+02,   9.90000000e+02,   1.12500000e+02,
          2.00000000e-01,   7.20000000e+01,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00],
       [  9.90000000e+02,   9.90000000e+02,   1.12500000e+02,
          2.00000000e-01,   7.70000000e+01,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00],
       [  9.90000000e+02,   9.90000000e+02,   1.12500000e+02,
          1.80000000e-01,   8.40000000e+01,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00],
       [  9.90000000e+02,   9.90000000e+02,   1.12500000e+02,
          1.80000000e-01,   8.40000000e+01,   0.00000000e+00,
          0.00000000e+00,   1.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
          0.00000000e+00,   0.00000000e+00]])
>>> np.genfromtxt('data_input', usecols=0,dtype=None)
array(['01_500_aa_1000', '02_500_aa_0950', '03_500_aa_0600',
   '04_500_aa_0700'], 
  dtype='|S14')

Alternatively, you can reference the columns of the recarray like this:

>>> data['f0']
array(['01_500_aa_1000', '02_500_aa_0950', '03_500_aa_0600',
       '04_500_aa_0700'], 
      dtype='|S14')
>>> data['f5']
array([72, 77, 84, 84])
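Building on the field access above, the per-column arrays that unpack=True was expected to return can be collected by looping over the field names. This is a sketch for Python 3 under stated assumptions: `io.StringIO` stands in for the data_input file, `encoding="utf-8"` is added so strings come back as str, and the variable names `columns` and `names` are only illustrative.

```python
import io
import numpy as np

# Same four sample rows as the data_input file in the question.
raw = """01_500_aa_1000    990.0    990.0   112.5      0.2       72  0  0  1  0  0  0  0  0  0   0   0   0   1
02_500_aa_0950    990.0    990.0   112.5      0.2       77  0  0  1  0  0  0  0  0  0   0   0   0   1
03_500_aa_0600    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1
04_500_aa_0700    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1"""

# Assumption: encoding="utf-8" so the first column is read as str on Python 3.
data = np.genfromtxt(io.StringIO(raw), dtype=None, encoding="utf-8")

# One array per field, in file order; each keeps its own dtype.
columns = [data[name] for name in data.dtype.names]

# Ordinary sequence unpacking then works on the list of columns.
names, f1, f2 = columns[0], columns[1], columns[2]

print(names)
print(columns[5])
```

Each entry of `columns` keeps the dtype inferred for its field, so the string column and the numeric columns do not have to share a common type.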

Answer 2: (score: 0)

I can reproduce this. However, if you change dtype to float, I get:

[[             nan              nan              nan              nan]
 [  9.90000000e+02   9.90000000e+02   9.90000000e+02   9.90000000e+02]
 [  9.90000000e+02   9.90000000e+02   9.90000000e+02   9.90000000e+02]
 [  1.12500000e+02   1.12500000e+02   1.12500000e+02   1.12500000e+02]
 [  2.00000000e-01   2.00000000e-01   1.80000000e-01   1.80000000e-01]
 [  7.20000000e+01   7.70000000e+01   8.40000000e+01   8.40000000e+01]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]
 ...

I got the idea from this mailing list question.

See the answer given here. np.genfromtxt() returns data of type ndarray, and a plain (non-structured) ndarray cannot be heterogeneous.
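To illustrate the point about homogeneous arrays: when only the numeric columns are read, the result is an ordinary 2-D float array and unpack=True transposes it as expected. The sketch below targets Python 3 and assumes `io.StringIO` as a stand-in for the data_input file; `usecols=range(1, 19)` is chosen here to cover all 18 numeric columns of the sample.

```python
import io
import numpy as np

# Same four sample rows as the data_input file in the question.
raw = """01_500_aa_1000    990.0    990.0   112.5      0.2       72  0  0  1  0  0  0  0  0  0   0   0   0   1
02_500_aa_0950    990.0    990.0   112.5      0.2       77  0  0  1  0  0  0  0  0  0   0   0   0   1
03_500_aa_0600    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1
04_500_aa_0700    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1"""

# Skip the string column (index 0) so every remaining value parses as float.
cols = np.genfromtxt(io.StringIO(raw), usecols=range(1, 19), unpack=True)

# With unpack=True the first axis indexes columns of the file.
print(cols.shape)

# Fifth numeric column of the file (the one that varies between rows).
print(cols[4])
```

The cost of this approach is that the integer columns are stored as floats, and the string column has to be read in a separate call.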

Answer 3: (score: 0)

I'm posting my own answer, since this is what I ended up using.

import numpy as np

# Load data from file.
data = np.genfromtxt('data_input', dtype=None)

# Transpose rows into columns (list() keeps this working on Python 3,
# where zip returns an iterator).
data = list(zip(*data))

This actually works, and it is easy to understand and use.
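For reference, a self-contained variant of the zip approach might look like the sketch below. It is written for Python 3 and makes a few assumptions not in the original answer: `io.StringIO` replaces the data_input file, `encoding="utf-8"` is passed so the names are str, and `tolist()` is used to convert each record to a plain Python tuple before zipping.

```python
import io
import numpy as np

# Same four sample rows as the data_input file in the question.
raw = """01_500_aa_1000    990.0    990.0   112.5      0.2       72  0  0  1  0  0  0  0  0  0   0   0   0   1
02_500_aa_0950    990.0    990.0   112.5      0.2       77  0  0  1  0  0  0  0  0  0   0   0   0   1
03_500_aa_0600    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1
04_500_aa_0700    990.0    990.0   112.5     0.18       84  0  0  1  0  0  0  0  0  0   0   0   0   1"""

# Assumption: encoding="utf-8" so the first column is read as str on Python 3.
data = np.genfromtxt(io.StringIO(raw), dtype=None, encoding="utf-8")

# tolist() turns the records into plain tuples of Python scalars;
# zip(*rows) then regroups the values by column.
columns = list(zip(*data.tolist()))

print(columns[0])
print(columns[5])
```

The result is a list of tuples rather than NumPy arrays, so each column would need to be wrapped in np.array() again before doing vectorized arithmetic on it.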