Here is a MWE of my code:
import numpy as np
# Load data from file.
data = np.genfromtxt('data_input', dtype=None, unpack=True)
print data
Here is a sample of the data_input file:
01_500_aa_1000 990.0 990.0 112.5 0.2 72 0 0 1 0 0 0 0 0 0 0 0 0 1
02_500_aa_0950 990.0 990.0 112.5 0.2 77 0 0 1 0 0 0 0 0 0 0 0 0 1
03_500_aa_0600 990.0 990.0 112.5 0.18 84 0 0 1 0 0 0 0 0 0 0 0 0 1
04_500_aa_0700 990.0 990.0 112.5 0.18 84 0 0 1 0 0 0 0 0 0 0 0 0 1
The unpack argument does not seem to work, since it always prints:
[ ('01_500_aa_1000', 990.0, 990.0, 112.5, 0.2, 72, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
('02_500_aa_0950', 990.0, 990.0, 112.5, 0.2, 77, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
('03_500_aa_0600', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
('04_500_aa_0700', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)]
Can anyone reproduce this? What am I doing wrong?
Answer 0 (score: 2)
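You are seeing this because genfromtxt is returning a numpy record array (strictly speaking, an ndarray with a structured dtype), not a list. It only looks like a list when you print() it to the console.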
from cStringIO import StringIO
from numpy import genfromtxt
raw = """01_500_aa_1000 990.0 990.0 112.5 0.2 72 0 0 1 0 0 0 0 0 0 0 0 0 1
02_500_aa_0950 990.0 990.0 112.5 0.2 77 0 0 1 0 0 0 0 0 0 0 0 0 1
03_500_aa_0600 990.0 990.0 112.5 0.18 84 0 0 1 0 0 0 0 0 0 0 0 0 1
04_500_aa_0700 990.0 990.0 112.5 0.18 84 0 0 1 0 0 0 0 0 0 0 0 0 1"""
sio = StringIO(raw)
data = genfromtxt(sio, dtype=None, unpack=False)
print data
print
print data.dtype
gives:
[ ('01_500_aa_1000', 990.0, 990.0, 112.5, 0.2, 72, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
('02_500_aa_0950', 990.0, 990.0, 112.5, 0.2, 77, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
('03_500_aa_0600', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
('04_500_aa_0700', 990.0, 990.0, 112.5, 0.18, 84, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)]
[('f0', 'S14'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<i8'), ('f6', '<i8'), ('f7', '<i8'), ('f8', '<i8'), ('f9', '<i8'), ('f10', '<i8'), ('f11', '<i8'), ('f12', '<i8'), ('f13', '<i8'), ('f14', '<i8'), ('f15', '<i8'), ('f16', '<i8'), ('f17', '<i8'), ('f18', '<i8')]
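unpack=True and unpack=False appear to return the same thing because what you get back is a recarray: a 1-D array of records, and transposing a 1-D array changes nothing. I'd suggest trying pandas and forgetting about recarray entirely. You can pass the recarray to pandas.DataFrame and you're done! For example,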
from pandas import DataFrame
df = DataFrame(data)
print df
print
print df.f0
yields:
f0 f1 f2 f3 f4 f5 f6 f7 f8 \
0 01_500_aa_1000 990.00 990.00 112.50 0.20 72 0 0 1
1 02_500_aa_0950 990.00 990.00 112.50 0.20 77 0 0 1
2 03_500_aa_0600 990.00 990.00 112.50 0.18 84 0 0 1
3 04_500_aa_0700 990.00 990.00 112.50 0.18 84 0 0 1
f9 f10 f11 f12 f13 f14 f15 f16 f17 f18
0 0 0 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 0 0 1
2 0 0 0 0 0 0 0 0 0 1
3 0 0 0 0 0 0 0 0 0 1
0 01_500_aa_1000
1 02_500_aa_0950
2 03_500_aa_0600
3 04_500_aa_0700
Name: f0, dtype: object
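To see why unpack appears to do nothing, it can help to look at the shape of what genfromtxt returns. The following is only a sketch, assuming Python 3 and a reasonably recent NumPy; it uses a shortened two-row excerpt of the sample held in memory, and the names raw, data and columns are illustrative rather than taken from the post above.

```python
import io
import numpy as np

# Shortened, illustrative excerpt of the sample file (not the full 19 columns).
raw = """01_500_aa_1000 990.0 990.0 112.5 0.2 72 0 1
02_500_aa_0950 990.0 990.0 112.5 0.2 77 0 1"""

# dtype=None infers a type per column; mixed types produce a structured array.
data = np.genfromtxt(io.StringIO(raw), dtype=None, encoding="utf-8")

print(data.shape)        # one record per row, so the array is 1-D
print(data.dtype.names)  # auto-generated field names: f0, f1, ...

# Transposing a 1-D array is a no-op, so there is nothing for a transpose to swap.
assert data.T.shape == data.shape

# Pulling each field out by name gives one array per column.
columns = [data[name] for name in data.dtype.names]
```

Indexing by field name is the same access pattern the DataFrame exposes as df.f0, just without the pandas dependency.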
Answer 1 (score: 2)
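As @Phillip Cloud mentioned, when you mix data types (strings and numbers) you get a recarray; the strings in column 0 are what cause it.
You can work around this by importing column 0 separately. Note that range(1,18) covers columns 1 through 17 only, so the final column (index 18) is left out of the output below; range(1,19) would include it: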
>>> np.genfromtxt('data_input', usecols=range(1,18))
array([[ 9.90000000e+02, 9.90000000e+02, 1.12500000e+02,
2.00000000e-01, 7.20000000e+01, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00],
[ 9.90000000e+02, 9.90000000e+02, 1.12500000e+02,
2.00000000e-01, 7.70000000e+01, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00],
[ 9.90000000e+02, 9.90000000e+02, 1.12500000e+02,
1.80000000e-01, 8.40000000e+01, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00],
[ 9.90000000e+02, 9.90000000e+02, 1.12500000e+02,
1.80000000e-01, 8.40000000e+01, 0.00000000e+00,
0.00000000e+00, 1.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00]])
>>> np.genfromtxt('data_input', usecols=0,dtype=None)
array(['01_500_aa_1000', '02_500_aa_0950', '03_500_aa_0600',
'04_500_aa_0700'],
dtype='|S14')
Alternatively, you can reference the columns of the recarray like this:
>>> data['f0']
array(['01_500_aa_1000', '02_500_aa_0950', '03_500_aa_0600',
'04_500_aa_0700'],
dtype='|S14')
>>> data['f5']
array([72, 77, 84, 84])
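The split between a label column and a numeric block can be sketched as follows. This assumes Python 3 and a recent NumPy, reads from an in-memory excerpt with fewer columns than the real file, and derives the column count instead of hard-coding it; raw, n_cols, labels, values and unpacked are illustrative names.

```python
import io
import numpy as np

# Shortened, illustrative excerpt: one label column plus five numeric columns.
raw = """01_500_aa_1000 990.0 112.5 0.2 72 1
02_500_aa_0950 990.0 112.5 0.18 77 1"""

# Count the whitespace-separated fields on the first line (6 in this excerpt).
n_cols = len(raw.splitlines()[0].split())

# Column 0 holds the labels; dtype=None lets genfromtxt infer a string type.
labels = np.genfromtxt(io.StringIO(raw), usecols=0, dtype=None, encoding="utf-8")

# range's stop is exclusive, so range(1, n_cols) covers every remaining column.
values = np.genfromtxt(io.StringIO(raw), usecols=range(1, n_cols))

# A homogeneous 2-D array can be transposed, so unpack=True has a visible effect.
unpacked = np.genfromtxt(io.StringIO(raw), usecols=range(1, n_cols), unpack=True)

print(values.shape, unpacked.shape)
```

Deriving the stop value from the data avoids silently dropping the last column when the file layout changes.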
Answer 2 (score: 0)
I can reproduce this. However, if you change dtype to float, I get:
[[ nan nan nan nan]
[ 9.90000000e+02 9.90000000e+02 9.90000000e+02 9.90000000e+02]
[ 9.90000000e+02 9.90000000e+02 9.90000000e+02 9.90000000e+02]
[ 1.12500000e+02 1.12500000e+02 1.12500000e+02 1.12500000e+02]
[ 2.00000000e-01 2.00000000e-01 1.80000000e-01 1.80000000e-01]
[ 7.20000000e+01 7.70000000e+01 8.40000000e+01 8.40000000e+01]
[ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
...
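I got this idea from this mailing list question.
See also the answer given here. np.genfromtxt() returns data of type ndarray, which has a single dtype and so cannot hold heterogeneous columns unless that dtype is a structured one.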
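The row of nan values comes from genfromtxt's lenient conversion, which is its default behaviour as far as I know: a field that cannot be parsed as the requested dtype is replaced by a fill value instead of raising an error. A small sketch, assuming Python 3 and an in-memory two-row excerpt (raw and data are illustrative names):

```python
import io
import numpy as np

# Shortened, illustrative excerpt: one label column plus three numeric columns.
raw = """01_500_aa_1000 990.0 112.5 72
02_500_aa_0950 990.0 112.5 77"""

# Forcing dtype=float yields an ordinary 2-D array, so unpack=True can transpose it.
data = np.genfromtxt(io.StringIO(raw), dtype=float, unpack=True)

print(data.shape)  # (columns, rows) after unpacking
print(data[0])     # the label column could not be parsed as float
```

The labels themselves are lost this way, so this only suits cases where the first column is not needed.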
Answer 3 (score: 0)
I'm posting my own answer, since this is what I ended up using.
import numpy as np
# Load data from file.
data = np.genfromtxt('data_input', dtype=None)
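# Transpose the records into a list of per-column tuples.
# list() keeps the result a list on Python 3, where zip is lazy.
data = list(zip(*data))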
This actually works, and it is easy to understand and use.
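What zip(*data) does can be seen on plain tuples, with no NumPy involved. This is just an illustration; the three-field rows and the names rows, columns, names, widths and counts are made up for the example.

```python
# Two illustrative records, each with a label, a float and an int.
rows = [
    ("01_500_aa_1000", 990.0, 72),
    ("02_500_aa_0950", 990.0, 77),
]

# zip(*rows) groups the i-th element of every row together,
# which turns a list of rows into a list of columns.
columns = list(zip(*rows))

# Each column can then be unpacked into its own variable.
names, widths, counts = columns

print(names)
print(counts)
```

The result is a list of plain Python tuples, not NumPy arrays, so wrap a column in np.array() if array operations are needed afterwards.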