I am trying to load a text file into a NumPy array, but I don't get the result I want.
The file is structured as follows:
THE 77534223
AND 30997177
ING 30679488
ENT 17902107
ION 17769261
HER 15277018
FOR 14686159
THA 14222073
NTH 14115952
[...]
This is the code I am using, followed by its output:
import numpy as np
data = np.genfromtxt("english_trigrams.txt", dtype=(str,int), delimiter=' ')
print(data)
[['TH' '77']
['AN' '30']
['IN' '30']
...,
['JX' '1']
['JQ' '1']
['JQ' '1']]
What I want instead is an (x, 2) array with dtype str in the first column and dtype int in the second column.
Thanks a lot!
P.S:
Answer 0 (score: 0)
Here are several ways of loading this text:
In [470]: txt=b"""THE 77534223
...: AND 30997177
...: ING 30679488
...: ENT 17902107
...: ION 17769261
...: HER 15277018
...: FOR 14686159
...: THA 14222073
...: NTH 14115952"""
With dtype=None, genfromtxt deduces the correct dtype for each column:
In [471]: data = np.genfromtxt(txt.splitlines(),dtype=None)
In [472]: data
Out[472]:
array([(b'THE', 77534223), (b'AND', 30997177), (b'ING', 30679488),
(b'ENT', 17902107), (b'ION', 17769261), (b'HER', 15277018),
(b'FOR', 14686159), (b'THA', 14222073), (b'NTH', 14115952)],
dtype=[('f0', 'S3'), ('f1', '<i4')])
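The b'...' prefix in this output means the first field was read as bytes ('S3'). If plain str values are preferred, passing an explicit encoding should make genfromtxt infer a unicode field instead. A small sketch, assuming NumPy 1.14 or later (where the encoding parameter exists) and using a few sample rows in place of the file:

```python
import numpy as np

# Sample rows copied from the question; they stand in for english_trigrams.txt.
lines = ["THE 77534223", "AND 30997177", "ING 30679488"]

# dtype=None asks genfromtxt to infer one dtype per column;
# an explicit encoding makes the text column come back as str, not bytes.
data = np.genfromtxt(lines, dtype=None, encoding="utf-8")

words = data["f0"]    # inferred string field (default name f0)
counts = data["f1"]   # inferred integer field (default name f1)
print(words, counts)
```

The result is a 1-D structured array, so columns are selected by field name rather than by a second index.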
dtype=(str, int) is not a correct dtype specification. It behaves like your call, except that each element keeps only 1 character:
In [473]: data = np.genfromtxt(txt.splitlines(),dtype=(str, int))
In [474]: data
Out[474]:
array([['T', '7'],
['A', '3'],
['I', '3'],
['E', '1'],
['I', '1'],
['H', '1'],
['F', '1'],
['T', '1'],
['N', '1']],
dtype='<U1')
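A regular NumPy array has a single dtype for all of its elements, so a literal (x, 2) array cannot hold str in one column and int in the other. If the 2-D shape matters more than mixed types, one option is to read everything as strings and convert the count column afterwards. A minimal sketch, assuming the three sample rows below stand in for the file:

```python
import io
import numpy as np

# Sample rows copied from the question; they stand in for english_trigrams.txt.
sample = io.StringIO("THE 77534223\nAND 30997177\nING 30679488\n")

# dtype=str reads every column as text, giving a 2-D array of strings.
table = np.genfromtxt(sample, dtype=str)

# Convert the second column to integers as a separate 1-D array.
counts = table[:, 1].astype(np.int64)
print(table.shape, counts)
```

Here table keeps the (x, 2) layout, while counts holds the numeric values separately.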
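The string form 'str,int' is a little better, because it produces a structured array, but the strings are too short (they come out empty):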
In [475]: data = np.genfromtxt(txt.splitlines(),dtype='str,int')
In [476]: data
Out[476]:
array([('', 77534223), ('', 30997177), ('', 30679488), ('', 17902107),
('', 17769261), ('', 15277018), ('', 14686159), ('', 14222073),
('', 14115952)],
dtype=[('f0', '<U'), ('f1', '<i4')])
Giving the string field an explicit width, as in 'U10,int', gives a result similar to the dtype=None case:
In [477]: data = np.genfromtxt(txt.splitlines(),dtype='U10,int')
In [478]: data
Out[478]:
array([('THE', 77534223), ('AND', 30997177), ('ING', 30679488),
('ENT', 17902107), ('ION', 17769261), ('HER', 15277018),
('FOR', 14686159), ('THA', 14222073), ('NTH', 14115952)],
dtype=[('f0', '<U10'), ('f1', '<i4')])
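The explicit dtype can also carry field names, which may read better than the default f0 and f1. The sketch below is one way to do that. The field names trigram and count are illustrative assumptions, as is the choice of a 64-bit integer, and the sample rows stand in for the real file:

```python
import io
import numpy as np

# Sample rows copied from the question; they stand in for english_trigrams.txt.
sample = io.StringIO("THE 77534223\nAND 30997177\nING 30679488\n")

# Hypothetical field names; 'U10' allows up to 10 characters per string.
trigram_dtype = [("trigram", "U10"), ("count", "i8")]
data = np.genfromtxt(sample, dtype=trigram_dtype)

# Fields are selected by name; tolist() yields plain Python str and int.
lookup = dict(zip(data["trigram"].tolist(), data["count"].tolist()))
print(lookup["THE"])
```

To read the actual file, the StringIO object would be replaced by the file name used in the question.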