Question

I have a huge text file. After skipping the header, a dummy version of it looks like this:

1444455        7        8        12 52 45 68 70

1356799        3        3        45 34 23 22 11

I want to read it into a numpy array, but np.loadtxt is very slow. The file is named data.txt. Right now I am using:

u=pd.read_csv('data.txt',dtype=np.float16,header=3).values

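I have played with the parameters to no avail. If I omit dtype, each row of my array comes back as one long string of numbers. When I pass dtype, I get the error: invalid literal for float(). I believe the two kinds of delimiter in the text file (tabs and single spaces) are also causing some confusion. How can I get this into a numpy array of shape (2, 8)?

Can any of you pros help? Thanks.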

Answer 1

It seems you need header=None in read_csv, together with delim_whitespace=True since the separator is whitespace (tabs and spaces).

Then cast to float:

u=pd.read_csv('data.txt', delim_whitespace=True, header=None).astype(float).values

print (u)
[[  1.44445500e+06   7.00000000e+00   8.00000000e+00   1.20000000e+01
    5.20000000e+01   4.50000000e+01   6.80000000e+01   7.00000000e+01]
 [  1.35679900e+06   3.00000000e+00   3.00000000e+00   4.50000000e+01
    3.40000000e+01   2.30000000e+01   2.20000000e+01   1.10000000e+01]]

But the values are then numpy.float64:

u=pd.read_csv('data.txt', delim_whitespace=True, header=None).astype(float)

print (type(u.loc[0,0]))
<class 'numpy.float64'>
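
The parsing step above can be checked without the real file. The following is a minimal sketch, not the asker's actual setup: the `sample` string is a hypothetical in-memory stand-in for data.txt, and it uses sep=r"\s+" instead of delim_whitespace=True, since recent pandas releases (2.2 and later) deprecate delim_whitespace in favor of that regex separator.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for data.txt: tabs and single spaces are mixed,
# as described in the question.
sample = (
    "1444455\t7\t8\t12 52 45 68 70\n"
    "1356799\t3\t3\t45 34 23 22 11\n"
)

# sep=r"\s+" treats any run of whitespace (tabs or spaces) as one delimiter.
# header=None tells pandas that the first line is data, not column names.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

# Convert to a float64 numpy array in one step.
u = df.to_numpy(dtype=np.float64)

print(u.shape)
print(u.dtype)
```

For the real file, the same call should work with 'data.txt' in place of the io.StringIO object, plus skiprows if header lines still need to be skipped.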

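If you use dtype=np.float16, you get inf in the first column, because the largest finite float16 value is 65504 and values such as 1444455 overflow it: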

u=pd.read_csv('data.txt', dtype=np.float16, delim_whitespace=True, header=None).values
print (u)
[[ inf   7.   8.  12.  52.  45.  68.  70.]
 [ inf   3.   3.  45.  34.  23.  22.  11.]]
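
The inf values come from the range of float16 rather than from the parser. As a rough illustration (the array `col` below is made up from the sample values, and the choice of float32 as an alternative is only a suggestion), the overflow can be reproduced in numpy alone:

```python
import numpy as np

# Largest finite value a float16 can hold.
print(np.finfo(np.float16).max)

# Illustrative values taken from the sample rows in the question.
col = np.array([1444455.0, 1356799.0, 7.0])

# Casting to float16 overflows for the two large values; the overflow
# warning is silenced here so that only the result is shown.
with np.errstate(over="ignore"):
    as_f16 = col.astype(np.float16)

# float32 represents integers up to 2**24 exactly, so these values survive.
as_f32 = col.astype(np.float32)

print(as_f16)
print(as_f32)
```

If memory is the reason for wanting a smaller dtype, float32 halves the footprint of float64 while still holding the values shown here exactly.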
