Question

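My problem is similar to this one; I tried using genfromtxt, but it still doesn't work. The file is read as expected, but the values are not read as floats. The code and an excerpt of the file are below.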

     temp = np.genfromtxt('PFRP_12.csv', names=True, skip_header=1, comments="#", delimiter=",", dtype=None)

Each row is read as (b'"0"', b'"0.2241135"', b'"0"', b'"0.01245075"', b'"0"', b'"0"')

     "1 _ 1",,,,,
     "Time","Force","Stroke","Stress","Strain","Disp."
     #"sec","N","mm","MPa","%","mm"
     "0","0.2241135","0","0.01245075","0","0"
     "0.1","0.2304713","0.0016","0.01280396","0.001066667","0.0016"
     "0.2","1.707077","0.004675","0.09483761","0.003116667","0.004675"

I have tried different dtypes (None, str, float, bytes), still without success. Thanks!

Edit: As Evert suggested, I tried float, but then every row is read as (nan, nan, nan, nan, nan, nan)

Answer 1

Another solution is to use the converters parameter:

np.genfromtxt('inp.txt', names=True, skip_header=1, comments="#", 
delimiter=",", dtype=None, 
converters=dict((i, lambda s: float(s.decode().strip('"'))) for i in range(6)))

(You need to specify a converter for each column.)

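Side remark: oddly, although dtype="U12" or similar should produce strings rather than bytes (which would avoid the .decode() part), this doesn't seem to work and results in empty entries.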
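If one converter per column feels heavy, a possible alternative (not part of the answer above, just a sketch) is to strip the double quotes from each line before genfromtxt parses it. This relies on genfromtxt accepting a generator of lines. The sample text is copied from the question's excerpt and wrapped in io.StringIO so the example runs without the file; the names SAMPLE and data are made up for illustration.

```python
import io

import numpy as np

# File excerpt from the question, held in memory instead of 'PFRP_12.csv'.
SAMPLE = '''"1 _ 1",,,,,
"Time","Force","Stroke","Stress","Strain","Disp."
#"sec","N","mm","MPa","%","mm"
"0","0.2241135","0","0.01245075","0","0"
"0.1","0.2304713","0.0016","0.01280396","0.001066667","0.0016"
"0.2","1.707077","0.004675","0.09483761","0.003116667","0.004675"
'''

# Remove every double quote line by line, then let genfromtxt parse
# plain numbers with its default float dtype.
unquoted_lines = (line.replace('"', '') for line in io.StringIO(SAMPLE))

data = np.genfromtxt(
    unquoted_lines,
    names=True,       # take field names from the header row
    skip_header=1,    # skip the '1 _ 1' title line
    comments="#",     # the units row starts with '#'
    delimiter=",",
)

print(data["Time"])
print(data["Force"])
```

For the real file, the generator would iterate over `open('PFRP_12.csv')` instead of the in-memory string. Removing all quotes is only safe here because no field contains a quoted comma.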

Answer 2

Here is a fancy, hard-to-read, functional-programming-style way to convert your input into the record array you are looking for:

>>> np.core.records.fromarrays(np.asarray([float(y.decode().strip('"')) for x in temp for y in x]).reshape(temp.shape[0], -1).T, names=temp.dtype.names, formats=['f'] * len(temp.dtype.names))

Or spread over several lines:

>>> np.core.records.fromarrays(
...   np.asarray(
...     [float(y.decode().strip('"')) for x in temp for y in x]
...   ).reshape(temp.shape[0], -1).T, 
...   names=temp.dtype.names, 
...   formats=['f'] * len(temp.dtype.names))

I wouldn't recommend this solution, but sometimes it's fun to do things this way.

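The problem with your data is a bit more complicated than it looks. That is because the numbers in the CSV file are really not numbers: they are explicitly strings, because they are wrapped in double quotes.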

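So converting to float involves 3 steps:
- decode the bytes to a Python 3 (unicode) string
- strip the double quotes from each end of each string
- convert the remaining string to a float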
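The three steps can be sketched on a single value as follows. The helper name quoted_bytes_to_float is made up for illustration; the input is one of the byte strings shown in the question.

```python
def quoted_bytes_to_float(raw):
    """Turn a value such as b'"0.2241135"' into the float 0.2241135."""
    text = raw.decode()      # step 1: bytes -> Python 3 str
    bare = text.strip('"')   # step 2: drop the surrounding double quotes
    return float(bare)       # step 3: str -> float


value = quoted_bytes_to_float(b'"0.2241135"')
print(value)
```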

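This happens in the double list comprehension on line 3. It is a double list comprehension because the rec-array is essentially 2D.
The resulting list is 1D. I turn it back into a numpy array (np.asarray) so that I can easily reshape it to 2D. That array (now of plain floats) is then passed to np.core.records.fromarrays, with the names taken from the original rec-array and a float format set for each field.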
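To make the flatten, reshape and rebuild sequence easier to follow, here is a small sketch on hand-made data. The two-row, three-field array temp below is an assumption standing in for what genfromtxt returned in the question. The sketch uses np.rec.fromarrays, the public alias of np.core.records.fromarrays, and 'f8' (double precision) rather than 'f'; both choices are mine, not the answer's.

```python
import numpy as np

# Stand-in for the question's array: records whose fields are quoted bytes.
names = ("Time", "Force", "Stroke")
temp = np.array(
    [(b'"0"', b'"0.2241135"', b'"0"'),
     (b'"0.1"', b'"0.2304713"', b'"0.0016"')],
    dtype=[(n, "S12") for n in names],
)

# Flatten record by record: decode, strip the quotes, convert to float.
flat = [float(y.decode().strip('"')) for x in temp for y in x]

# The flat list is ordered record by record, so reshape to
# (n_records, n_fields) and transpose to get one row per field.
columns = np.asarray(flat).reshape(temp.shape[0], -1).T

# fromarrays expects one array per field.
rec = np.rec.fromarrays(
    columns,
    names=temp.dtype.names,
    formats=["f8"] * len(temp.dtype.names),
)

print(rec.Time)
print(rec.Force)
```

The transpose matters: fromarrays takes one sequence per field, while the comprehension walks the data one record at a time.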

How do I convert numpy bytes to float in Python 3?
