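I have a 3D matrix made up of 2D matrices, but they are not all the same size: the second dimension grows with each sample. So I want to pad each one with NaN rows on top so that they all end up with the same shape.
Here are the samples: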
# generated by this:
arr = np.asarray(df)
result = list((map(lambda i: arr[:i], range(1,df.shape[0]+1))))
[
[2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71 ],
[2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91 ],
[2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21 ],
[2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41 ],
[2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41
2019-06-17 08:49:00 12089.71 12090.21 12087.21 12088.21 ],
[2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41
2019-06-17 08:49:00 12089.71 12090.21 12087.21 12088.21
2019-06-17 08:50:00 12504.11 12504.11 12504.11 12504.11 ],
[2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41
2019-06-17 08:49:00 12089.71 12090.21 12087.21 12088.21
2019-06-17 08:50:00 12504.11 12504.11 12504.11 12504.11
2019-06-17 08:51:00 12504.11 NaN 12503.11 12503.11 ],
[2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41
2019-06-17 08:49:00 12089.71 12090.21 12087.21 12088.21
2019-06-17 08:50:00 12504.11 12504.11 12504.11 12504.11
2019-06-17 08:51:00 12504.11 NaN 12503.11 12503.11
2019-06-17 08:52:00 12504.11 12504.11 12503.11 12503.11 ],
[2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41
2019-06-17 08:49:00 12089.71 12090.21 12087.21 12088.21
2019-06-17 08:50:00 12504.11 12504.11 12504.11 12504.11
2019-06-17 08:51:00 12504.11 NaN 12503.11 12503.11
2019-06-17 08:52:00 12504.11 12504.11 12503.11 12503.11
2019-06-17 08:53:00 12503.61 12503.61 12503.61 12503.61 ],
[2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41
2019-06-17 08:49:00 12089.71 12090.21 12087.21 12088.21
2019-06-17 08:50:00 12504.11 12504.11 12504.11 12504.11
2019-06-17 08:51:00 12504.11 NaN 12503.11 12503.11
2019-06-17 08:52:00 12504.11 12504.11 12503.11 12503.11
2019-06-17 08:53:00 12503.61 12503.61 12503.61 12503.61
2019-06-17 08:54:00 12503.61 12503.61 12503.11 12503.11 ]
]
Expected result:
[
[ NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71 ],
[ NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91 ],
[ NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21 ],
[ NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41 ],
[ NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41
2019-06-17 08:49:00 12089.71 12090.21 12087.21 12088.21 ],
[ NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41
2019-06-17 08:49:00 12089.71 12090.21 12087.21 12088.21
2019-06-17 08:50:00 12504.11 12504.11 12504.11 12504.11 ],
[ NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41
2019-06-17 08:49:00 12089.71 12090.21 12087.21 12088.21
2019-06-17 08:50:00 12504.11 12504.11 12504.11 12504.11
2019-06-17 08:51:00 12504.11 NaN 12503.11 12503.11 ],
[ NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41
2019-06-17 08:49:00 12089.71 12090.21 12087.21 12088.21
2019-06-17 08:50:00 12504.11 12504.11 12504.11 12504.11
2019-06-17 08:51:00 12504.11 NaN 12503.11 12503.11
2019-06-17 08:52:00 12504.11 12504.11 12503.11 12503.11 ],
[ NaN NaN NaN NaN NaN
2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41
2019-06-17 08:49:00 12089.71 12090.21 12087.21 12088.21
2019-06-17 08:50:00 12504.11 12504.11 12504.11 12504.11
2019-06-17 08:51:00 12504.11 NaN 12503.11 12503.11
2019-06-17 08:52:00 12504.11 12504.11 12503.11 12503.11
2019-06-17 08:53:00 12503.61 12503.61 12503.61 12503.61 ],
[2019-06-17 08:45:00 12089.89 12089.89 12087.71 12087.71
2019-06-17 08:46:00 12087.91 NaN 12087.71 12087.91
2019-06-17 08:47:00 12088.21 12088.21 12084.21 12085.21
2019-06-17 08:48:00 12085.09 12090.21 12084.91 12089.41
2019-06-17 08:49:00 12089.71 12090.21 12087.21 12088.21
2019-06-17 08:50:00 12504.11 12504.11 12504.11 12504.11
2019-06-17 08:51:00 12504.11 NaN 12503.11 12503.11
2019-06-17 08:52:00 12504.11 12504.11 12503.11 12503.11
2019-06-17 08:53:00 12503.61 12503.61 12503.61 12503.61
2019-06-17 08:54:00 12503.61 12503.61 12503.11 12503.11 ]
]
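What would be an efficient way to do this? (The data has roughly 100,000-500,000 samples.)
Edit: Alternatively, is there a way to generate "result" directly in the expected form? For example by creating a second dataframe full of NaN? Something like this? (pseudocode:)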
result = list((map(lambda i: nanarr[:j-i]+arr[:i], range(1,df.shape[0]+1))))
Answer 0 (score: 0)
I assume result is what you pasted above.
If result is a list of lists, you can modify it as follows to get the output you asked for:
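import numpy as np

longest_length = max(len(item) for item in result)
new_result = []
for L in result:
    # Build one all-NaN row per missing row, as wide as the rows of this sample.
    missing = longest_length - len(L)
    nan_rows = [[np.nan] * len(L[0]) for _ in range(missing)]
    # Put the NaN rows on top so every sample ends up with longest_length rows.
    new_result.append(nan_rows + list(L))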
This is about as "efficient" as you can get without using compiled code.
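If result is instead a list of NumPy arrays, which is what the map(...) call in the question produces, the same top-padding could be sketched with np.pad. This is only an illustration under assumptions: the helper name pad_top is made up here, and the columns are assumed to be plain floats (the timestamp column is left out), since NaN padding needs a float dtype.

```python
import numpy as np

def pad_top(samples):
    """Pad each 2D float array with NaN rows on top, up to the longest sample."""
    longest = max(len(a) for a in samples)
    return np.stack([
        # ((rows_before, rows_after), (cols_before, cols_after))
        np.pad(a, ((longest - len(a), 0), (0, 0)), constant_values=np.nan)
        for a in samples
    ])

# Small stand-in for the question's data: 3 rows, 2 float columns.
arr = np.arange(6, dtype=float).reshape(3, 2)
samples = [arr[:i] for i in range(1, len(arr) + 1)]
stacked = pad_top(samples)  # shape (3, 3, 2)
```

The result is a single 3D array rather than a list, but it still materializes every padded copy, so it does not get around the size problem described next.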
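The problem you are asking about is inherently inefficient. The output you are constructing has N**2 * M values, where N is the number of samples you have and M is the number of values in each sample. That output contains a huge amount of duplicated data. If you need a more efficient solution, try to find a way to write your code so that it does not need this duplication.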
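To put a number on it: with N = 100,000 samples and M = 4 float columns, N**2 * M is 4e10 values, or about 320 GB as float64. One possible way to avoid the duplication, offered as a sketch rather than a definitive solution, is to pad the original array once with N-1 NaN rows and take overlapping windows as a view. The sketch below assumes NumPy 1.20 or later (for sliding_window_view) and float-only columns; the function name padded_prefixes is made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def padded_prefixes(arr):
    """Return a read-only (N, N, M) view; window i is arr[:i+1] with NaN rows on top."""
    n, m = arr.shape
    pad = np.full((n - 1, m), np.nan)
    padded = np.concatenate([pad, arr], axis=0)  # shape (2N-1, M)
    # The window axis is appended last, so this has shape (N, M, N).
    windows = sliding_window_view(padded, window_shape=n, axis=0)
    return windows.transpose(0, 2, 1)  # shape (N, N, M), still a view

# Small stand-in for the question's data: 3 rows, 2 float columns.
arr = np.arange(6, dtype=float).reshape(3, 2)
out = padded_prefixes(arr)
```

Only the padded (2N-1, M) array is actually stored; each out[i] is a window into it. Any operation that copies the whole view (for example np.array(out)) brings back the full N**2 * M cost, so this helps only if the samples are consumed one at a time or in small batches.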