Question

I have a DataFrame of curves, where each curve is itself a DataFrame of x, y pairs. It looks like this:

    a      b      c      d
0 curve  curve  curve  curve
1 curve  curve  curve  curve
2 curve  curve  curve  curve
3 curve  curve  curve  curve
...

Each "curve" is a simple x vs. y DataFrame representing that curve:

  |  x | y
----------
0 | -1 | 0
1 | 0  | 0
2 | 0  | 0.02
3 | 1  | 0.02
...
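To make the structure above reproducible, here is a minimal sketch of how such a nested frame could be built. The helper `make_curve`, the 4x4 shape, and the step heights are assumptions for illustration only; the real data is not shown in the question.

```python
import numpy as np
import pandas as pd

def make_curve(step):
    """Hypothetical helper: one curve as a small DataFrame of x, y pairs."""
    return pd.DataFrame({"x": [-1, 0, 0, 1], "y": [0.0, 0.0, step, step]})

# Fill an object array cell by cell so each cell holds a whole DataFrame.
cells = np.empty((4, 4), dtype=object)
for i in range(4):
    for j in range(4):
        cells[i, j] = make_curve(0.02 * (j + 1))  # assumed step heights

# Outer frame: columns a..d, every cell is a curve DataFrame.
df_of_curves = pd.DataFrame(cells, columns=list("abcd"))
```

Each cell can then be read back with, for example, `df_of_curves["a"].iloc[0]`, which returns the inner x, y frame.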

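What is the most efficient way to get a list of the unique "y" values across the whole DataFrame? This is my first pass; it works, but I doubt it is the most efficient.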

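import numpy as np
# DataFrame.iteritems() was removed in pandas 2.0; items() yields the same (name, column) pairs.
flattened_array = np.concatenate([np.concatenate([curve.y for curve in curves]) for name, curves in df_of_curves.items()])
unique_values = np.unique(flattened_array)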

Along the way, I also tried calling .unique() on each Series first, but that did not seem to affect performance:

flattened_array = np.concatenate([np.concatenate([curve.y.unique() for curve in curves]) for name, curves in df_of_curves.items()])
unique_values = np.unique(flattened_array)
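For comparison, one variant I could try is flattening the outer frame once and concatenating in a single pass, then deduplicating with `pd.unique`, which is hash-based and does not sort. This is only a sketch on made-up sample data (the 2x2 frame and step heights are assumptions), and I have not measured whether it is actually faster on my real curves.

```python
import numpy as np
import pandas as pd

# Assumed sample data: a 2x2 frame whose cells are small curve DataFrames.
cells = np.empty((2, 2), dtype=object)
for i in range(2):
    for j in range(2):
        step = 0.02 * (i + j + 1)
        cells[i, j] = pd.DataFrame({"x": [-1, 0, 0, 1], "y": [0.0, 0.0, step, step]})
df_of_curves = pd.DataFrame(cells, columns=["a", "b"])

# Flatten the outer frame into a 1-D object array of curves.
all_curves = df_of_curves.to_numpy().ravel()

# Concatenate every y column once instead of nesting two concatenations.
flattened_array = np.concatenate([curve["y"].to_numpy() for curve in all_curves])

# pd.unique returns values in order of appearance; sort afterwards to match np.unique.
unique_values = np.sort(pd.unique(flattened_array))
```

Dropping the final `np.sort` skips the ordering step when sorted output is not needed.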

Finding unique elements in a DataFrame of DataFrames

0 answers: