I have the following Dask DataFrame:
I want to apply an sklearn scaler (e.g. StandardScaler) to the LotArea column:
scaler = StandardScaler()
scaler.fit_transform(df[['LotArea']])
This returns a NumPy array:
array([[ 0.82160041],
[ 1.59216945],
[ 1.46485804],
[-0.11648362],
[-1.10613315],
[ 0.34906243],
[-0.23942507],
[-0.11648362],
[ 0.40033659],
[-0.11706628],
[-0.85762828],
[-2.07480689]])
But I cannot update the DataFrame with:
df[column] = (scaler.fit_transform(df[[column]]))
It raises the following error:
TypeError: Column assignment doesn't support type numpy.ndarray
I tried converting the result to a Dask array, but the outcome is the same:
df['LotArea'] = da.from_array(scaler.fit_transform(df[[column]]))
TypeError: Column assignment doesn't support type dask.array.core.Array
How can I update the DataFrame using the scaler?
Answer 0 (score: 1)
This boils down to "how do I add a column to a Dask DataFrame".
In [22]: df = pd.DataFrame({"A": [1, 2, 3, 4]})
In [23]: ddf = dd.from_pandas(df, 2)
In [24]: b = da.from_array(np.array([1, 2, 3, 4]), chunks=2)
In [25]: ddf['B'] = dd.from_dask_array(b, index=ddf.index)
In [26]: ddf.head()
/Users/taugspurger/sandbox/dask/dask/dataframe/core.py:5724: UserWarning: Insufficient elements for `head`. 5 elements requested, only 2 elements available. Try passing larger `npartitions` to `head`.
  warnings.warn(msg.format(n, len(r)))
Out[26]:
   A  B
0  1  1
1  2  2
This may become easier in Dask. See https://github.com/dask/dask/issues/5118.