Question

My dask dataframe looks like this:

In [65]: df.head()
Out[65]:
   id_orig  id_cliente  id_cartao  inicio_processo  fim_processo  score  \
0      1.0         1.0        1.0              1.0           1.0    1.0
1      1.0         1.0        1.0              1.0           1.0    1.0
2      1.0         1.0        1.0              1.0           1.0    1.0
3      1.0         1.0        1.0              1.0           1.0    1.0
4      1.0         1.0        1.0              1.0           1.0    1.0

   automatico  canal  aceito  motivo_recusa  variante
0         1.0    1.0     1.0            1.0       1.0
1         1.0    1.0     1.0            1.0       1.0
2         1.0    1.0     1.0            1.0       1.0
3         1.0    1.0     1.0            1.0       1.0
4         1.0    1.0     1.0            1.0       1.0

Assigning an integer works:

In [92]: df = df.assign(id_cliente=999)

In [93]: df.head()
Out[93]:
   id_orig  id_cliente  id_cartao  inicio_processo  fim_processo  score  \
0      1.0         999        1.0              1.0           1.0    1.0
1      1.0         999        1.0              1.0           1.0    1.0
2      1.0         999        1.0              1.0           1.0    1.0
3      1.0         999        1.0              1.0           1.0    1.0
4      1.0         999        1.0              1.0           1.0    1.0

   automatico  canal  aceito  motivo_recusa  variante
0         1.0    1.0     1.0            1.0       1.0
1         1.0    1.0     1.0            1.0       1.0
2         1.0    1.0     1.0            1.0       1.0
3         1.0    1.0     1.0            1.0       1.0
4         1.0    1.0     1.0            1.0       1.0

However, I can't find any way to assign a Series, or any other iterable, to an existing column.

How can I do this?

Answer 1

DataFrame.assign accepts any scalar or any dd.Series:

df = df.assign(a=1)  # accepts scalars
df = df.assign(z=df.x + df.y)  # accepts dd.Series objects

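If you're trying to assign a NumPy array or a Python list, then your data is probably small enough to fit in RAM, in which case Pandas may be a better fit than Dask.dataframe.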

You can also use plain setitem syntax:

df['a'] = 1
df['z'] = df.x + df.y
