Merging Dask dataframes depending on an interval

Asked: 2020-06-05 12:55:03

Tags: python dataframe dask dask-dataframe

I have two dask.dataframe objects like these:

df_1
   Start  End Val
0      1   10   a
1     11   15   b
2     16   25   c
3     26   27   a

df_2 = pd.DataFrame([[2], [12], [15], [23]], columns=['Time'])
df_2
   Time
0     2
1    12
2    15
3    23

I would like to add a new column to df_2 holding the value of df_1['Val'] whenever df_2['Time'] lies between df_1['Start'] and df_1['End'] (inclusive). The resulting df_2 would be:

df_2
   Time Value
0     2     a
1    12     b
2    15     b
3    23     c
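
For reference, the lookup described above can be sketched in plain pandas as follows. This is only a minimal sketch under the assumption that the intervals do not overlap; it is one common way to do it with `pd.IntervalIndex` and not necessarily the exact approach from the linked answer. The variable names `intervals` and `pos` are introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df_1 = pd.DataFrame({"Start": [1, 11, 16, 26],
                     "End": [10, 15, 25, 27],
                     "Val": ["a", "b", "c", "a"]})
df_2 = pd.DataFrame([[2], [12], [15], [23]], columns=["Time"])

# Build closed intervals [Start, End] so both endpoints count as inside.
intervals = pd.IntervalIndex.from_arrays(df_1["Start"], df_1["End"], closed="both")

# Position of the interval containing each Time (-1 when none matches).
pos = intervals.get_indexer(df_2["Time"])

# reindex turns the -1 positions into NaN instead of raising.
df_2["Value"] = df_1["Val"].reindex(pos).to_numpy()
print(df_2)
```

Note that `get_indexer` on an `IntervalIndex` requires non-overlapping intervals, which holds for the sample data.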

I found that with a pandas.DataFrame this is as simple as described here, but when I apply the same approach to the Dask dataframes I get an error:

  File "/usr/local/lib/python3.7/site-packages/dask/dataframe/core.py", line 3706, in set_index
    **kwargs
  File "/usr/local/lib/python3.7/site-packages/dask/dataframe/shuffle.py", line 66, in set_index
    index2 = df[index]
  File "/usr/local/lib/python3.7/site-packages/dask/dataframe/core.py", line 3497, in __getitem__
    meta = self._meta[_extract_meta(key)]
  File "/usr/local/lib64/python3.7/site-packages/pandas/core/frame.py", line 2806, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/usr/local/lib64/python3.7/site-packages/pandas/core/indexing.py", line 1552, in _get_listlike_indexer
    keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
  File "/usr/local/lib64/python3.7/site-packages/pandas/core/indexing.py", line 1639, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [IntervalIndex()] are in the [columns]"

Do you know an alternative way to perform this operation efficiently with Dask?

0 answers:

No answers yet