I have two dask.dataframes like this:
df_1
Start End Val
0 1 10 a
1 11 15 b
2 16 25 c
3 26 27 a
df_2 = pd.DataFrame([[2], [12], [15], [23]], columns=['Time'])
df_2
Time
0 2
1 12
2 15
3 23
I want to add a new column to df_2 that holds the value from df_1['Val'] whenever df_2['Time'] falls between df_1['Start'] and df_1['End'] (inclusive). The resulting df_2 would be:
df_2
Time Value
0 2 a
1 12 b
2 15 b
3 23 c
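For reference, the lookup I am after can be sketched in plain pandas roughly as follows. This is only a minimal sketch built from the sample data above; the variable names `intervals` and `pos` are illustrative, and it assumes the intervals in df_1 never overlap (otherwise `get_indexer` raises an error).

```python
import pandas as pd

# Sample data from the question, as plain pandas frames.
df_1 = pd.DataFrame(
    {"Start": [1, 11, 16, 26], "End": [10, 15, 25, 27], "Val": ["a", "b", "c", "a"]}
)
df_2 = pd.DataFrame([[2], [12], [15], [23]], columns=["Time"])

# Closed intervals [Start, End], so both endpoints count as a match.
intervals = pd.IntervalIndex.from_arrays(df_1["Start"], df_1["End"], closed="both")

# For each Time, the position of the interval containing it (-1 if none does).
pos = intervals.get_indexer(df_2["Time"].to_numpy())

# Look up Val by position, then blank out rows that matched no interval.
matched = pd.Series(df_1["Val"].to_numpy()[pos], index=df_2.index)
df_2["Value"] = matched.where(pos >= 0)

print(df_2)
```

If df_1 is small enough to fit in memory, the same steps could presumably be wrapped in a function and applied to each partition of a Dask df_2 with `map_partitions`, but that is an assumption I have not verified.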
I found that with a pandas.DataFrame this would be as simple as described here, but when I apply that approach to the Dask dataframes I get an error:
File "/usr/local/lib/python3.7/site-packages/dask/dataframe/core.py", line 3706, in set_index
**kwargs
File "/usr/local/lib/python3.7/site-packages/dask/dataframe/shuffle.py", line 66, in set_index
index2 = df[index]
File "/usr/local/lib/python3.7/site-packages/dask/dataframe/core.py", line 3497, in __getitem__
meta = self._meta[_extract_meta(key)]
File "/usr/local/lib64/python3.7/site-packages/pandas/core/frame.py", line 2806, in __getitem__
indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
File "/usr/local/lib64/python3.7/site-packages/pandas/core/indexing.py", line 1552, in _get_listlike_indexer
keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
File "/usr/local/lib64/python3.7/site-packages/pandas/core/indexing.py", line 1639, in _validate_read_indexer
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [IntervalIndex()] are in the [columns]"
Do you know of an alternative way to perform this operation efficiently with Dask?