Is it possible to drop some of the categorical partition columns when reading partitioned data into a Dask DataFrame?
For example, I have Parquet files partitioned like this:
events/year=2017/month=09/day=01/hour=01/customer=a.com/xxxx.parquet
events/year=2017/month=09/day=01/hour=02/customer=a.com/xxxx.parquet
events/year=2017/month=09/day=01/hour=01/customer=a.com/xxxx.parquet
I read them with:
import dask.dataframe as dd
df = dd.read_parquet('./events/24.100/year=*/month=*/day=*/hour=*/customer=*/*.parquet')
After reading, hour and customer show up as categorical columns in my data:
Dask DataFrame Structure:
                   url  referrer  session_id              ts             hour         customer
npartitions=24
                object    object      object  datetime64[ns]  category[known]  category[known]
                   ...       ...         ...             ...              ...              ...
                   ...       ...         ...             ...              ...              ...
Dask Name: read-parquet, 24 tasks
I want to drop hour but keep customer. How can I do that?