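I want to filter a cuDF DataFrame based on a column's values and then create a new column according to a specified condition. Basically, how do I apply the following in cuDF?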
df.loc[df.column_name condition, 'new column name'] = 'value if condition is met'
Answer 0 (score: 0)
# value to be replaced in series
value = 'value if condition is met'
# condition to qualify for replacement
mask = df.column_name condition
# https://docs.rapids.ai/api/cudf/stable/
df['new column name'] = df.masked_assign(value, mask)
"""explanation:
>> if there is no pool, pool_sqft should be 0
"""
# value to be replaced in series
value = 0
# condition to qualify for replacement
mask = df['pool_count'] == 0
# https://docs.rapids.ai/api/cudf/stable/
df['pool_sqft'] = df.masked_assign(value, mask)
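The same pool example can also be sketched with Series.where, which both pandas and cuDF provide. This is only a minimal sketch, not the answerer's method: the toy data is made up, and the pandas fallback is an assumption added so the snippet still runs on a machine without a GPU.

```python
# Assumption: fall back to pandas when cuDF is not installed.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical toy data; the column names follow the example above.
df = xdf.DataFrame({
    "pool_count": [0, 1, 0, 2],
    "pool_sqft": [350.0, 400.0, 120.0, 800.0],
})

# Rows without a pool qualify for replacement.
mask = df["pool_count"] == 0

# where() keeps values where the condition is True and substitutes the
# second argument elsewhere, so the mask is inverted with ~.
df["pool_sqft"] = df["pool_sqft"].where(~mask, 0.0)
```

Here the result is written back to the same column; assigning it to a different column name would leave the original values untouched.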
Answer 1 (score: 0)
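Although masked_assign works under certain conditions, applymap is syntactically better and functionally similar to the Pandas API.
In addition, @ashwin-srinath mentioned that __setitem__() is coming in the 0.9 release, so you will be able to simply write df[condition] = value. masked_assign is not a Pandas API function, so it may be dropped in favor of __setitem__() alone.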
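For reference, the pandas-style assignment from the question can be sketched as follows. Whether it runs on cuDF depends on the installed version supporting .loc assignment with a boolean mask, which is an assumption here. The data, the "label" column, and its values are made up for illustration, and the pandas fallback exists only so the snippet runs without a GPU.

```python
# Assumption: fall back to pandas when cuDF is not installed.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical toy data.
df = xdf.DataFrame({"pool_count": [0, 1, 0, 2]})

# Create the new column with a default value first, so the masked
# assignment below only has to overwrite existing rows.
df["label"] = "has pool"

# Overwrite only the rows that meet the condition.
df.loc[df["pool_count"] == 0, "label"] = "no pool"
```

Creating the column up front avoids relying on .loc to add a new column during a masked assignment, which may behave differently between libraries and versions.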
Answer 2 (score: 0)
You can also use .query()
Example:
expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)
where a and b are the names of columns in the DataFrame.
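A small runnable sketch of the query approach, using made-up data. Note that query returns the matching rows rather than adding a column, so it covers the filtering half of the question. The pandas fallback is an assumption added so the snippet runs without a GPU.

```python
# Assumption: fall back to pandas when cuDF is not installed.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical toy data with columns named a and b, as in the example.
df = xdf.DataFrame({"a": [1, 2, 3, 4], "b": [3, 0, 3, 5]})

# Keep rows where a equals 2 or b equals 3.
expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)
```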