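How do I filter a DataFrame with a condition in cuDF, the GPU DataFrame library?

Time: 2019-07-27 00:44:56

Tags: rapids cudf

I want to filter a cuDF DataFrame based on the values in a column and then create a new column according to the specified condition. Basically, how do I do the following in cuDF?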

df.loc[df.column_name condition, 'new column name'] = 'value if condition is met'

3 answers:

Answer 0 (score: 0)

Pandas-style masked assignment in cuDF:

# value to be replaced in series 
value = 'value if condition is met'
# condition to qualify for replacement
mask = df.column_name condition

# https://docs.rapids.ai/api/cudf/stable/
df['new column name'] = df.masked_assign(value, mask)

Applied example:

"""explanation: 
  >> if there is no pool, pool_sqft should be 0
"""

# value to be replaced in series 
value = 0
# condition to qualify for replacement
mask = df['pool_count'] == 0  # same DataFrame as the assignment below

# https://docs.rapids.ai/api/cudf/stable/
df['pool_sqft'] = df.masked_assign(value, mask)
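In more recent cuDF releases the same effect is usually written with the pandas-style .loc indexer from the question. The following is a minimal sketch, assuming a cuDF version that supports boolean-mask assignment through .loc. The column values are made-up sample data, and the import falls back to pandas (which uses the same syntax here) so the sketch can be tried without a GPU.

```python
# Assumption: cuDF is installed; otherwise fall back to pandas, whose
# syntax for this operation is the same.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical sample data, not taken from the original question.
df = xdf.DataFrame({
    "pool_count": [0, 1, 0, 2],
    "pool_sqft": [120.0, 300.0, 80.0, 450.0],
})

# Boolean mask selecting the rows without a pool.
mask = df["pool_count"] == 0

# If there is no pool, pool_sqft should be 0; other rows are left unchanged.
df.loc[mask, "pool_sqft"] = 0.0

print(df)
```

Only the rows selected by the mask are overwritten, which is the behaviour the masked_assign example above is aiming for.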

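Answer 1 (score: 0)

While masked_assign works under certain conditions, applymap is syntactically better and functionally similar to the Pandas API.

In addition, @ashwin-srinath mentioned that __setitem__() is coming in version 0.9, so you will be able to simply write df[condition] = value. masked_assign is not a Pandas API function, so it may be deprecated in favor of __setitem__().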
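To illustrate the df[condition] = value form, here is a minimal sketch, assuming a cuDF release in which DataFrame.__setitem__() accepts a boolean mask (this answer expects that from 0.9 onward). The data is made up, and the import falls back to pandas, where the same syntax works, so it can be run without a GPU.

```python
# Assumption: cuDF is installed; otherwise fall back to pandas.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical sample data.
df = xdf.DataFrame({"a": [1, -2, 3], "b": [-4, 5, -6]})

# Element-wise boolean mask with the same shape as df.
condition = df < 0

# Every element where the condition holds is overwritten with the value.
df[condition] = 0

print(df)
```

Note that this form writes to every column the mask covers. To change a single column, build the mask from that column and assign through .loc instead.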

Answer 2 (score: 0)

You can also use .query()

Example:

expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)

where a and b are the names of columns in the DataFrame.
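For reference, a small self-contained sketch of the query above next to the equivalent boolean-mask filter. The sample values are made up, and the import falls back to pandas (whose query() accepts the same expression) so it can be tried without a GPU.

```python
# Assumption: cuDF is installed; otherwise fall back to pandas.
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical sample data with columns named a and b.
df = xdf.DataFrame({"a": [1, 2, 3, 2], "b": [3, 4, 5, 6]})

# Keep rows where a equals 2 or b equals 3.
expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)

# The same filter written as a boolean mask.
masked_df = df[(df["a"] == 2) | (df["b"] == 3)]

print(filtered_df)
```

Both forms select the same rows; query() only filters, so creating or updating a column based on the condition still needs one of the assignment approaches from the other answers.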