SparkR: how to create a dummy column from a character vector?

Time: 2016-10-19 15:21:45

Tags: r apache-spark sparkr grepl

Consider the following simple example:

df <- data.frame(id=c(1:4), climate=c("cold_rainy","coldSunny","rainywarm","sunny_warm"))
head(df)

       id    climate
       1     cold_rainy
       2     coldSunny
       3     rainywarm
       4     sunny_warm

I can easily create a dummy variable that flags every row containing the word "sunny", like this:

df$sunny=grepl('sunny',df$climate, ignore.case = TRUE)*1
head(df)

  id    climate        sunny
  1     cold_rainy     0
  2     coldSunny      1
  3     rainywarm      0
  4     sunny_warm     1

How can I do the same thing on a SparkDataFrame in SparkR?

1 answer:

Answer 0 (score: 1)

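You can first convert the string values to lowercase and then use rlike() to look for "sunny" in $climate. The result is a boolean column, so we cast() it to type integer: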

ddf <- createDataFrame(sqlContext, df)  # Data
ddf$climate <- lower(ddf$climate) # Convert to lowercase
ddf$sunny <- cast(rlike(ddf$climate, "sunny"), "integer") # Create integer column

> head(ddf)
  id    climate sunny
1  1 cold_rainy     0
2  2  coldsunny     1
3  3  rainywarm     0
4  4 sunny_warm     1
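
Note that the approach above overwrites climate with its lowercase form. If the original casing needs to be kept, one possible variant (a sketch, not part of the original answer) is to leave the column untouched and make the pattern itself case-insensitive. This assumes Spark 2.x or later, where sparkR.session() and createDataFrame(df) replace the sqlContext form, and it relies on rlike() using Java regular expressions, where the inline flag (?i) turns on case-insensitive matching. The name result is just an illustrative variable.

```r
library(SparkR)

# Assumption: Spark 2.x or later; sparkR.session() replaces sqlContext.
sparkR.session()

df <- data.frame(id = 1:4,
                 climate = c("cold_rainy", "coldSunny", "rainywarm", "sunny_warm"),
                 stringsAsFactors = FALSE)
ddf <- createDataFrame(df)

# "(?i)" is the Java regex inline flag for case-insensitive matching,
# so the climate column keeps its original casing.
ddf$sunny <- cast(rlike(ddf$climate, "(?i)sunny"), "integer")

# Bring the rows back as a local data.frame to inspect them.
result <- collect(ddf)
print(result)
```

Here collect() is used only to inspect the small example; on a large SparkDataFrame, head(ddf) is the safer way to look at a few rows.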