Sparklyr / dplyr: frequency of values in each categorical column

Time: 2019-03-07 21:49:00

Tags: r apache-spark dplyr sparklyr

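I'd like to know whether this can be done in sparklyr (or dplyr) without a loop: given an input Spark DataFrame, compute the frequency of every value in every column, with each count labeled by the name of the column it came from.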

Here is the input tibble:

> df=data.frame(customer=c("TIM","TAM","TIM"),
          product=c("Banana","Apple","Orange"))
> df=sdf_copy_to(sc,df,"df",overwrite = TRUE)
> df
# Source: spark<df> [?? x 2]
customer product
* <chr>    <chr>  
1 TIM      Banana 
2 TAM      Apple  
3 TIM      Orange  

The result I am looking for:

> result
# Source: spark<?> [?? x 3]
# Groups: name
name     value   freq
* <chr>    <chr>  <dbl>
1 product  Apple      1
2 product  Orange     1
3 customer TIM        2
4 product  Banana     1
5 customer TAM        1
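For reference, the target output can be pinned down with a small local sketch in base R. It works on an ordinary in-memory data.frame, not on a Spark DataFrame, so it only illustrates the expected semantics: reshape wide to long (one row per column name and cell value), then count each (name, value) pair. It assumes every column is character, because `unlist()` coerces mixed column types to a common type.

```r
# Local (non-Spark) illustration of the desired result; base R only.
df <- data.frame(customer = c("TIM", "TAM", "TIM"),
                 product  = c("Banana", "Apple", "Orange"),
                 stringsAsFactors = FALSE)

# Wide -> long: unlist() concatenates the columns in order, so each
# column name is repeated nrow(df) times to line up with its values.
# freq = 1 is recycled so that every row counts once.
long <- data.frame(name  = rep(names(df), each = nrow(df)),
                   value = unlist(df, use.names = FALSE),
                   freq  = 1,
                   stringsAsFactors = FALSE)

# Sum the per-row counts within each (name, value) group.
result <- aggregate(freq ~ name + value, data = long, FUN = sum)
print(result)
```

This runs locally and makes no use of sparklyr. On Spark the reshape would need to run inside Spark itself; Spark SQL's `stack()` generator is one candidate, but that is an untested suggestion, not a confirmed sparklyr solution.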

Thanks!

0 answers