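I would like to know whether it is possible to do this in sparklyr (or dplyr) without using a loop: for an input Spark DataFrame, get the frequency of each value in every column, labelled with the name of the column it came from.
Here is the input tibble: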
> df <- data.frame(customer = c("TIM", "TAM", "TIM"),
+                  product = c("Banana", "Apple", "Orange"))
> df <- sdf_copy_to(sc, df, "df", overwrite = TRUE)
> df
# Source: spark<df> [?? x 2]
  customer product
* <chr>    <chr>
1 TIM      Banana
2 TAM      Apple
3 TIM      Orange
The result I am looking for:
> result
# Source: spark<?> [?? x 3]
# Groups: name
  name     value  freq
* <chr>    <chr>  <dbl>
1 product  Apple      1
2 product  Orange     1
3 customer TIM        2
4 product  Banana     1
5 customer TAM        1
Thanks!
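
One possible approach, offered as a sketch rather than a confirmed answer: count each column separately with ordinary dplyr verbs and stack the per-column results with `union_all()`. The helper name `column_frequencies` is hypothetical and not part of sparklyr or dplyr. It still iterates over the column names with `lapply()`, so it only avoids an explicit `for` loop. Values are cast to character on the assumption that the columns may have different types.

```r
library(dplyr)

# Hypothetical helper (not a sparklyr/dplyr function): builds one count
# query per column and stacks them into a single name/value/freq table.
column_frequencies <- function(tbl, cols = colnames(tbl)) {
  per_column <- lapply(cols, function(col) {
    tbl %>%
      # Cast to character so columns of different types can be stacked.
      group_by(value = as.character(!!rlang::sym(col))) %>%
      summarise(freq = n()) %>%
      # Record which column the counts came from.
      mutate(name = !!col) %>%
      select(name, value, freq)
  })
  # Combine the per-column results without removing duplicates.
  Reduce(union_all, per_column)
}
```

Calling `column_frequencies(df)` on the Spark table above should stay lazy until the result is printed or collected. Because it relies only on dplyr verbs, the same function can be tried on a local data frame first.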