Is there a function similar to melt in the SparkR library?
I want to convert data with 1 row and 50 columns into data with 50 rows and 3 columns.
Answer 0 (score: 1)
SparkR has no built-in function that does this. You can build your own with explode:
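library(SparkR)
library(magrittr)

# Start (or reuse) a Spark session so createDataFrame can run
sparkR.session()

df <- createDataFrame(data.frame(
  A = c('a', 'b', 'c'),
  B = c(1, 3, 5),
  C = c(2, 4, 6)
))

melt <- function(df, id.vars, measure.vars,
                 variable.name = "key", value.name = "value") {
  # Build map("B" -> B, "C" -> C, ...) and explode it into one row per entry.
  # All measure columns must share a type, since they become the map's values.
  measure.vars.exploded <- purrr::map(
    measure.vars, function(c) list(lit(c), column(c))) %>%
    purrr::flatten() %>%
    (function(x) do.call(create_map, x)) %>%
    explode()

  # Turn the id column names into Column objects
  id.vars <- id.vars %>% purrr::map(column)

  # Exploding a map yields columns named "key" and "value"; rename them as requested
  do.call(select, c(df, id.vars, measure.vars.exploded)) %>%
    withColumnRenamed("key", variable.name) %>%
    withColumnRenamed("value", value.name)
}

melt(df, c("A"), c("B", "C")) %>% head()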
  A key value
1 a   B     1
2 a   C     2
3 b   B     3
4 b   C     4
5 c   B     5
6 c   C     6
Or use SQL with Hive's stack UDF:
# Note: this definition masks utils::stack from base R
stack <- function(df, id.vars, measure.vars,
                  variable.name = "key", value.name = "value") {
  # Build the SQL expression: stack(n, "B", `B`, "C", `C`) as (key, value)
  measure.vars.exploded <- glue::glue('"{measure.vars}", `{measure.vars}`') %>%
    glue::glue_collapse(" , ") %>%
    (function(x) glue::glue(
      "stack({length(measure.vars)}, {x}) as ({variable.name}, {value.name})"
    )) %>%
    as.character()

  # Select the id columns as-is, followed by the stack(...) expression
  do.call(selectExpr, c(df, id.vars, measure.vars.exploded))
}

stack(df, c("A"), c("B", "C")) %>% head()
  A key value
1 a   B     1
2 a   C     2
3 b   B     3
4 b   C     4
5 c   B     5
6 c   C     6
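To make the stack approach easier to follow, here is a minimal base-R sketch of the SQL expression string that the helper above assembles for the example columns. It uses sprintf and paste instead of glue (a substitution made only for illustration) and does not need a Spark session. The variable names pairs and expr are hypothetical.

```r
# Base-R sketch of the expression string the stack() helper builds.
# sprintf/paste stand in for glue here; this is an illustrative substitution.
measure.vars <- c("B", "C")
variable.name <- "key"
value.name <- "value"

# One '"name", `name`' pair per measure column:
# the quoted label followed by the backtick-quoted column reference
pairs <- sprintf('"%s", `%s`', measure.vars, measure.vars)

expr <- sprintf(
  "stack(%d, %s) as (%s, %s)",
  length(measure.vars),
  paste(pairs, collapse = ", "),
  variable.name,
  value.name
)

cat(expr, "\n")

# Assuming an active SparkR session and the df defined earlier,
# the string could then be passed alongside the id column:
# selectExpr(df, "A", expr)
```

The first argument to stack is the number of output rows per input row, so each label and column pair becomes one row of the long result.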