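I am looking for a way to extract the mode of a factor column ("meteo2") within multiple groups ("season", "meteo") in the data frame "mydf". My test code is shown below, but it does not work and produces an error message. It does work when I group by "season" alone. All three columns contain "NA" values. I am not sure which part of the code is wrong. Any help is very welcome!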
str(mydf$season)
Factor w/ 4 levels "Spring","Summer",...:
str(mydf$meteo)
Factor w/ 7 levels "<40","<50","<60",..:
str(mydf$meteo2)
Factor w/ 4 levels "E","N","S","W":
# mode function
Mode = function(x){
  ta = table(x)
  tam = max(ta)
  if (all(ta == tam))
    mod = NA
  else
    if(is.numeric(x))
      mod = as.numeric(names(ta)[ta == tam])
    else
      mod = names(ta)[ta == tam]
  return(mod)
}
# extracting mode
dataSummary <- mydf %>%
  select(season, meteo, meteo2) %>%
  mutate(meteo = forcats::fct_explicit_na(meteo)) %>%
  group_by(meteo, season) %>%
  summarise(m = Mode(meteo2))
dataSummary
error : Column `m` can't promote group 30 to character
Here is my sample data.
dput(head(mydf_sample))
structure(list(season = structure(c(3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Spring",
"Summer", "Fall", "Winter"), class = "factor"), meteo2 = structure(c(2L,
2L, 2L, 1L, 2L, 2L), .Label = c("E", "N", "S", "W"), class = "factor"),
meteo = structure(c(6L, 6L, 6L, 6L, 7L, 7L), .Label = c("<40",
"<50", "<60", "<70", "<75", "<80", "80+"), class = "factor")), .Names = c("season",
"meteo2", "meteo"), row.names = c(NA, 6L), class = "data.frame")
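Answer 0 (score: 1)
Your error does not reproduce with the sample data.
However, if your goal is to get the mode, you can do it more directly by counting the combinations and keeping the most common one in each group.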
mydf %>%
  mutate(meteo = forcats::fct_explicit_na(meteo)) %>%
  count(meteo, season, meteo2) %>%
  arrange(desc(n)) %>%
  distinct(meteo, season, .keep_all = TRUE) %>%
  select(-n)
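The call to distinct keeps the first row it sees for each meteo/season combination, which is the most common one because arrange has sorted the counts in descending order.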
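As a quick check, the count-then-keep-first idea can be run on the six sample rows from the question. This is only a sketch: the data frame is rebuilt with `data.frame()` instead of the `dput()` output, and the `fct_explicit_na()` step is left out because these rows contain no `NA`.

```r
library(dplyr)

# The six sample rows from the question, rebuilt by hand.
mydf_sample <- data.frame(
  season = factor(rep("Fall", 6),
                  levels = c("Spring", "Summer", "Fall", "Winter")),
  meteo2 = factor(c("N", "N", "N", "E", "N", "N"),
                  levels = c("E", "N", "S", "W")),
  meteo  = factor(c("<80", "<80", "<80", "<80", "80+", "80+"),
                  levels = c("<40", "<50", "<60", "<70", "<75", "<80", "80+"))
)

modes <- mydf_sample %>%
  count(meteo, season, meteo2) %>%             # one row per observed combination
  arrange(desc(n)) %>%                         # most frequent combinations first
  distinct(meteo, season, .keep_all = TRUE) %>% # keep the first row per group
  select(-n)

modes
```

Both groups present in the sample ("<80"/Fall and "80+"/Fall) end up with "N" as the mode.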
In the case of a tie, this picks only one of the tied options. If that is a problem, a small adjustment keeps all of them.
mydf %>%
  mutate(meteo = forcats::fct_explicit_na(meteo)) %>%
  count(meteo, season, meteo2) %>%
  group_by(meteo, season) %>%
  filter(n == max(n)) %>%
  ungroup() %>%
  select(-n)
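To see the tie behaviour, here is a small sketch on a made-up data frame (`toy` is hypothetical, not part of the question) in which "N" and "E" each occur twice within one group.

```r
library(dplyr)

# Hypothetical toy data: a single group with a 2-vs-2 tie in meteo2.
toy <- data.frame(
  season = factor(rep("Fall", 4)),
  meteo  = factor(rep("<80", 4)),
  meteo2 = factor(c("N", "E", "N", "E"))
)

ties <- toy %>%
  count(meteo, season, meteo2) %>%
  group_by(meteo, season) %>%
  filter(n == max(n)) %>%   # keeps every row that reaches the group maximum
  ungroup() %>%
  select(-n)

ties
```

The result has two rows for the same group, one per tied value, so downstream code has to be prepared for more than one row per group.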
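Answer 1 (score: 1)
Judging by the error message, it seems that some groups do not return a character value (probably NA, which is of class logical). You can use as.character to convert the result to character explicitly.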
library(dplyr)
mydf_sample %>%
  group_by(meteo, season) %>%
  summarise(m = as.character(Mode(meteo2)))
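An alternative to wrapping the call is to make the helper itself type-stable. The sketch below uses a hypothetical function `mode_chr` (not from the original post) that always returns exactly one character value: `NA_character_` when a group is empty, all `NA`, or tied, and the winning level otherwise. It assumes that treating a tie as "no mode" is acceptable. Note that the question's `Mode()` can also return several values when some but not all levels tie, which `as.character()` alone would not fix.

```r
# Hypothetical helper: a mode function that always returns one character value.
mode_chr <- function(x) {
  ta <- table(x)                       # table() ignores NA by default
  if (length(ta) == 0 || max(ta) == 0) {
    return(NA_character_)              # empty input or only NA values
  }
  top <- names(ta)[ta == max(ta)]      # level(s) with the highest count
  if (length(top) > 1) {
    return(NA_character_)              # tie: no unique mode
  }
  top
}

lv <- c("E", "N", "S", "W")
clear_mode <- mode_chr(factor(c("N", "N", "E"), levels = lv))
tied_mode  <- mode_chr(factor(c("N", "E"), levels = lv))
all_na     <- mode_chr(factor(c(NA, NA), levels = lv))
```

Because every branch returns a character vector of length one, it could be used as `summarise(m = mode_chr(meteo2))` without the groups disagreeing on type.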