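Whenever I use the purrr function pmap() with a function that contains an if statement with multiple conditions, the if statement doesn't seem to work correctly. To show you what I mean, here is a reproducible example using the gapminder dataset.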
library(tidyverse)
library(gapminder)
library(broom)
# Nest the tibble into separate dataframes for each country-continent combination
by_country <- gapminder %>%
  group_by(country, continent) %>%
  nest()
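Now I want to build a linear regression model for each grouped data frame. The catch is that I want to use a different x variable in the model depending on the country and continent. Here is my function. I suspect something is wrong with the if statement: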
# My function
country_model <- function(df, cont, count) {
  if (cont == "Asia" & count == "Afghanistan") { # 2 conditions
    lm(lifeExp ~ year, data = df)
  } else {
    lm(lifeExp ~ pop, data = df)
  }
}
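Now I apply that function to all of the grouped data frames. I expect the model summary output to show that the model for the Afghanistan data set has a coefficient for year rather than pop.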
by_country2 <- by_country %>%
  mutate(model = pmap(list(data, continent, country), country_model),
         modelsum = map(model, tidy)) %>%
  unnest(modelsum, .drop = TRUE)
by_country2
My output shows a pop coefficient for Afghanistan instead of year.
# A tibble: 284 × 7
country continent term estimate std.error statistic p.value
<fctr> <fctr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Afghanistan Asia (Intercept) 2.834615e+01 2.314395e+00 12.247758 2.410050e-07
2 Afghanistan Asia pop 5.771517e-07 1.343425e-07 4.296121 1.570999e-03
3 Albania Europe (Intercept) 4.963274e+01 1.935933e+00 25.637630 1.871817e-10
4 Albania Europe pop 7.286188e-06 7.171585e-07 10.159802 1.374311e-06
5 Algeria Africa (Intercept) 3.565187e+01 1.632853e+00 21.834099 9.087006e-10
6 Algeria Africa pop 1.176242e-06 7.588190e-08 15.500960 2.548769e-08
7 Angola Africa (Intercept) 2.855043e+01 1.922225e+00 14.852803 3.843692e-08
8 Angola Africa pop 1.276860e-06 2.482137e-07 5.144195 4.351004e-04
9 Argentina Americas (Intercept) 5.323586e+01 3.784907e-01 140.653008 8.102227e-18
10 Argentina Americas pop 5.532629e-07 1.282987e-08 43.123018 1.079775e-12
# ... with 274 more rows
What's strange to me is that when I use only one condition in the function's if statement, it seems to work perfectly:
country_model <- function(df, cont) {
  if (cont == "Asia") { # Only 1 condition
    lm(lifeExp ~ year, data = df)
  } else {
    lm(lifeExp ~ pop, data = df)
  }
}
by_country2 <- by_country %>%
  mutate(model = map2(data, continent, country_model),
         modelsum = map(model, tidy)) %>%
  unnest(modelsum, .drop = TRUE)
by_country2
# A tibble: 284 × 7
country continent term estimate std.error statistic p.value
<fctr> <fctr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Afghanistan Asia (Intercept) -5.075343e+02 4.048416e+01 -12.536613 1.934055e-07
2 Afghanistan Asia year 2.753287e-01 2.045093e-02 13.462890 9.835213e-08
3 Albania Europe (Intercept) 4.963274e+01 1.935933e+00 25.637630 1.871817e-10
4 Albania Europe pop 7.286188e-06 7.171585e-07 10.159802 1.374311e-06
5 Algeria Africa (Intercept) 3.565187e+01 1.632853e+00 21.834099 9.087006e-10
6 Algeria Africa pop 1.176242e-06 7.588190e-08 15.500960 2.548769e-08
7 Angola Africa (Intercept) 2.855043e+01 1.922225e+00 14.852803 3.843692e-08
8 Angola Africa pop 1.276860e-06 2.482137e-07 5.144195 4.351004e-04
9 Argentina Americas (Intercept) 5.323586e+01 3.784907e-01 140.653008 8.102227e-18
10 Argentina Americas pop 5.532629e-07 1.282987e-08 43.123018 1.079775e-12
# ... with 274 more rows
I'm not sure whether the problem is pmap() or my if statement.
Answer
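It seems that pmap passes continent and country as numbers (the factors' underlying integer codes), which you can confirm by putting a print statement inside the function: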
test_fun <- function(df, cont, xx) {
  print(paste(cont, xx))
}
temp <- by_country %>%
  mutate(model = pmap(list(data, continent, country), test_fun))
This prints:
[1] "3 1"
[1] "4 2"
[1] "1 3"
[1] "1 4"
[1] "2 5"
[1] "5 6"
[1] "4 7"
[1] "3 8"
[1] "3 9"
...
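To see why the comparison fails once a factor has been reduced to its integer codes, here is a minimal base-R sketch. It only mimics what the function receives; it does not reproduce pmap()'s internals, and the three-element factor is a made-up example rather than the gapminder column.

```r
# A factor stores integer codes plus a table of level labels
continent <- factor(c("Asia", "Europe", "Africa"))
levels(continent)        # "Africa" "Asia" "Europe" (sorted alphabetically)
as.integer(continent)    # 2 3 1

# What the function sees if it is handed the bare code instead of the factor
cont_code <- as.integer(continent)[1]
cont_code == "Asia"      # FALSE: 2 is coerced to "2", which is not "Asia"

# Comparing the label (or the factor itself) works as intended
as.character(continent)[1] == "Asia"  # TRUE
continent[1] == "Asia"                # TRUE
```

This also explains the question's output: Asia is level 3 of gapminder's continent factor, so the function received 3, and 3 == "Asia" is FALSE.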
This does not happen with map2, which is why your second attempt works.
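To check what your own installed purrr version passes, a sketch like the following records the class each argument has inside the mapped function. The two short factors are hypothetical stand-ins for the continent and country columns. With the purrr version used in the question, the pmap_chr() line would presumably report "integer"; newer purrr releases are expected to report "factor" for both calls, so treat the pmap_chr() result as version-dependent.

```r
library(purrr)

# Hypothetical stand-ins for the continent and country factor columns
f <- factor(c("Asia", "Europe"))
g <- factor(c("Afghanistan", "Albania"))

# Class of the first argument as seen inside the mapped function
map2_chr(f, g, function(cont, count) class(cont))
pmap_chr(list(f, g), function(cont, count) class(cont))
```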
Coercing them to character fixes the problem:
by_country %>%
  mutate(model = pmap(list(data, as.character(continent), as.character(country)), country_model),
         modelsum = map(model, broom::tidy)) %>%
  unnest(modelsum, .drop = TRUE)
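The two-condition if statement itself is fine once it receives labels. A minimal sketch with a made-up five-row data frame (synthetic numbers, not gapminder data) shows the branch selection directly. It also swaps & for &&, the scalar form usually preferred inside if(); that change is a stylistic choice and is not what fixes the problem.

```r
# Synthetic stand-in for one nested country data frame (not real data)
toy <- data.frame(
  year    = 1:5,
  pop     = c(10, 12, 15, 19, 24),
  lifeExp = c(30, 31, 33, 34, 36)
)

# Variant of country_model() using && for the scalar condition
country_model <- function(df, cont, count) {
  if (cont == "Asia" && count == "Afghanistan") {
    lm(lifeExp ~ year, data = df)
  } else {
    lm(lifeExp ~ pop, data = df)
  }
}

# Character labels: the first branch is taken, so the slope term is "year"
names(coef(country_model(toy, "Asia", "Afghanistan")))
# Integer codes (what pmap() passed in the question): falls through to "pop"
names(coef(country_model(toy, 3L, 1L)))
```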
# A tibble: 284 x 7
country continent term estimate std.error statistic p.value
<fctr> <fctr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Afghanistan Asia (Intercept) -5.075343e+02 4.048416e+01 -12.536613 1.934055e-07
2 Afghanistan Asia year 2.753287e-01 2.045093e-02 13.462890 9.835213e-08
3 Albania Europe (Intercept) 4.963274e+01 1.935933e+00 25.637630 1.871817e-10
4 Albania Europe pop 7.286188e-06 7.171585e-07 10.159802 1.374311e-06
5 Algeria Africa (Intercept) 3.565187e+01 1.632853e+00 21.834099 9.087006e-10
6 Algeria Africa pop 1.176242e-06 7.588190e-08 15.500960 2.548769e-08
7 Angola Africa (Intercept) 2.855043e+01 1.922225e+00 14.852803 3.843692e-08
8 Angola Africa pop 1.276860e-06 2.482137e-07 5.144195 4.351004e-04
9 Argentina Americas (Intercept) 5.323586e+01 3.784907e-01 140.653008 8.102227e-18
10 Argentina Americas pop 5.532629e-07 1.282987e-08 43.123018 1.079775e-12
# ... with 274 more rows