How to create a new variable based on changes in another variable in R

Time: 2018-09-29 09:00:22

Tags: r variables

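I am a complete beginner, so please help me with this task.

I have the following sample dataset with a "polityScore" variable, and I need to create a new variable called "politicalChange" based on the year-to-year change in polityScore, under these conditions: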

if polityScore in A in (year1 + 1) > polityScore in A in year1 ---> "democratization"
if polityScore in A in (year1 + 1) < polityScore in A in year1 ---> "autocratization"
if polityScore in A in (year1 + 1) = polityScore in A in year1 ---> "no change"

Data:

country, date, polityScore, politicalChange

A   ,2000   ,5  ,
A   ,2001   ,6  ,
A   ,2002   ,4  ,
A   ,2003   ,5  ,
A   ,2004   ,5  ,
A   ,2005   ,7  ,
B   ,2000   ,5  ,
B   ,2001   ,6  ,
B   ,2002   ,4  ,
B   ,2003   ,5  ,
B   ,2004   ,5  ,
B   ,2005   ,7  ,

Thanks!

1 answer:

Answer 0: (score: 0)

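You probably want something like the following. The dplyr package can help with this. First group by country, so that the if_else statement below is evaluated within each country. Inside if_else, compare polityScore with the polityScore of the previous year and fill in "democratization", "autocratization", or "no change" accordingly. Note that lag() takes the value from the previous row, so the rows need to be sorted by date within each country, as they are in the sample data. The first value in each group is NA, because there is no previous year to compare it with.

If you would rather have "no change" than NA, add default = first(polityScore) to the lag function.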

library(dplyr)
df1 %>% 
  group_by(country) %>% 
  mutate(politicalChange = if_else(polityScore > lag(polityScore), "democratization", 
                                   if_else(polityScore < lag(polityScore), "autocratization", "no change")))

# A tibble: 12 x 4
# Groups:   country [2]
   country  date polityScore politicalChange
   <chr>   <dbl>       <dbl> <chr>          
 1 A        2000           5 NA             
 2 A        2001           6 democratization
 3 A        2002           4 autocratization
 4 A        2003           5 democratization
 5 A        2004           5 no change      
 6 A        2005           7 democratization
 7 B        2000           5 NA             
 8 B        2001           6 democratization
 9 B        2002           4 autocratization
10 B        2003           5 democratization
11 B        2004           5 no change      
12 B        2005           7 democratization
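
As a rough sketch of the default = first(polityScore) variant mentioned above: the helper column prev and the result name res_default are introduced here only for illustration, and the data frame is rebuilt inline (without the empty politicalChange column) so the snippet runs on its own. It assumes the rows are already sorted by date within each country.

```r
library(dplyr)

# Same sample values as in the question (rebuilt here so the snippet is self-contained)
df1 <- data.frame(
  country = rep(c("A", "B"), each = 6),
  date = rep(2000:2005, times = 2),
  polityScore = rep(c(5, 6, 4, 5, 5, 7), times = 2)
)

res_default <- df1 %>%
  group_by(country) %>%
  mutate(
    # prev is a hypothetical helper column: the previous year's score,
    # with the group's own first score used where there is no previous year
    prev = lag(polityScore, default = first(polityScore)),
    politicalChange = if_else(polityScore > prev, "democratization",
                      if_else(polityScore < prev, "autocratization", "no change"))
  ) %>%
  ungroup() %>%
  select(-prev)

res_default
```

With this default the first year of each country is compared with itself, so it ends up as "no change" rather than NA.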

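For readability, you can also use case_when instead of if_else. case_when also fills in the NA rows through its TRUE rule: for the first row of each group both comparisons evaluate to NA, so that row falls through to TRUE ~ "no change".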

df1 %>% 
  group_by(country) %>% 
  mutate(politicalChange = case_when(polityScore > lag(polityScore) ~ "democratization", 
                                     polityScore < lag(polityScore) ~ "autocratization",
                                     TRUE ~ "no change"))
# A tibble: 12 x 4
# Groups:   country [2]
   country  date polityScore politicalChange
   <chr>   <dbl>       <dbl> <chr>          
 1 A        2000           5 no change      
 2 A        2001           6 democratization
 3 A        2002           4 autocratization
.....
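
If you prefer case_when but still want the first year of each country to stay NA, one possible approach (a sketch, not part of the original answer) is to put an explicit NA rule first. The result name res_na is made up for this example, and the data frame is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)

# Same sample values as in the question (rebuilt here so the snippet is self-contained)
df1 <- data.frame(
  country = rep(c("A", "B"), each = 6),
  date = rep(2000:2005, times = 2),
  polityScore = rep(c(5, 6, 4, 5, 5, 7), times = 2)
)

res_na <- df1 %>%
  group_by(country) %>%
  mutate(politicalChange = case_when(
    # rules are checked in order, so missing comparisons are caught first
    is.na(polityScore) | is.na(lag(polityScore)) ~ NA_character_,
    polityScore > lag(polityScore)               ~ "democratization",
    polityScore < lag(polityScore)               ~ "autocratization",
    TRUE                                         ~ "no change"
  )) %>%
  ungroup()

res_na
```

Because case_when uses the first rule that matches, the NA rule has to come before the TRUE catch-all.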

Data:

df1 <- structure(list(country = c("A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B"), date = c(2000, 2001, 2002, 2003, 2004, 
2005, 2000, 2001, 2002, 2003, 2004, 2005), polityScore = c(5, 
6, 4, 5, 5, 7, 5, 6, 4, 5, 5, 7), politicalChange = c(NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -12L
), class = "data.frame")

P.S.

Have a look at bookdown.org for plenty of books about R that can help you further.