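I'm very much a beginner, so please help me with this task.
I have the following sample dataset with a "polityScore" variable, and I need to create a new variable called "politicalChange" based on the year-over-year change in that first variable, under the following conditions: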
if polityScore in A in year1 + 1 > polityScore in A in year1---> "democratization"
if polityScore in A in year1 + 1 < polityScore in A in year1---> "autocratization"
if polityScore in A in year1 + 1 = polityScore in A in year1---> "no change"
Data:
country, date, polityScore, politicalChange
A ,2000 ,5 ,
A ,2001 ,6 ,
A ,2002 ,4 ,
A ,2003 ,5 ,
A ,2004 ,5 ,
A ,2005 ,7 ,
B ,2000 ,5 ,
B ,2001 ,6 ,
B ,2002 ,4 ,
B ,2003 ,5 ,
B ,2004 ,5 ,
B ,2005 ,7 ,
Thanks!
Answer 0 (score: 0)
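You probably want something like the following. The dplyr package can help with this. First group by country, so that the if_else statement below is evaluated within each country. Inside if_else, compare polityScore with the polityScore of the previous year (the previous row, obtained with lag) and fill in "democratization", "autocratization", or "no change" accordingly. Because lag works on row order, the rows must be sorted by date within each country. The first value of each group is NA, since there is no earlier year to compare with.
If you would rather have "no change" than NA there, add default = first(polityScore) to the lag function.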
library(dplyr)
df1 %>%
  group_by(country) %>%
  mutate(politicalChange = if_else(polityScore > lag(polityScore), "democratization",
                           if_else(polityScore < lag(polityScore), "autocratization", "no change")))
# A tibble: 12 x 4
# Groups:   country [2]
   country  date polityScore politicalChange
   <chr>   <dbl>       <dbl> <chr>
 1 A        2000           5 NA
 2 A        2001           6 democratization
 3 A        2002           4 autocratization
 4 A        2003           5 democratization
 5 A        2004           5 no change
 6 A        2005           7 democratization
 7 B        2000           5 NA
 8 B        2001           6 democratization
 9 B        2002           4 autocratization
10 B        2003           5 democratization
11 B        2004           5 no change
12 B        2005           7 democratization
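The default = first(polityScore) variant mentioned above could look roughly like the sketch below. It rebuilds the sample data so that it runs on its own; the helper column prev and the result name res are made up for illustration, and the rows are assumed to be sorted by date within each country.

```r
library(dplyr)

# Sample data from the question (the empty politicalChange column is left out)
df1 <- data.frame(
  country = rep(c("A", "B"), each = 6),
  date = rep(2000:2005, times = 2),
  polityScore = rep(c(5, 6, 4, 5, 5, 7), times = 2)
)

res <- df1 %>%
  group_by(country) %>%
  mutate(
    # Previous year's score; the first year of each country falls back to
    # its own score, so the comparison there yields "no change" instead of NA
    prev = lag(polityScore, default = first(polityScore)),
    politicalChange = if_else(polityScore > prev, "democratization",
                      if_else(polityScore < prev, "autocratization", "no change"))
  ) %>%
  ungroup() %>%
  select(-prev)  # drop the illustrative helper column
```

With this, the year 2000 rows for A and B should read "no change" rather than NA, while all other rows stay as in the output above.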
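For readability, you can also use case_when instead of if_else. case_when also fills in the NA through its TRUE rule: a comparison with NA yields NA, so neither of the first two conditions matches and the row falls through to "no change".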
df1 %>%
  group_by(country) %>%
  mutate(politicalChange = case_when(polityScore > lag(polityScore) ~ "democratization",
                                     polityScore < lag(polityScore) ~ "autocratization",
                                     TRUE ~ "no change"))
# A tibble: 12 x 4
# Groups:   country [2]
  country  date polityScore politicalChange
  <chr>   <dbl>       <dbl> <chr>
1 A        2000           5 no change
2 A        2001           6 democratization
3 A        2002           4 autocratization
.....
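If you prefer case_when but still want NA for the first year of each country, one option is to add an explicit rule ahead of the others. This is only a sketch under the same assumptions as before (rows sorted by date within country; the result name res_na is made up for illustration). Note that the extra rule also returns NA whenever the previous year's polityScore is itself missing.

```r
library(dplyr)

# Sample data from the question (the empty politicalChange column is left out)
df1 <- data.frame(
  country = rep(c("A", "B"), each = 6),
  date = rep(2000:2005, times = 2),
  polityScore = rep(c(5, 6, 4, 5, 5, 7), times = 2)
)

res_na <- df1 %>%
  group_by(country) %>%
  mutate(politicalChange = case_when(
    # No previous score to compare with: keep the result missing
    is.na(lag(polityScore)) ~ NA_character_,
    polityScore > lag(polityScore) ~ "democratization",
    polityScore < lag(polityScore) ~ "autocratization",
    TRUE ~ "no change"
  )) %>%
  ungroup()
```

case_when evaluates its rules in order, so the is.na rule has to come first; otherwise the TRUE rule would catch those rows.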
Data:
df1 <- structure(list(country = c("A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), date = c(2000, 2001, 2002, 2003, 2004,
2005, 2000, 2001, 2002, 2003, 2004, 2005), polityScore = c(5,
6, 4, 5, 5, 7, 5, 6, 4, 5, 5, 7), politicalChange = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -12L
), class = "data.frame")
P.S.
Have a look at bookdown.org for a large collection of books on R that can help you further.