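I am using data from the CSES (Comparative Study of Electoral Systems) to measure the ideological distance between voters and parties.
I used case_when as suggested here: Changing row names in a data_frame from letters to numbers in R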
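It works well for some variables, but now I am trying to use the same code with similar variables (all of them numeric), and it produces the following error:
Error in mutate_impl(.data, dots) :
Evaluation error: RHS of case 6 (ex_ideolparty_F) must be type double, not integer.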
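The data I am using is available here: http://www.cses.org/datacenter/imd/data/cses_imd_r.zip
I applied only a few transformations before using case_when. Here is the exact code I ran up to the point where the error occurs: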
library(dplyr)
library(descr)
load("/cses_imd.rdata")
##### DATA CLEANING/RENAMING #####
cses <- cses_imd %>%
rename (election = IMD1004, country = IMD1006_NAM, type = IMD1009, age = IMD2001_1, gender = IMD2002,
education = IMD2003, income = IMD2006, party =IMD3005_3, party_int = IMD3005_4, ideol_self = IMD3006,
turnout = IMD5006_1, turnout_VAP = IMD5006_2, compulsory = IMD5007) %>%
select(election, country, type, age, gender, education, income, starts_with("IMD3002"), starts_with ("IMD3004"),
party, party_int, ideol_self, starts_with("IMD3007"), turnout, turnout_VAP, compulsory,
starts_with("IMD500"), starts_with("IMD501"))
### MORE RENAMING:
names (cses) <- gsub("IMD3002", "vote", names(cses))
names (cses) <- gsub("IMD3004", "prevote", names(cses))
names (cses) <- gsub("IMD3007", "ideolparty", names(cses))
names (cses) <- gsub("IMD5000", "numparty", names(cses))
names (cses) <- gsub("IMD5012", "ex_ideolparty", names(cses))
names (cses) <- gsub("IMD5013", "formula_house", names(cses))
names (cses) <- gsub("IMD5014", "formula_pres", names(cses))
cses$year <- as.numeric(substr(cses$election, 5, 8))
###### PERCEIVED IDEOLOGY OF THE PARTY VOTED #####
cses <- cses %>% mutate (
ideol_voted_PR1 = case_when(
numparty_A == vote_PR_1 ~ ideolparty_A,
numparty_B == vote_PR_1 ~ ideolparty_B,
numparty_C == vote_PR_1 ~ ideolparty_C,
numparty_D == vote_PR_1 ~ ideolparty_D,
numparty_E == vote_PR_1 ~ ideolparty_E,
numparty_F == vote_PR_1 ~ ideolparty_F,
numparty_G == vote_PR_1 ~ ideolparty_G,
numparty_H == vote_PR_1 ~ ideolparty_H,
numparty_I == vote_PR_1 ~ ideolparty_I,
TRUE ~ vote_PR_1
)
)
This is where the problem occurs:
##### PERCEIVED IDEOLOGY OF PARTY VOTED (EXPERT PLACEMENT):
cses <- cses %>% mutate (
ideol_ex_PR1 = case_when(
numparty_A == vote_PR_1 ~ ex_ideolparty_A,
numparty_B == vote_PR_1 ~ ex_ideolparty_B,
numparty_C == vote_PR_1 ~ ex_ideolparty_C,
numparty_D == vote_PR_1 ~ ex_ideolparty_D,
numparty_E == vote_PR_1 ~ ex_ideolparty_E,
numparty_F == vote_PR_1 ~ ex_ideolparty_F,
numparty_G == vote_PR_1 ~ ex_ideolparty_G,
numparty_H == vote_PR_1 ~ ex_ideolparty_H,
numparty_I == vote_PR_1 ~ ex_ideolparty_I,
TRUE ~ vote_PR_1
)
)
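Why does this happen? I have checked every column used here, and case 6, "ex_ideolparty_F", is no different from the other cases, or even from the first case_when, which works fine. All of these columns are numeric, not double.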
Answer
As with if_else, all of the values returned by case_when must be of the same type, so numeric (double) and integer count as different types.
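The confusion in the question comes from R's type names: class() reports a double vector as "numeric" (and str() abbreviates it to num), while typeof() reports its storage type, "double". An integer vector is also numeric in the is.numeric() sense, but its storage type is "integer". A minimal base-R check, with literal values chosen only for illustration:

```r
# Storage type vs. class for a double value and an integer value
x_dbl <- 5     # a plain numeric literal is stored as double
x_int <- 5L    # the L suffix makes an integer literal

typeof(x_dbl)            # "double"
typeof(x_int)            # "integer"
class(x_dbl)             # "numeric"
class(x_int)             # "integer"
is.numeric(x_int)        # TRUE: integers also count as numeric
identical(x_dbl, x_int)  # FALSE: same value, different storage type
x_dbl == x_int           # TRUE: == coerces, so the values compare equal
```

So "numeric, not double" is not a real distinction: a "numeric" column is a double column, and it is the integer columns that differ.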
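If you look at the data, you can see the difference. The first five expert-placement columns are double (num), while ex_ideolparty_F through ex_ideolparty_I and vote_PR_1 are integer (int). Case 6 is the first branch whose type differs from that of the earlier branches, which is why the error names it: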
str(cses[,c("ex_ideolparty_A", "ex_ideolparty_B", "ex_ideolparty_C", "ex_ideolparty_D", "ex_ideolparty_E", "ex_ideolparty_F", "ex_ideolparty_G", "ex_ideolparty_H", "ex_ideolparty_I", "vote_PR_1")])
# 'data.frame': 281083 obs. of 10 variables:
# $ ex_ideolparty_A: num 6 6 6 6 6 6 6 6 6 6 ...
# $ ex_ideolparty_B: num 5 5 5 5 5 5 5 5 5 5 ...
# $ ex_ideolparty_C: num 7 7 7 7 7 7 7 7 7 7 ...
# $ ex_ideolparty_D: num 4 4 4 4 4 4 4 4 4 4 ...
# $ ex_ideolparty_E: num 4 4 4 4 4 4 4 4 4 4 ...
# $ ex_ideolparty_F: int 5 5 5 5 5 5 5 5 5 5 ...
# $ ex_ideolparty_G: int 5 5 5 5 5 5 5 5 5 5 ...
# $ ex_ideolparty_H: int 4 4 4 4 4 4 4 4 4 4 ...
# $ ex_ideolparty_I: int 5 5 5 5 5 5 5 5 5 5 ...
# $ vote_PR_1 : int 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 ...
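With many columns, listing each column's storage type with sapply() and typeof() can be quicker than reading str() output. The sketch below uses a small made-up data frame (toy_cses is hypothetical, not the CSES data) that mimics the mix of types shown above, and flags the columns whose type differs from the first branch's column:

```r
# Hypothetical toy data mimicking the column types in the str() output above
toy_cses <- data.frame(
  ex_ideolparty_A = c(6, 6),                # double
  ex_ideolparty_B = c(5, 5),                # double
  ex_ideolparty_F = c(5L, 5L),              # integer
  vote_PR_1       = c(9999996L, 9999996L)   # integer
)

# Storage type of every column (a named character vector)
col_types <- sapply(toy_cses, typeof)
print(col_types)

# Columns whose storage type differs from that of ex_ideolparty_A
mismatched <- names(col_types)[col_types != col_types[["ex_ideolparty_A"]]]
print(mismatched)
```

On the real data you would pass the same column selection used in the str() call above instead of toy_cses.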
Given your data, if all the values are whole numbers, you can fix this by converting every column involved to integer:
cses <- cses %>%
mutate_at(vars(ex_ideolparty_A, ex_ideolparty_B, ex_ideolparty_C, ex_ideolparty_D, ex_ideolparty_E, ex_ideolparty_F, ex_ideolparty_G, ex_ideolparty_H, ex_ideolparty_I, vote_PR_1),
as.integer)
str(cses[,c("ex_ideolparty_A", "ex_ideolparty_B", "ex_ideolparty_C", "ex_ideolparty_D", "ex_ideolparty_E", "ex_ideolparty_F", "ex_ideolparty_G", "ex_ideolparty_H", "ex_ideolparty_I", "vote_PR_1")])
# 'data.frame': 281083 obs. of 10 variables:
# $ ex_ideolparty_A: int 6 6 6 6 6 6 6 6 6 6 ...
# $ ex_ideolparty_B: int 5 5 5 5 5 5 5 5 5 5 ...
# $ ex_ideolparty_C: int 7 7 7 7 7 7 7 7 7 7 ...
# $ ex_ideolparty_D: int 4 4 4 4 4 4 4 4 4 4 ...
# $ ex_ideolparty_E: int 4 4 4 4 4 4 4 4 4 4 ...
# $ ex_ideolparty_F: int 5 5 5 5 5 5 5 5 5 5 ...
# $ ex_ideolparty_G: int 5 5 5 5 5 5 5 5 5 5 ...
# $ ex_ideolparty_H: int 4 4 4 4 4 4 4 4 4 4 ...
# $ ex_ideolparty_I: int 5 5 5 5 5 5 5 5 5 5 ...
# $ vote_PR_1 : int 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 ...
After that, your case_when will run correctly.
(If there is any chance that some values are not whole numbers, you may prefer as.numeric instead.)
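mutate_at() has since been superseded in dplyr by across(). Assuming dplyr 1.0.0 or later, the as.numeric variant of the fix could be sketched as below. The toy data frame and its values are hypothetical; only the column names follow the question. As far as I know, dplyr 1.1.0 and later no longer raise this error for an integer/double mix and instead combine the branches into a double, but converting explicitly keeps the result type predictable across versions.

```r
library(dplyr)

# Hypothetical toy data with the same mix of types as the CSES extract
toy <- data.frame(
  numparty_A      = c(1L, 1L, 1L),
  numparty_F      = c(6L, 6L, 6L),
  ex_ideolparty_A = c(6, 6, 6),          # double
  ex_ideolparty_F = c(5L, 5L, 5L),       # integer
  vote_PR_1       = c(1L, 6L, 9999996L)  # integer
)

toy <- toy %>%
  # Convert every expert-placement column and the fallback column to double
  mutate(across(c(starts_with("ex_ideolparty_"), vote_PR_1), as.numeric)) %>%
  # All right-hand sides are now double, so the branches agree on type
  mutate(
    ideol_ex_PR1 = case_when(
      numparty_A == vote_PR_1 ~ ex_ideolparty_A,
      numparty_F == vote_PR_1 ~ ex_ideolparty_F,
      TRUE ~ vote_PR_1
    )
  )

typeof(toy$ideol_ex_PR1)  # "double"
```

Row 1 matches party A, row 2 matches party F, and row 3 matches neither, so it falls through to the TRUE branch and keeps its vote_PR_1 value.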