Evaluation error in RHS using dplyr::case_when()

Time: 2019-07-06 16:45:11

Tags: r dplyr

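I am using data from the CSES (Comparative Study of Electoral Systems) to evaluate the ideological distance between voters and parties.

I used the case_when approach provided here: Changing row names in a data_frame from letters to numbers in R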

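It worked very well for some variables, but now that I am trying the same code with similar variables (all of them hold numbers), it produces the following error:

Error in mutate_impl(.data, dots) :
  Evaluation error: RHS of case 6 (ex_ideolparty_F) must be type double, not integer.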

The data I am using is available here: http://www.cses.org/datacenter/imd/data/cses_imd_r.zip

I only made a few transformations before using case_when. This is the exact code I ran before the error occurred:

library(dplyr)
library(descr)

load("/cses_imd.rdata")

##### DATA CLEANING/RENAMING #####

cses <- cses_imd  %>% 
  rename (election = IMD1004, country = IMD1006_NAM, type = IMD1009, age = IMD2001_1, gender = IMD2002,
          education = IMD2003, income = IMD2006, party =IMD3005_3, party_int = IMD3005_4, ideol_self = IMD3006,
          turnout = IMD5006_1, turnout_VAP = IMD5006_2, compulsory = IMD5007) %>%        
  select(election, country, type, age, gender, education, income, starts_with("IMD3002"), starts_with ("IMD3004"),
         party, party_int, ideol_self, starts_with("IMD3007"), turnout, turnout_VAP, compulsory,
         starts_with("IMD500"), starts_with("IMD501"))

### MORE RENAMING:

names (cses) <- gsub("IMD3002", "vote", names(cses)) 
names (cses) <- gsub("IMD3004", "prevote", names(cses)) 
names (cses) <- gsub("IMD3007", "ideolparty", names(cses)) 
names (cses) <- gsub("IMD5000", "numparty", names(cses)) 
names (cses) <- gsub("IMD5012", "ex_ideolparty", names(cses)) 
names (cses) <- gsub("IMD5013", "formula_house", names(cses)) 
names (cses) <- gsub("IMD5014", "formula_pres", names(cses)) 

cses$year <- as.numeric(substr(cses$election, 5, 8))


###### PERCEIVED IDEOLOGY OF THE PARTY VOTED #####

cses <- cses %>% mutate (
  ideol_voted_PR1 = case_when(
    numparty_A == vote_PR_1 ~ ideolparty_A,
    numparty_B == vote_PR_1 ~ ideolparty_B,
    numparty_C == vote_PR_1 ~ ideolparty_C,
    numparty_D == vote_PR_1 ~ ideolparty_D,
    numparty_E == vote_PR_1 ~ ideolparty_E,
    numparty_F == vote_PR_1 ~ ideolparty_F,
    numparty_G == vote_PR_1 ~ ideolparty_G,
    numparty_H == vote_PR_1 ~ ideolparty_H,
    numparty_I == vote_PR_1 ~ ideolparty_I,
    TRUE                    ~ vote_PR_1
  )
)

This is where the problem occurs:

##### PERCEIVED IDEOLOGY OF PARTY VOTED (EXPERT PLACEMENT):

cses <- cses %>% mutate (
  ideol_ex_PR1 = case_when(
    numparty_A == vote_PR_1 ~ ex_ideolparty_A,
    numparty_B == vote_PR_1 ~ ex_ideolparty_B,
    numparty_C == vote_PR_1 ~ ex_ideolparty_C,
    numparty_D == vote_PR_1 ~ ex_ideolparty_D,
    numparty_E == vote_PR_1 ~ ex_ideolparty_E,
    numparty_F == vote_PR_1 ~ ex_ideolparty_F,
    numparty_G == vote_PR_1 ~ ex_ideolparty_G,
    numparty_H == vote_PR_1 ~ ex_ideolparty_H,
    numparty_I == vote_PR_1 ~ ex_ideolparty_I,
    TRUE                    ~ vote_PR_1
  )
)

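Why is this happening? I have checked all the columns used here, and case 6, "ex_ideolparty_F", is no different from the other cases, nor from the columns in the first case_when call, which works fine. All of these columns are numeric, not double.

1 Answer:

Answer 0: (score: 1)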

As with if_else, case_when requires every returned value to be of the same type, and numeric (double) and integer are different types.
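To illustrate why columns that all look "numeric" can still disagree, here is a minimal base R sketch. The values x_int and x_dbl are hypothetical toy values, not taken from the CSES data.

```r
# Hypothetical toy values (not from the CSES data).
x_int <- 5L   # the L suffix makes this an integer
x_dbl <- 5    # a plain number literal is a double

# Both count as "numeric" in the loose sense...
is.numeric(x_int)        # TRUE
is.numeric(x_dbl)        # TRUE

# ...but their storage types differ, which is what a strict type check sees.
typeof(x_int)            # "integer"
typeof(x_dbl)            # "double"

# They compare equal by value, yet are not identical objects.
x_int == x_dbl           # TRUE
identical(x_int, x_dbl)  # FALSE
```

This is why a quick check such as is.numeric() does not reveal the mismatch, while typeof() or str() does.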

If you look at the data, you can see the difference:

str(cses[,c("ex_ideolparty_A", "ex_ideolparty_B", "ex_ideolparty_C", "ex_ideolparty_D", "ex_ideolparty_E", "ex_ideolparty_F", "ex_ideolparty_G", "ex_ideolparty_H", "ex_ideolparty_I", "vote_PR_1")])
# 'data.frame': 281083 obs. of  10 variables:
#  $ ex_ideolparty_A: num  6 6 6 6 6 6 6 6 6 6 ...
#  $ ex_ideolparty_B: num  5 5 5 5 5 5 5 5 5 5 ...
#  $ ex_ideolparty_C: num  7 7 7 7 7 7 7 7 7 7 ...
#  $ ex_ideolparty_D: num  4 4 4 4 4 4 4 4 4 4 ...
#  $ ex_ideolparty_E: num  4 4 4 4 4 4 4 4 4 4 ...
#  $ ex_ideolparty_F: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ ex_ideolparty_G: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ ex_ideolparty_H: int  4 4 4 4 4 4 4 4 4 4 ...
#  $ ex_ideolparty_I: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ vote_PR_1      : int  9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 ...

Depending on your data, if all of the values are integral, you can fix this with:

cses <- cses %>%
    mutate_at(vars(ex_ideolparty_A, ex_ideolparty_B, ex_ideolparty_C, ex_ideolparty_D, ex_ideolparty_E, ex_ideolparty_F, ex_ideolparty_G, ex_ideolparty_H, ex_ideolparty_I, vote_PR_1),
              as.integer)
str(cses[,c("ex_ideolparty_A", "ex_ideolparty_B", "ex_ideolparty_C", "ex_ideolparty_D", "ex_ideolparty_E", "ex_ideolparty_F", "ex_ideolparty_G", "ex_ideolparty_H", "ex_ideolparty_I", "vote_PR_1")])
# 'data.frame': 281083 obs. of  10 variables:
#  $ ex_ideolparty_A: int  6 6 6 6 6 6 6 6 6 6 ...
#  $ ex_ideolparty_B: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ ex_ideolparty_C: int  7 7 7 7 7 7 7 7 7 7 ...
#  $ ex_ideolparty_D: int  4 4 4 4 4 4 4 4 4 4 ...
#  $ ex_ideolparty_E: int  4 4 4 4 4 4 4 4 4 4 ...
#  $ ex_ideolparty_F: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ ex_ideolparty_G: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ ex_ideolparty_H: int  4 4 4 4 4 4 4 4 4 4 ...
#  $ ex_ideolparty_I: int  5 5 5 5 5 5 5 5 5 5 ...
#  $ vote_PR_1      : int  9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 9999996 ...

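After that, your case_when will run without the type error.

(If there is even a chance that some values are non-integral, you may prefer as.numeric instead.)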
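A minimal sketch of the as.numeric variant, assuming dplyr is installed. The data frame toy and its columns (vote, ex_A, ex_B) are hypothetical stand-ins for the CSES columns, kept small so the result is easy to check.

```r
library(dplyr)

# Hypothetical toy data: ex_A is double, while ex_B and vote are integer,
# mirroring the mixed types seen in the str() output.
toy <- data.frame(
  numparty_A = c(1L, 1L, 1L),
  numparty_B = c(2L, 2L, 2L),
  vote       = c(1L, 2L, 9L),
  ex_A       = c(6, 6, 6),
  ex_B       = c(5L, 5L, 5L)
)

toy <- toy %>%
  # Convert every column that can appear on a right-hand side to double.
  mutate_at(vars(ex_A, ex_B, vote), as.numeric) %>%
  mutate(
    ideol = case_when(
      numparty_A == vote ~ ex_A,
      numparty_B == vote ~ ex_B,
      TRUE               ~ vote   # fallback, as in the question's code
    )
  )

toy$ideol
```

Converting to double rather than integer avoids truncating any fractional placement scores, at the cost of slightly more memory per column.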