Compare cells and treat NA as a positive match

Time: 2015-07-01 08:47:56

Tags: r

I have data like this:

> dput(daata_after_fun)
structure(list(P1 = structure(c(1L, 1L, 3L, 3L, 5L, 5L, 5L, 5L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 2L, 2L), .Label = c("Apple", "Grape", "Orange", "Peach", "Tomato"), class = "factor"),
P2 = structure(c(4L, 4L, 3L, 3L, 5L, 5L, 5L, 5L, 6L, 6L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 6L, 6L), .Label = c("Banana", "Cucumber", "Lemon", "Orange", "Potato", "Tomato"), class = "factor"),
P1_location_subacon = structure(c(NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Fridge", "Table"), class = "factor"),
P1_location_all_predictors = structure(c(2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Table,Desk,Bag,Fridge,Bed,Shelf,Chair", "Table,Shelf,Cupboard,Bed,Fridge", "Table,Shelf,Fridge"), class = "factor"),
P2_location_subacon = structure(c(1L, 1L, 1L, 1L, NA, NA, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Fridge", "Shelf"), class = "factor"),
P2_location_all_predictors = structure(c(3L, 3L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Shelf,Fridge", "Shelf,Fridge,Bed", "Table,Shelf,Fridge"), class = "factor"),
comp_subacon = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), .Names = c("P1", "P2", "P1_location_subacon", "P1_location_all_predictors", "P2_location_subacon", "P2_location_all_predictors", "comp_subacon"), row.names = c(NA, -20L), class = "data.frame")

I use the function below to compare the cells (locations):

daata$comp_subacon[mapply(setequal,strsplit(daata$P1_location_subacon, ","), strsplit(daata$P2_location_subacon, ","))] <- 1

What does this function do?

It checks whether the text in the two cells is the same and, if so, puts the number 1 in a new column. The problem is that the location is unknown for some fruits/vegetables. In that case I would like to treat the row as a positive match, so a 1 should go in the new column as well. An unknown location is marked as NA. Do you know how to modify the function I am currently using? I could also use a different one...

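1 Answer:

Answer 0 (score: 3)

You can define a function that returns TRUE when both values are equal or when either of them is NA: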

eq_or_na <- function( a , b ) (!is.na(a) & !is.na(b) & a==b) | (is.na(a) | is.na(b))

Then the following should work:

daata$comp_subacon[eq_or_na(as.character(daata$P1_location_subacon), 
                            as.character(daata$P2_location_subacon))] <- 1
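
As a quick sanity check, here is a minimal sketch of how eq_or_na behaves on a few toy values. The vectors p1, p2 and comp are hypothetical and only stand in for the asker's columns; the helper is repeated so the snippet runs on its own.

```r
# Helper from the answer, repeated so this snippet is self-contained.
eq_or_na <- function(a, b) (!is.na(a) & !is.na(b) & a == b) | (is.na(a) | is.na(b))

# Hypothetical toy vectors (not the asker's data).
p1 <- c("Fridge", NA, "Table", NA)
p2 <- c("Fridge", "Fridge", "Fridge", NA)

# Start from 0 everywhere, then flag rows that are equal or contain an NA.
comp <- rep(0, length(p1))
comp[eq_or_na(p1, p2)] <- 1
comp
# 1 1 0 1
```

Only the third row, where both locations are known and differ, stays at 0.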

If your cells hold sets of locations, as in your variable P1_location_all_predictors, you can do the following:

seteq_or_na <- function( a , b ) (!any(is.na(a)) & !any(is.na(b)) & setequal(a, b)) | (all(is.na(a)) | all(is.na(b)))
daata$comp_subacon[mapply(seteq_or_na, 
                          strsplit(as.character(daata$P1_location_subacon), ","), 
                          strsplit(as.character(daata$P2_location_subacon), ","))] <- 1
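
To illustrate the set version, the sketch below runs seteq_or_na on three made-up rows. The vectors p1 and p2 are hypothetical; the point is that the order of locations does not matter and that strsplit() turns an NA cell into a single NA element, which the helper then treats as a match.

```r
# Helper from the answer, repeated so this snippet is self-contained.
seteq_or_na <- function(a, b) (!any(is.na(a)) & !any(is.na(b)) & setequal(a, b)) | (all(is.na(a)) | all(is.na(b)))

# Hypothetical toy data: same set in a different order, different sets, unknown location.
p1 <- c("Table,Shelf,Fridge", "Shelf,Fridge", NA)
p2 <- c("Fridge,Table,Shelf", "Shelf,Fridge,Bed", "Shelf,Fridge")

# Split each cell on commas and compare the two sets row by row.
res <- mapply(seteq_or_na, strsplit(p1, ","), strsplit(p2, ","))
res
# TRUE FALSE TRUE
```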

For P1_location_all_predictors and P2_location_all_predictors, for example, you can define a new variable directly:

daata$comp_subacon_2 <- +(mapply(seteq_or_na, 
                                 strsplit(as.character(daata$P1_location_all_predictors), ","), 
                                 strsplit(as.character(daata$P2_location_all_predictors), ",")))

EDIT

If you want to know whether the two sets have at least one location in common, you can define a new function:

inter_or_na <- function( a , b ) (!any(is.na(a)) & !any(is.na(b)) & length(intersect(a, b)) > 0) | (all(is.na(a)) | all(is.na(b)))

Then apply it to your two columns:

daata$comp_subacon_3 <- +(mapply(inter_or_na, 
                                 strsplit(as.character(daata$P1_location_all_predictors), ","), 
                                 strsplit(as.character(daata$P2_location_all_predictors), ",")))
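
A small sketch of the overlap check on made-up rows may help. The vectors p1 and p2 are hypothetical, and the helper is written with an explicit > 0 on the intersection length, which should behave the same as relying on a non-zero length being treated as TRUE.

```r
# Overlap helper, repeated so this snippet is self-contained.
inter_or_na <- function(a, b) (!any(is.na(a)) & !any(is.na(b)) & length(intersect(a, b)) > 0) | (all(is.na(a)) | all(is.na(b)))

# Hypothetical toy data: one shared location, no shared location, unknown location.
p1 <- c("Table,Shelf", "Desk,Bag", NA)
p2 <- c("Shelf,Fridge", "Shelf,Fridge", "Bed")

# The unary + turns the logical result into 0/1 integers.
res <- +(mapply(inter_or_na, strsplit(p1, ","), strsplit(p2, ",")))
res
# 1 0 1
```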