Question

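I have two datasets. One (DFlogin) contains a set of user IDs and the two-digit "zip codes" they attempted to log in from. The other (DFrecords) contains the list of acceptable "zip codes" for each user's logins.

A user can have any number of attempted-login zip codes and any number of acceptable zip codes on record.

The goal is to loop through the rows of the DFlogin dataset and compare each user's attempted logins against all of the acceptable logins for that specific user.

For example, user 1 is only allowed to log in from zip code 34 but has logged in from zip code 21. That user should be flagged in a new column (bad_login).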

userid<-c(1:3)
zipcode1<-c(21,23,4)
zipcode2<-c(NA, 34, 32)

DFlogin<-data.frame(userid,zipcode1,zipcode2)

recordzipcode1<-c(34,23,42)
recordzipcode2<-c(NA, 34, 32)
recordzipcode3<-c(NA, 21,61)

DFrecords<-data.frame(userid, recordzipcode1,recordzipcode2, recordzipcode3)

I'm guessing the solution could use a couple of loops and an if statement, but I'm not sure where to start.

Answer 1

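You can use apply to go through DFlogin row by row.

For each row, find the matching userid in DFrecords. Select only the columns that hold zip code values (2:4). Remove the NA values. Then check whether all of the login values come from acceptable zip codes.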

DFlogin$bad_login <- apply(DFlogin, 1, function(x) {
  # 2:4 are the columns holding zip codes in DFrecords
  x1 <- DFrecords[match(x[1], DFrecords$userid), 2:4]
  y <- x1[!is.na(x1)]
  # 2:3 are the columns holding zip codes in DFlogin
  as.integer(!all(x[2:3] %in% y))
})

#[1] 1 0 1
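
One thing to watch for: the code above compares the NA placeholders in DFlogin as well, and NA %in% y is FALSE once the NAs have been dropped from y. So a user whose only real login is valid, but who has an empty second login slot, would still be flagged. Below is a minimal sketch of a variant that drops NA on both sides before comparing. The helper name flag_bad_login and the column-name patterns ("^zipcode", "^recordzipcode") are my own assumptions based on the sample data, not part of the original answer.

```r
# Sample data from the question
userid <- 1:3
DFlogin <- data.frame(userid,
                      zipcode1 = c(21, 23, 4),
                      zipcode2 = c(NA, 34, 32))
DFrecords <- data.frame(userid,
                        recordzipcode1 = c(34, 23, 42),
                        recordzipcode2 = c(NA, 34, 32),
                        recordzipcode3 = c(NA, 21, 61))

# flag_bad_login is a hypothetical helper, named here for illustration only.
# It returns 1 for a row with at least one attempted zip code that is not
# on that user's record, and 0 otherwise.
flag_bad_login <- function(login, records) {
  # Assumption: zip code columns follow the naming used in the sample data
  login_cols  <- grep("^zipcode", names(login), value = TRUE)
  record_cols <- grep("^recordzipcode", names(records), value = TRUE)

  vapply(seq_len(nrow(login)), function(i) {
    # Attempted zip codes for this row, ignoring empty slots
    attempted <- unlist(login[i, login_cols], use.names = FALSE)
    attempted <- attempted[!is.na(attempted)]

    # Acceptable zip codes recorded for the same user, ignoring empty slots
    allowed <- unlist(records[records$userid == login$userid[i], record_cols],
                      use.names = FALSE)
    allowed <- allowed[!is.na(allowed)]

    as.integer(!all(attempted %in% allowed))
  }, integer(1))
}

DFlogin$bad_login <- flag_bad_login(DFlogin, DFrecords)
```

On the sample data this gives the same flags as the apply version. The difference only shows up for rows such as userid 1 with zipcode1 = 34 and zipcode2 = NA, which this sketch leaves unflagged.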
