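I have a dataset that gives me each customer's email, their customer number, and whether they are an admin. We need the customer's admin email on the same record. It can be derived from the data: wherever a record's customer number equals the customer number of an admin record, put that admin's email on the row. In addition, a "second admin" should have their own email shown in the "admin email" field, rather than the "first admin" email for that customer.
I did this by subsetting the admins into a new data frame and then merging that admin dataset with the original dataset on customer number. The problem is customers with 2 admins, because the join produces duplicate records. Is there a way around this, so that if 2 admins are listed for one customer, the first admin's email is used?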
##sample Data
df <- data.frame(Email = c("test1@gmail.com", "test2@gmail.com", "test3@gmail.com","test4@gmail.com","test5@gmail.com","test6@gmail.com", "test7@gmail.com"),
Admin = c("Y", "N", "N","Y","N", "Y", "N"),
CustNum = c("1111","1111","1111","2222","2222","2222", "2222"))
##My solution
admins <- subset(df, df$Admin == "Y")
output <- merge(df, admins, by = "CustNum", all.x = TRUE)
colnames(output)[colnames(output)=="Email.y"] <- "Admin_Email"
##Expected output
expected <- data.frame(Email = c("test1@gmail.com", "test2@gmail.com", "test3@gmail.com","test4@gmail.com","test5@gmail.com","test6@gmail.com", "test7@gmail.com"),
Admin = c("Y", "N", "N","Y","N", "Y", "N"),
CustNum = c("1111","1111","1111","2222","2222","2222", "2222"),
Adminemail = c("test1@gmail.com","test1@gmail.com","test1@gmail.com","test4@gmail.com","test4@gmail.com","test6@gmail.com", "test4@gmail.com"))
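Answer 0 (score: 1)
I think the simplest way is to use a for loop. There may also be a way to do it with data.table, but I couldn't figure it out...
Working solution, but not optimal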
df$Adminemail <- NA
for (i in seq_len(nrow(df))) {
  if (df$Admin[i] == "Y") {
    ### An admin row uses its own email
    df$Adminemail[i] <- as.character(df$Email[i])
  } else {
    ### Otherwise fill in the first admin email found for the same customer
    sub <- df[df$CustNum == df$CustNum[i], ]
    df$Adminemail[i] <- as.character(sub[sub$Admin == "Y", ]$Email[1])
  }
}
If your dataset is large, the for loop may be slow. However, if you can create a unique ID, I'm fairly sure data.table offers a better, more optimized solution.
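As a rough sketch of what a data.table version might look like: this uses a grouped assignment rather than a join on a unique ID, so it is a swapped-in technique and not the approach attempted in this answer. The name `dt` is introduced here for illustration, and the sample data is rebuilt so the snippet runs on its own.

```r
library(data.table)

# Same sample data as in the question (dt is a name made up for this sketch)
dt <- data.table(
  Email = paste0("test", 1:7, "@gmail.com"),
  Admin = c("Y", "N", "N", "Y", "N", "Y", "N"),
  CustNum = c("1111", "1111", "1111", "2222", "2222", "2222", "2222")
)

# Within each customer: admins keep their own email, everyone else gets
# the first admin email (NA if the customer has no admin at all)
dt[, Adminemail := ifelse(Admin == "Y", Email, Email[Admin == "Y"][1]), by = CustNum]
```

Because `:=` adds the column by reference within each `CustNum` group, no merge is performed, so customers with two admins should not produce duplicate rows.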
Non-working solution, but possibly a better route
df$Unique <- paste(df$Email,df$CustNum,sep="_")
library(data.table)
setDT(df)
setDT(admins)
# inner join - use `nomatch` argument
admins[df, nomatch=0L, on = "Unique"]
I found this code in this post.
Answer 1 (score: 1)
I couldn't find a solution that avoids a loop, but this works. Give it a try.
## sample Data
df <- data.frame(Email = c("test1@gmail.com", "test2@gmail.com", "test3@gmail.com","test4@gmail.com","test5@gmail.com","test6@gmail.com", "test7@gmail.com"),
Admin = c("Y", "N", "N","Y","N", "Y", "N"),
CustNum = c("1111","1111","1111","2222","2222","2222", "2222"))
## My solution
library(dplyr)
# keep only the admin rows; each admin's own email is their AdminEmail
admins <- df %>%
  filter(Admin == "Y") %>%
  mutate(AdminEmail = Email)
# find the first admin row for each unique CustNum
first_match <- admins[!duplicated(admins$CustNum), ]
# merge data: admin rows get their own email, all other rows get NA
output <- full_join(df, admins, by = c("Email", "CustNum", "Admin"))
# fill in NAs with the first admin email of the same CustNum
# (match() gives NA, rather than an error, when a customer has no admin)
for (i in seq_len(nrow(output))) {
  if (is.na(output$AdminEmail[i])) {
    output$AdminEmail[i] <- first_match$AdminEmail[match(output$CustNum[i], first_match$CustNum)]
  }
}
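For what it's worth, the fill-in loop could probably be avoided with a grouped `mutate`. The following is only a sketch under that assumption, not the method used above. It rebuilds the sample data so it runs on its own, and it assumes `Email` is stored as character (hence the explicit `stringsAsFactors = FALSE`).

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  Email = paste0("test", 1:7, "@gmail.com"),
  Admin = c("Y", "N", "N", "Y", "N", "Y", "N"),
  CustNum = c("1111", "1111", "1111", "2222", "2222", "2222", "2222"),
  stringsAsFactors = FALSE
)

# Per customer: admins keep their own email, non-admins take the first
# admin email in the group (NA if the group has no admin)
output <- df %>%
  group_by(CustNum) %>%
  mutate(AdminEmail = if_else(Admin == "Y", Email, Email[Admin == "Y"][1])) %>%
  ungroup()
```

Since nothing is joined here, the row count stays the same as in `df`, even for customers with two admins.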