Suppose I have rows like this:
First, Last, Address, Address 2, Email, Custom1, Custom2, Custom3
1 A, B, C, D, E@E.com,1,2,3
2 A, , C, D, E@E.com,1,2,
3 , , , , E@E.com,1, ,
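What I want to do is create a function that pulls out the most complete row, and I'd like to know whether there is a package or pre-existing method (or even just suggestions) for doing this. In the example above, I want a function that selects row 1.
I can't use complete.cases() or na.omit() because in many instances the rows are incomplete and contain at least one NA. I tried combining unique() with some specific subsetting... but I haven't had much luck automating this task.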
Answer 0 (score: 1)
You can convert to character and count, row by row, how many non-empty entries you have:
R> Lines <- "
+ First, Last, Address, Address 2, Email, Custom1, Custom2, Custom3
+ A, B, C, D, E@E.com,1,2,3
+ A, , C, D, E@E.com,1,2,
+ , , , , E@E.com,1, ,
+ "
R>
R> con <- textConnection(Lines)
R> df <- read.table(con, header=TRUE, sep=",")
R> close(con)
R>
R> m <- as.matrix(df) # now all char
R>
R> counts <- apply(m, 1, function(r) { r <- gsub("^ $", "", r);
+ sum(na.omit(r) != "") } )
R>
R> df[which.max(counts), ] # pick row of maximum
First Last Address Address.2 Email Custom1 Custom2 Custom3
1 A B C D E@E.com 1 2 3
R>
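The same count can also be sketched without an explicit apply() loop. This is only a minimal variant, assuming a small made-up data frame (the `df` below is hypothetical and not the data read above): it strips surrounding whitespace with trimws() and counts the non-NA, non-empty cells per row with rowSums().

```r
# Hypothetical sample data, loosely mirroring the question
df <- data.frame(
  First   = c("A", "A", ""),
  Last    = c("B", " ", " "),
  Email   = c("E@E.com", "E@E.com", "E@E.com"),
  Custom1 = c(1L, 1L, NA),
  stringsAsFactors = FALSE
)

m <- as.matrix(df)        # mixed columns become a character matrix
m[] <- trimws(m)          # strip surrounding whitespace, keep the matrix shape

# A cell counts as filled when it is neither NA nor the empty string
counts <- rowSums(!is.na(m) & m != "")

best <- df[which.max(counts), ]   # first row with the highest count
```

Note that which.max() returns only the first maximum, so ties are resolved in favour of the earlier row.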
Answer 1 (score: 1)
You can use the fact that the empty string "" sorts before any letter or digit, so just use sum(x > "", na.rm=TRUE) inside apply:
> apply(tst, 1, function(x) sum(x > "", na.rm=TRUE))
[1] 8 7 6
> idx <- apply(tst, 1, function(x) sum(x > "", na.rm=TRUE))
> tst[which.max(idx),]
First Last Address Address.2 Email Custom1 Custom2 Custom3
1 1 A B C D E@E.com 1 2 3
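String comparison in R follows the locale's collation, and a cell holding only a space is not an empty string, so a stricter test may be worth considering. The following is a sketch under assumptions: the `tst` below is a hypothetical stand-in for the data frame above, and `is_filled` is a helper name introduced here for illustration. It uses nzchar() after trimws(), with an explicit NA check because nzchar(NA) is TRUE by default.

```r
# Hypothetical stand-in for the `tst` data frame used above
tst <- data.frame(
  First   = c("A", "A", " "),
  Last    = c("B", " ", " "),
  Custom3 = c("3", NA, NA),
  stringsAsFactors = FALSE
)

# TRUE for cells that are not NA and not blank after trimming whitespace
is_filled <- function(x) !is.na(x) & nzchar(trimws(x))

idx  <- apply(tst, 1, function(x) sum(is_filled(x)))  # filled cells per row
best <- tst[which.max(idx), ]                          # most complete row
```

Whether whitespace-only cells should count as filled depends on the data, so treat the trimming step as optional.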
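Answer 2 (score: 1)
Although some working solutions have already been posted, here is mine. It is similar (using apply and sum) but uses a regular expression (via grepl) to do the job, so you can try whatever pattern you like. The "trick" used is that logical values can be summed: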
x <- structure(list(First = c("A", "A", ""), Last = c("B", " ", " "
), Address = c("C", "C", " "), Address.2 = c("D", "D", " "),
Email = c(" E@E.com", " E@E.com", " E@E.com"), Custom1 = c(1L,
1L, 1L), Custom2 = c(2L, 2L, NA), Custom3 = c(3L, NA, NA)), .Names = c("First",
"Last", "Address", "Address.2", "Email", "Custom1", "Custom2",
"Custom3"), class = "data.frame", row.names = c(NA, -3L))
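mostComplete <- function(x) {
  # one column of TRUE/FALSE per row of x: does the cell contain a letter or digit?
  tmp <- apply(x, 1, grepl, pattern = "[[:alnum:]]")
  # sum the logicals per original row and return the index of the maximum
  return(which.max(apply(tmp, 2, sum)))
}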
mostComplete(x)
[1] 1
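Since which.max only reports the first maximum, a variant that returns every tied row might look like the sketch below. The function name `mostCompleteAll` and the data frame `y` are made up for illustration; the pattern argument is the same one used above.

```r
# Return the indices of ALL rows that share the highest count of matching cells
mostCompleteAll <- function(x, pattern = "[[:alnum:]]") {
  m <- as.matrix(x)
  # grepl() returns a plain vector, so restore the row/column layout
  hits <- matrix(grepl(pattern, m), nrow = nrow(m))
  counts <- rowSums(hits)
  which(counts == max(counts))
}

# Hypothetical data with a tie between rows 1 and 2
y <- data.frame(a = c("A", "B", ""), b = c(1L, 2L, NA),
                stringsAsFactors = FALSE)
tied <- mostCompleteAll(y)
```

You can then decide how to break the tie, for example by taking the first index or by inspecting the tied rows by hand.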
PS: Give the young ones a chance...