Suppose I have the following table:
1 2 3
2 1 C
2 1 C
2 1 C
3 1 N
3 1 N
4 1 N
4 1 N
4 1 N
4 1 N
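Given that column 2 is the same and the values in column 1 differ, I only want to keep the rows where column 3 contains C. This should produce the following table: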
1 2 3
2 1 C
2 1 C
2 1 C
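One possible reading of the condition is group-wise: within each value of column 2, if column 1 takes more than one distinct value, keep only the C rows. Below is a minimal base R sketch of that reading. It assumes the columns are named X1, X2 and X3, as in the answers below, and rebuilds the data frame from the table above.

```r
# Data rebuilt from the question's table (column names X1..X3 are assumed)
df <- data.frame(
  X1 = c(2, 2, 2, 3, 3, 4, 4, 4, 4),
  X2 = c(1, 1, 1, 1, 1, 1, 1, 1, 1),
  X3 = c("C", "C", "C", "N", "N", "N", "N", "N", "N"),
  stringsAsFactors = FALSE
)

# For every row: number of distinct X1 values within that row's X2 group
n_x1 <- ave(df$X1, df$X2, FUN = function(v) length(unique(v)))

# Where X1 varies inside the group, keep only "C" rows; otherwise keep all rows
keep <- n_x1 == 1 | df$X3 == "C"
result <- df[keep, ]
print(result)
```

Groups in which column 1 is constant are left untouched. That is the design choice separating this sketch from a plain filter on column 3.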
I have already looked at the following questions:
Remove duplicates based on 2nd column condition
R, conditionally remove duplicate rows
Conditionally removing duplicates in R
Any idea how to achieve this?
Answer 0 (score: 1)
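I'm not sure I fully understand what you need to do, but here is an attempt using a simple if statement that checks the variance of the two columns (X1 must vary, X2 must be constant), i.e.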
if (var(dd3$X1) != 0 & var(dd3$X2) == 0) { dd3 <- subset(dd3, X3 == 'C')}
dd3
# X1 X2 X3
#1 2 1 C
#2 2 1 C
#3 2 1 C
where dd3 is
dput(dd3)
structure(list(X1 = c(2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L), X2 = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), X3 = structure(c(1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", "N"), class = "factor")), class = "data.frame", row.names = c(NA, -9L))
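The check in this answer applies to the whole table, not to groups. The sketch below wraps it in a function. keep_c_if_x1_varies is a hypothetical name, and length(unique()) stands in for var(). The substitution matters because var() of a single value is NA, which would make the if() fail on a one-row input.

```r
# Hypothetical helper: the answer's whole-table check, using
# length(unique()) instead of var() so a one-row input does not yield NA
keep_c_if_x1_varies <- function(d) {
  x1_varies   <- length(unique(d$X1)) > 1
  x2_constant <- length(unique(d$X2)) == 1
  if (x1_varies && x2_constant) {
    d <- d[d$X3 == "C", , drop = FALSE]
  }
  d
}

dd3 <- data.frame(
  X1 = c(2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L),
  X2 = rep(1L, 9),
  X3 = c("C", "C", "C", "N", "N", "N", "N", "N", "N")
)
res <- keep_c_if_x1_varies(dd3)
print(res)
```

A single-row input such as dd3[4, ] is returned unchanged rather than raising an error.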
Answer 1 (score: 0)
I don't fully understand what you mean by "if column 2 is the same". You can use subset:
subset(df, col3 == "C" & col1 != col2)
I used col1, col2 and col3 as the column names.
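The following is a self-contained run of this subset() call. The data frame is rebuilt from the question's table under the assumed names col1, col2 and col3. Inside subset() the columns can be referenced directly, without the df$ prefix.

```r
# Question's data with the column names assumed in this answer
df <- data.frame(
  col1 = c(2, 2, 2, 3, 3, 4, 4, 4, 4),
  col2 = rep(1, 9),
  col3 = c("C", "C", "C", "N", "N", "N", "N", "N", "N")
)

# Row-wise filter: col3 is "C" and col1 differs from col2 in the same row
out <- subset(df, col3 == "C" & col1 != col2)
print(out)
```

Note that col1 != col2 compares the two columns row by row. It does not test whether column 2 is constant.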
Answer 2 (score: 0)
Maybe you can use ave. Try the following base R code:
dfout <- subset(df, as.logical(ave(X3, X1, X2, FUN = function(v) v == "C")))
# > dfout
# X1 X2 X3
# 1 2 1 C
# 2 2 1 C
# 3 2 1 C
Data
df <- structure(list(X1 = c(2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L), X2 = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), X3 = c("C", "C", "C", "N", "N",
"N", "N", "N", "N")), row.names = c(NA, -9L), class = "data.frame")
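ave() returns a vector of the same type as its first argument. With a character X3, the logical results therefore come back as the strings "TRUE"/"FALSE", which is why as.logical() is needed. Because v == "C" is applied element by element, the grouping by X1 and X2 does not change the outcome here. If the intent is instead to keep whole (X1, X2) groups that contain at least one C, the sketch below passes the logical vector straight to ave() with FUN = any. This is a variant, not the answer's exact logic.

```r
df <- data.frame(
  X1 = c(2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L),
  X2 = rep(1L, 9),
  X3 = c("C", "C", "C", "N", "N", "N", "N", "N", "N")
)

# With a character first argument, ave() coerces the logical results to strings
coerced <- ave(c("C", "N"), c(1, 2), FUN = function(v) v == "C")
print(coerced)  # "TRUE" "FALSE"

# Variant: a logical first argument stays logical; any() marks every row
# of an (X1, X2) group that contains at least one "C"
has_c <- ave(df$X3 == "C", df$X1, df$X2, FUN = any)
dfout2 <- df[has_c, ]
print(dfout2)
```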