Here is an example of my problem to illustrate my point.
Random <- sample(c("A","B","C","D","E","F","G"), size = 100, replace = TRUE)
Year <- sample(c(2000,2001,2002,2003,2004,2005), 100, TRUE)
Value <- sample(c(1,2,3,4), 100, TRUE)
data <- data.frame(Random,Year,Value)
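So what I want to do is delete all rows in #Table1 whose value does not change over the years, or at least return only those rows of the Random column in #Table2. I have marked the rows I want to delete in this example to make my problem easier to understand.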
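A minimal sketch that works directly on long-format data like data above, without reshaping first. It assumes the goal is to drop every Random group whose Value takes only one distinct value across all of its rows; ave() computes a per-group count of distinct values and repeats it on every row of that group. The seed and the names n_distinct, keep and result are illustrative, not from the question.

```r
set.seed(1)  # illustrative seed so the random example is reproducible
Random <- sample(c("A", "B", "C", "D", "E", "F", "G"), size = 100, replace = TRUE)
Year   <- sample(c(2000, 2001, 2002, 2003, 2004, 2005), 100, TRUE)
Value  <- sample(c(1, 2, 3, 4), 100, TRUE)
data   <- data.frame(Random, Year, Value)

# For each row, the number of distinct Values in that row's Random group
n_distinct <- ave(data$Value, data$Random,
                  FUN = function(v) length(unique(v)))

# Keep only rows from groups whose Value changes at least once
keep   <- n_distinct > 1
result <- data[keep, ]
```

Because ave() returns one value per original row, the logical vector lines up with data and can be used for subsetting directly.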
Answer 0 (score: 3)
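Following your logic, a row is a candidate for removal if its values never change. That condition holds exactly when the row's minimum and maximum are equal. Try this: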
df <- data.frame(Random=c("A", "B", "C", "D", "E", "F", "G"),
`2000`=c(1,1,0,2,2,0,3),
`2001`=c(0,1,0,2,3,0,3),
`2002`=c(2,1,0,2,0,1,3),
`2003`=c(1,1,0,2,0,0,3),
`2004`=c(4,1,0,2,1,0,3),
`2005`=c(5,1,0,2,1,0,3), stringsAsFactors=FALSE)
df.target <- df[, !(names(df) %in% c("Random"))]  # year columns only
df[apply(df.target, 1, function(x) min(x) != max(x)), ]  # keep rows whose values vary
Random X2000 X2001 X2002 X2003 X2004 X2005
1 A 1 0 2 1 4 5
5 E 2 3 0 0 1 1
6 F 0 0 1 0 0 0
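To see why comparing the minimum and maximum is enough: for numeric data, a row is constant exactly when it has a single distinct value, which is exactly when min equals max. The sketch below rebuilds the same df and checks that the two tests agree; keep_minmax and keep_unique are illustrative names.

```r
df <- data.frame(Random = c("A", "B", "C", "D", "E", "F", "G"),
                 `2000` = c(1, 1, 0, 2, 2, 0, 3),
                 `2001` = c(0, 1, 0, 2, 3, 0, 3),
                 `2002` = c(2, 1, 0, 2, 0, 1, 3),
                 `2003` = c(1, 1, 0, 2, 0, 0, 3),
                 `2004` = c(4, 1, 0, 2, 1, 0, 3),
                 `2005` = c(5, 1, 0, 2, 1, 0, 3), stringsAsFactors = FALSE)
df.target <- df[, -1]  # year columns only

# Test from the answer: the row varies if min != max
keep_minmax <- apply(df.target, 1, function(x) min(x) != max(x))
# Equivalent test: the row varies if it has more than one distinct value
keep_unique <- apply(df.target, 1, function(x) length(unique(x)) > 1)

print(identical(keep_minmax, keep_unique))  # TRUE
print(df$Random[keep_minmax])               # "A" "E" "F"
```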
Edit
If you also want to remove the rows in Table 1 whose names match the rows removed from the second table, you can try:
names.rm <- df$Random[apply(df.target, 1, function(x) min(x)==max(x))]  # Random values whose row never changes
table1[!table1$Random %in% names.rm, ]  # drop those names from the long-format table1
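The table1 in this edit is the asker's long-format table, which is not shown here. A minimal self-contained sketch, assuming table1 has one row per Random/year pair holding the same values as df; the rep()/unlist() construction of table1 and the name table1_kept are hypothetical.

```r
df <- data.frame(Random = c("A", "B", "C", "D", "E", "F", "G"),
                 `2000` = c(1, 1, 0, 2, 2, 0, 3),
                 `2001` = c(0, 1, 0, 2, 3, 0, 3),
                 `2002` = c(2, 1, 0, 2, 0, 1, 3),
                 `2003` = c(1, 1, 0, 2, 0, 0, 3),
                 `2004` = c(4, 1, 0, 2, 1, 0, 3),
                 `2005` = c(5, 1, 0, 2, 1, 0, 3), stringsAsFactors = FALSE)
df.target <- df[, !(names(df) %in% c("Random"))]

# Hypothetical long version of df: one row per Random/year pair,
# stacked column by column (all of 2000 first, then 2001, ...)
table1 <- data.frame(Random = rep(df$Random, times = ncol(df.target)),
                     year   = rep(2000:2005, each = nrow(df)),
                     count  = unlist(df.target, use.names = FALSE),
                     stringsAsFactors = FALSE)

# Names of the Random values whose row never changes
names.rm <- df$Random[apply(df.target, 1, function(x) min(x) == max(x))]

# Drop every long-format row that belongs to one of those names
table1_kept <- table1[!table1$Random %in% names.rm, ]
print(names.rm)                    # "B" "C" "D" "G"
print(unique(table1_kept$Random))  # "A" "E" "F"
```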
Answer 1 (score: 2)
Using the dataset from @TimBiegeleisen's answer, the following also keeps only the rows that have at least one differing value.
df[apply(df[-1], 1, function(x) any(x[-1] != x[1])), ]
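The apply() call above loops over rows after converting df[-1] to a matrix. As a swapped-in alternative (not part of this answer), the same "does any year differ from the first year" test can be vectorized: comparing a data.frame with a vector of length nrow(df) compares every column against that vector and returns a logical matrix, and rowSums() counts the mismatches per row. The names years, differs and result are illustrative.

```r
df <- data.frame(Random = c("A", "B", "C", "D", "E", "F", "G"),
                 `2000` = c(1, 1, 0, 2, 2, 0, 3),
                 `2001` = c(0, 1, 0, 2, 3, 0, 3),
                 `2002` = c(2, 1, 0, 2, 0, 1, 3),
                 `2003` = c(1, 1, 0, 2, 0, 0, 3),
                 `2004` = c(4, 1, 0, 2, 1, 0, 3),
                 `2005` = c(5, 1, 0, 2, 1, 0, 3), stringsAsFactors = FALSE)

years   <- df[-1]                     # year columns only
differs <- years != years[[1]]        # logical matrix: each cell vs. the first year
result  <- df[rowSums(differs) > 0, ] # keep rows with at least one change
print(result$Random)                  # "A" "E" "F"
```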
Answer 2 (score: 2)
Here is an option using rowMins/rowMaxs from the matrixStats package:
library(matrixStats)
df[rowMins(as.matrix(df[-1])) != rowMaxs(as.matrix(df[-1])),]
or with base R's pmin/pmax:
df[do.call(pmin, df[-1]) != do.call(pmax, df[-1]),]
# Random X2000 X2001 X2002 X2003 X2004 X2005
#1 A 1 0 2 1 4 5
#5 E 2 3 0 0 1 1
#6 F 0 0 1 0 0 0
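How the pmin/pmax version works: do.call(pmin, df[-1]) passes each year column as a separate argument, so pmin returns the element-wise (that is, row-wise) minimum across the columns, and likewise for pmax. A toy illustration with two hypothetical columns a and b:

```r
cols <- list(a = c(1, 5, 2), b = c(3, 2, 2))
lo <- do.call(pmin, cols)  # same as pmin(cols$a, cols$b)
hi <- do.call(pmax, cols)  # same as pmax(cols$a, cols$b)
print(lo)        # 1 2 2
print(hi)        # 3 5 2
print(lo != hi)  # TRUE TRUE FALSE: only the third "row" is constant
```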
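Answer 3 (score: 2)
Another approach is to work with the data in long format and then reshape it. I think this is a better approach because row-wise operations on data.frames are expensive (apply, for example, first converts the data.frame to a matrix). Here is a base R solution using a modified, long-format version of Tim's dataset.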
reshape(dat[ave(dat$count, dat$Random, FUN=var) != 0, ],
direction="wide", idvar="Random", timevar="year")
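Here, ave(dat$count, dat$Random, FUN=var) computes the variance of count within each dat$Random group and repeats it on every row of that group, so comparing it with != 0 returns TRUE for the Random values with non-zero variance. The resulting logical vector is used to subset the data.frame, which base R's reshape function then converts to the desired wide format.
This returns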
Random count.2000 count.2001 count.2002 count.2003 count.2004 count.2005
1 A 1 0 2 1 4 5
5 E 2 3 0 0 1 1
6 F 0 0 1 0 0 0
Data
dat <-
structure(list(Random = c("A", "B", "C", "D", "E", "F", "G",
"A", "B", "C", "D", "E", "F", "G", "A", "B", "C", "D", "E", "F",
"G", "A", "B", "C", "D", "E", "F", "G", "A", "B", "C", "D", "E",
"F", "G", "A", "B", "C", "D", "E", "F", "G"), year = c(2000,
2000, 2000, 2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001, 2001,
2001, 2001, 2002, 2002, 2002, 2002, 2002, 2002, 2002, 2003, 2003,
2003, 2003, 2003, 2003, 2003, 2004, 2004, 2004, 2004, 2004, 2004,
2004, 2005, 2005, 2005, 2005, 2005, 2005, 2005), count = c(1,
1, 0, 2, 2, 0, 3, 0, 1, 0, 2, 3, 0, 3, 2, 1, 0, 2, 0, 1, 3, 1,
1, 0, 2, 0, 0, 3, 4, 1, 0, 2, 1, 0, 3, 5, 1, 0, 2, 1, 0, 3)),
.Names = c("Random", "year", "count"), row.names = c(NA, -42L),
class = "data.frame")
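To see what the ave() step in this answer produces, here is a toy illustration with a hypothetical grouping vector g: ave() computes var() within each group and repeats the result on every row of that group, so the != 0 filter keeps or drops whole groups. One caveat: var() of a single observation is NA, so a group observed in only one year would put NA into the logical index; that does not arise in dat, where every Random value has six years.

```r
g     <- c("a", "a", "b", "b")
count <- c(1, 1, 2, 3)

v <- ave(count, g, FUN = var)  # per-group variance, repeated per row
print(v)       # 0.0 0.0 0.5 0.5
print(v != 0)  # FALSE FALSE TRUE TRUE: group "b" is kept

# A single-observation group yields NA rather than 0
print(ave(c(1, 2, 3), c("x", "x", "y"), FUN = var))  # 0.5 0.5 NA
```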