Question

I am trying to subset rows that contain certain combinations of elements, where the combinations come from another data frame.

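The first data frame lists all the animals owned by a group of farmers, along with their weights. The second data frame indicates that some farmers have actually sold all of their animals of a particular type, so those animals should be removed from the set. In my example, James sold all of his deer and Alice sold all of her Giga chickens, but Schubert did not sell his deer, so nothing needs to be done for him. With only one variable I could use %in%, but I can't get that to work with two variables. I solved it with messy nested if statements and for loops, but I suspect there is a more efficient way.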

owner <-c("Fred", "Mary", "James", "Ingrid", "Schubert", "Alice") #owner names
animal <-c("Cow", "Giant sheep", "Deer", "Giga chicken") #Animal types
data <- data.frame(owner= sample(owner, 1000, replace= TRUE), animal=sample(animal, 1000, replace= TRUE), weight=rnorm(1000,mean=250, sd=50)) #data set

sub.set <- data.frame(cbind(c("James","Alice", "Schubert"),c("Deer","Giga chicken", "Deer"), c(0,0,1)))

for (i in unique(sub.set[,1])) {
    for (y in unique(sub.set[,2])) {

        # the first if statement prevents the error that occurs when sub.set has no row for this owner/animal combination
        if(length(sub.set[sub.set$X1 ==i & sub.set$X2 ==y,3])>0){
            if (sub.set[sub.set$X1 ==i & sub.set$X2 ==y,3]==0)
            { data <- data[!(data$owner==i & data$animal==y),]}
        }
    }
}
xtabs(weight ~., data)

As the cross-tab shows, the correct elements have been removed, but in a horrible way. Any help doing this more simply would be much appreciated!

Answer 1

This produces the same result as your code and does not use loops.

set.seed(1)   # need reproducible sample data
owner <-c("Fred", "Mary", "James", "Ingrid", "Schubert", "Alice") #owner names
animal <-c("Cow", "Giant sheep", "Deer", "Giga chicken") #Animal types
data <- data.frame(owner= sample(owner, 1000, replace= TRUE), 
                   animal=sample(animal, 1000, replace= TRUE), 
                   weight=rnorm(1000,mean=250, sd=50)) #data set
# note the column names in sub.set
sub.set <- data.frame(owner=c("James","Alice", "Schubert"),
                      animal=c("Deer","Giga chicken", "Deer"), 
                      count=c(0,0,1))
# this is the code to exclude rows where there are no animals left
data <- merge(data,sub.set,by=c("owner","animal"),all.x=TRUE)
data <- with(data,data[count!=0 | is.na(count),])
data <- data[,-4]
xtab.2 <- xtabs(weight~.,data)

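This code merges data and sub.set on the owner and animal columns, which adds a new column, count. Because all.x=TRUE makes this a left join, rows of data with no match in sub.set get count = NA. It then keeps only the rows where count!=0 or is.na(count), that is, combinations that were not sold off plus combinations that never appeared in sub.set. Finally, it drops the count column and computes the cross-tab as before.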
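To make the NA handling easier to see, here is a small sketch on toy data. The data frames toy and sold and their values are made up for illustration and are not from the question. The second half shows a different technique, a composite key with %in%, which is not part of the answer above; it assumes that no owner or animal name contains the "|" separator.

```r
# Toy data (hypothetical values, chosen only for illustration)
toy <- data.frame(owner  = c("James", "James", "Alice", "Schubert", "Fred"),
                  animal = c("Deer", "Cow", "Giga chicken", "Deer", "Cow"),
                  weight = c(200, 250, 300, 220, 260))
sold <- data.frame(owner  = c("James", "Alice", "Schubert"),
                   animal = c("Deer", "Giga chicken", "Deer"),
                   count  = c(0, 0, 1))

# Left join: rows of toy with no match in sold get count = NA
merged <- merge(toy, sold, by = c("owner", "animal"), all.x = TRUE)

# Keep unmatched rows (count is NA) and matched rows whose count is not 0
kept <- merged[is.na(merged$count) | merged$count != 0, ]
kept$count <- NULL   # drop the helper column by name

# Alternative sketch: build one key per row and use %in% on the keys.
# Assumes "|" never occurs inside an owner or animal name.
gone     <- sold[sold$count == 0, ]
key_toy  <- paste(toy$owner, toy$animal, sep = "|")
key_gone <- paste(gone$owner, gone$animal, sep = "|")
kept2    <- toy[!(key_toy %in% key_gone), ]
```

Both kept and kept2 should hold the same owner/animal combinations. The difference is that merge sorts the rows by the key columns, while the %in% version keeps the original row order.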

How to subset on two conditions that each contain multiple values
