Question

I need to delete specific rows from my data frame, but I'm having trouble. The data set looks like this:

> head(mergedmalefemale)
  coupleid gender shop time amount
1        1      W    3    1  29.05
2        1      W    1    2  31.65
3        1      W    3    3     NA
4        1      W    2    4  17.75
5        1      W    3    5 -28.40
6        2      W    1    1  42.30

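What I want to do is delete all records of any couple that has at least one amount that is NA or negative. In the example above, all rows with coupleid 1 should be removed, because that couple has a row with a negative value and a row with NA. I tried functions such as na.omit(mergedmalefemale), but that only removed the rows containing NA, not the other rows with the same coupleid. I'm a beginner, so I'd be glad if someone could help.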

Answer 1

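Since you don't want to drop only the NA or negative rows but all data sharing the same ID, you first have to find the IDs to remove and then remove them.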

mergedmalefemale <- read.table(text="
    coupleid gender shop time amount
    1        1      W    3    1  29.05
    2        1      W    1    2  31.65
    3        1      W    3    3     NA
    4        1      W    2    4  17.75
    5        1      W    3    5 -28.40
    6        2      W    1    1  42.30", 
    header=TRUE)

# Find NA and negative amounts
del <- is.na(mergedmalefemale[,"amount"]) | mergedmalefemale[,"amount"]<0
# Find coupleid with NA or negative amounts
ids <- unique(mergedmalefemale[del,"coupleid"])
# Remove data with coupleid such that amount is NA or negative
mergedmalefemale[!mergedmalefemale[,"coupleid"] %in% ids,]

Answer 2

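Here is another option. Suppose your data.frame is called df. Note that, as the output shows, this drops only the rows that are NA or negative, not the other rows of the same couple.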

> na.omit(df[ rowSums(df[, sapply(df, is.numeric)]< 0, na.rm=TRUE)  ==0, ])
  coupleid gender shop time amount
1        1      W    3    1  29.05
2        1      W    1    2  31.65
4        1      W    2    4  17.75
6        2      W    1    1  42.30
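
If whole couples should go, as the question asks, the same row-wise test can be lifted to the group level. A minimal sketch using base R's ave(); the names bad_row, bad_couple, and clean are made up for illustration, and the data is the question's sample:

```r
# Rebuild the sample data; the name df follows this answer's convention.
df <- read.table(text = "
coupleid gender shop time amount
1 W 3 1 29.05
1 W 1 2 31.65
1 W 3 3 NA
1 W 2 4 17.75
1 W 3 5 -28.40
2 W 1 1 42.30", header = TRUE)

# TRUE for every row whose amount is NA or negative
bad_row <- is.na(df$amount) | df$amount < 0

# ave() applies FUN within each coupleid and returns a vector aligned
# with the rows, so one bad row flags every row of that couple
bad_couple <- ave(bad_row, df$coupleid, FUN = any)

# keep only the rows of couples without any bad amount
clean <- df[!bad_couple, ]
clean
```

On the sample data this should leave only the row for coupleid 2.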

Answer 3

Another good opportunity to use data.table:

require(data.table)
mergedmalefemale <- as.data.table(mergedmalefemale)
mergedmalefemale[, if(!any(is.na(amount) | amount < 0)) .SD, by=coupleid]

#   coupleid gender shop time amount
#1:        2      W    1    1   42.3

Answer 4

Here is a rather dirty way:

# flag each coupleid: 1 = keep, 0 = remove (some amount is NA or negative)
# na.action=na.pass keeps the NA rows, which the formula interface drops by default
agg <- aggregate(amount ~ coupleid, data=mergedmalefemale, FUN=function(x) as.integer(all(!is.na(x) & x >= 0)), na.action=na.pass)

# merge the flag back in: the original amount becomes "amount.x", the flag "amount.y"
df.1 <- merge(mergedmalefemale, agg, by="coupleid")

# delete the rows
df.1 <- df.1[df.1$amount.y == 1, ]

How to delete records in a data frame
