Question

I need to create a for loop in R that checks whether the Weight values are all equal for a given customer ID.

For example:

Cust#   Weight
1111    100
1111    100
1111    100
1112    50
1112    75
1112    65
1113    80
1113    80
1113    80

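In this example, I want to return the records for 1111 and 1113, because the weight stays the same across all of that customer's records. I don't want the records for 1112, because the weight changes across its three records.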

I know this shouldn't be too hard, but I have almost no experience with for loops. Any help would be greatly appreciated.
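Since the question asks specifically for a for loop, here is a minimal sketch of that approach (it is not taken from any of the answers below). It assumes the data frame is called `df1` with columns `Cust` and `Weight`, as in Answer 1, and that `Weight` has no missing values. The loop visits each unique customer ID and flags it if every weight equals that customer's first weight.

```r
# Example data from the question
df1 <- data.frame(Cust   = rep(c(1111L, 1112L, 1113L), each = 3),
                  Weight = c(100L, 100L, 100L, 50L, 75L, 65L, 80L, 80L, 80L))

ids <- unique(df1$Cust)
constant <- logical(length(ids))   # one TRUE/FALSE flag per customer

for (i in seq_along(ids)) {
  w <- df1$Weight[df1$Cust == ids[i]]   # all weights for this customer
  constant[i] <- all(w == w[1])         # TRUE if none differ from the first
}

# Keep only the rows of customers whose weight never changes
result <- df1[df1$Cust %in% ids[constant], ]
result
```

The vectorized answers below are shorter and usually faster, but the loop makes each step explicit.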

Answer 1

Here is one possibility in base R:

df1[df1$Cust %in% df1$Cust[duplicated(df1)],]
#  Cust Weight
#1 1111    100
#2 1111    100
#3 1111    100
#7 1113     80
#8 1113     80
#9 1113     80

The complementary rows of the data.frame can be obtained by adding the negation operator `!`:

df1[!df1$Cust %in% df1$Cust[duplicated(df1)],]
#  Cust Weight
#4 1112     50
#5 1112     75
#6 1112     65

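A more general version, which gives the same result in this example but does not keep a customer whose weights are only partly repeated (e.g. 50, 50, 75), could be: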

# variance of Weight within each customer; zero means the weight never changes
var.rows <- aggregate(Weight ~ Cust, df1, var)
df1[df1$Cust %in% var.rows$Cust[!var.rows$Weight],]

Data used in this example:

df1 <- structure(list(Cust = c(1111L, 1111L, 1111L, 1112L, 1112L, 1112L, 
                1113L, 1113L, 1113L), Weight = c(100L, 100L, 100L, 50L, 75L, 
                65L, 80L, 80L, 80L)), .Names = c("Cust", "Weight"), 
                class = "data.frame", row.names = c(NA, -9L))
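To see why the variance-based version is more general than the `duplicated()` one, here is a small check on hypothetical data (the customer `2222` and the name `df2` are made up for illustration) in which one weight repeats but the weights are not constant:

```r
# Hypothetical data: customer 2222 has a repeated weight but is not constant
df2 <- data.frame(Cust   = c(1111L, 1111L, 2222L, 2222L, 2222L),
                  Weight = c(100L, 100L, 50L, 50L, 75L))

# duplicated()-based filter: keeps any customer with at least one repeated row
dup_ids <- unique(df2$Cust[duplicated(df2)])

# variance-based filter: keeps only customers whose weights have zero variance
var.rows <- aggregate(Weight ~ Cust, df2, var)
var_ids <- var.rows$Cust[var.rows$Weight == 0]

dup_ids  # 1111 2222
var_ids  # 1111
```

Note also that `var()` returns `NA` for a customer with a single record, so the variance filter drops such customers even though their weight is trivially constant.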

Answer 2

We can use `uniqueN` from data.table:

library(data.table)
setDT(df1)[, if(uniqueN(Weight)==1) .SD , Cust]
#   Cust Weight
#1: 1111    100
#2: 1111    100
#3: 1111    100
#4: 1113     80
#5: 1113     80
#6: 1113     80

An option using base R:

i1 <- rowSums(table(df1)!=0)==1
subset(df1, Cust %in%  names(i1)[i1])
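To see what the base R option computes, it helps to print the intermediate count. This sketch re-creates the question's data and names the intermediate vector `n_weights` (a name chosen here for illustration): `table(df1)` cross-tabulates customers against weights, and counting the non-zero cells in each row gives the number of distinct weights per customer.

```r
# Example data from the question
df1 <- data.frame(Cust   = rep(c(1111L, 1112L, 1113L), each = 3),
                  Weight = c(100L, 100L, 100L, 50L, 75L, 65L, 80L, 80L, 80L))

tab <- table(df1)                 # rows = customers, columns = weights
n_weights <- rowSums(tab != 0)    # number of distinct weights per customer
n_weights
# 1111 1112 1113
#    1    3    1

i1 <- n_weights == 1
subset(df1, Cust %in% names(i1)[i1])
```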

Answer 3

There are many ways to do this; here is a data.table solution:

library(data.table)

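df <- data.table(cust = rep(1111:1113, each = 3),
                 weight = c(rep(100, 3), 50, 75, 65, rep(80, 3)))

# rows per (cust, weight) pair vs. rows per customer: equal only if the weight never changes
df[, count := .N, by = .(cust, weight)][, total := .N, by = cust][count == total, .(cust, weight)]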

Edit: and a dplyr one (this returns one row per qualifying customer):

library(dplyr)
df %>% group_by(cust) %>% filter(n_distinct(weight)==1) %>% distinct(cust, weight)

Answer 4

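You can use `aggregate` to get the number of unique weights for each customer, and use that to find the entries (here `x` is the data frame from the question):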

a <- aggregate(Weight ~ Cust, data=x, FUN=function(y) length(unique(y)))
a$Cust[a$Weight==1]
## [1] "1111" "1113"
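The `aggregate` step only returns the qualifying IDs; to get the records themselves, the IDs can be fed back into a `%in%` subset. A minimal sketch, assuming the question's data is stored as `df1` with an integer `Cust` column (so the IDs print without quotes), and using the illustrative name `keep`:

```r
# Example data from the question
df1 <- data.frame(Cust   = rep(c(1111L, 1112L, 1113L), each = 3),
                  Weight = c(100L, 100L, 100L, 50L, 75L, 65L, 80L, 80L, 80L))

# Number of distinct weights per customer
a <- aggregate(Weight ~ Cust, data = df1, FUN = function(y) length(unique(y)))
keep <- a$Cust[a$Weight == 1]   # customers with a single distinct weight

# Use the qualifying IDs to pull the matching records
df1[df1$Cust %in% keep, ]
```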
