Question

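I have data consisting of IDs and the number of times each one appears. I want to write a function that returns the IDs that appear two or more times.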

Here is my code:

if (data$Freq>=2){
  return(data$ID)
} else {
  print("no duplicates of years")
}

I get the following answer and warning:

[1] "no duplicates of years"
Warning message:
In if (x$Freq > 1) { :
  the condition has length > 1 and only the first element will be used

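What might I be doing wrong?

Edit

Thanks for the replies, everyone. I think there is something wrong with the way I created the frequency table: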

x=as.data.frame(table(data$cid))

where cid is the ID. When I try to look at the elements in the first column, i.e.

> x$var1[1:20,]

I get NULL, whereas this

>x$Freq[1:20,]

returns

Error in x$Freq[1:20, ] : incorrect number of dimensions'

But x[1:20,] returns a data frame showing the elements of x.

Answer 1

There is a function, duplicated(), that can do this without referring to the $Freq column:

data$ID[duplicated(data$ID)]
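One thing worth knowing about duplicated() is that it flags the second and later occurrences of a value, so an ID that appears three times is returned twice. A minimal sketch, assuming a hypothetical raw ID vector `ids` (not the asker's data), wraps the result in unique() to list each repeated ID once:

```r
# Hypothetical raw ID vector, made up for illustration
ids <- c(101, 102, 102, 103, 103, 103, 104)

# duplicated() marks every occurrence after the first one
dup_flags <- duplicated(ids)
print(dup_flags)

# IDs that occur at least twice, each listed only once
repeated_ids <- unique(ids[dup_flags])
print(repeated_ids)
```

Note that this works on the raw IDs, not on a frequency table, where every ID already appears exactly once.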

Your own code doesn't work because if() needs a single TRUE/FALSE condition; it only looks at the first element of data$Freq and then stops.

If you want to do something like this, then which() or plain logical indexing is what you want:

df= data.frame(freq=rep(1:2,5), id=1:10)

 df

   freq id
1     1  1
2     2  2
3     1  3
4     2  4
5     1  5
6     2  6
7     1  7
8     2  8
9     1  9
10    2 10

df$id[which(df$freq>1)]
[1]  2  4  6  8 10

or even

df$id[df$freq>1]
[1]  2  4  6  8 10
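Since the question asks for a function, the indexing above could be wrapped roughly as follows. This is only a sketch: the name `repeated_ids` and the `min_count` argument are hypothetical, and it assumes a data frame with `freq` and `id` columns like the `df` in this answer. any() collapses the vectorised comparison into the single TRUE/FALSE that if() expects.

```r
# Hypothetical helper; name and arguments are illustrative only
repeated_ids <- function(freq_table, min_count = 2) {
  # Vectorised comparison: one TRUE/FALSE per row
  keep <- freq_table$freq >= min_count
  # any() reduces the vector to a single logical value for if()
  if (!any(keep)) {
    message("no duplicates")
    return(freq_table$id[0])   # empty vector of the same type as id
  }
  freq_table$id[keep]
}

df <- data.frame(freq = rep(1:2, 5), id = 1:10)
result <- repeated_ids(df)
print(result)
```

Returning an empty vector in the "no duplicates" case, instead of printing a string, keeps the return type the same in both branches.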

Answer 2

As @Andrie suggested, ifelse may be useful:

Based on your additional information, here is a reproducible example:

set.seed(1)

data <- as.data.frame(table(data.frame(cid = sample(100:120, 30, replace=TRUE))))

> ifelse(data$Freq-1, as.character(data$Var1), "no duplicates of years")
#  [1] "no duplicates of years" "no duplicates of years" "no duplicates of years"
#  [4] "no duplicates of years" "104"                    "105"                   
#  [7] "107"                    "108"                    "no duplicates of years"
# [10] "no duplicates of years" "113"                    "no duplicates of years"
# [13] "no duplicates of years" "116"                    "118"                   
# [16] "119"                    "no duplicates of years"

To show only the IDs with Freq > 1:

data$Var1[as.logical(data$Freq - 1)]
# [1] 104 105 107 108 113 116 118 119
# 17 Levels: 100 101 102 103 104 105 107 108 110 112 113 114 115 116 118 ... 120
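The NULL and the "incorrect number of dimensions" error mentioned in the question's edit probably come from two details: column names are case-sensitive (Var1, not var1), and a single column such as x$Freq is a vector, so it takes one subscript with no comma. A small sketch, assuming a hypothetical data frame `raw` standing in for the asker's `data`:

```r
# Hypothetical raw data standing in for the question's `data`
raw <- data.frame(cid = c(100, 101, 101, 102, 102, 102))

x <- as.data.frame(table(raw$cid))
print(names(x))            # column names of the frequency table

# Column names are case-sensitive, so a misspelled name gives NULL
print(is.null(x$var1))

# A single column is a vector: use one subscript, without a comma
first_freqs <- x$Freq[1:2]
print(first_freqs)

# IDs seen at least twice, converted from factor to character
repeated <- as.character(x$Var1[x$Freq >= 2])
print(repeated)
```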
