How do I remove duplicate column names in R?

Asked: 2014-06-10 13:57:36

Tags: r

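I have a very large matrix, and I know that some of its colnames are duplicated. I want to find those duplicated colnames and drop one column from each set of duplicates. I tried duplicated(), but it removes duplicated entries. Can someone help me implement this in R? The point is that columns with the same name may not have the same contents.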

5 answers:

Answer 0 (score: 37)

Suppose temp is your matrix:

temp <- matrix(seq_len(15), 5, 3)
colnames(temp) <- c("A", "A", "B")

##      A  A  B
## [1,] 1  6 11
## [2,] 2  7 12
## [3,] 3  8 13
## [4,] 4  9 14
## [5,] 5 10 15

You can then keep only the first column for each name:

temp <- temp[, !duplicated(colnames(temp))]

##      A  B
## [1,] 1 11
## [2,] 2 12
## [3,] 3 13
## [4,] 4 14
## [5,] 5 15

Or, if you want to keep the last of the duplicated columns instead, apply this to the original temp:

temp <- temp[, !duplicated(colnames(temp), fromLast = TRUE)] 

##       A  B
## [1,]  6 11
## [2,]  7 12
## [3,]  8 13
## [4,]  9 14
## [5,] 10 15
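
As a supplement to the answer above, here is a minimal sketch of two things that may be useful in practice: listing which names repeat before dropping anything, and passing drop = FALSE so the result stays a matrix even when only one column survives. The variable names dup_names, dedup and same are illustrative, not from the original answer.

```r
# Illustrative matrix with a repeated column name
temp <- matrix(seq_len(15), 5, 3)
colnames(temp) <- c("A", "A", "B")

# Names that occur more than once
dup_names <- unique(colnames(temp)[duplicated(colnames(temp))])

# Keep the first column for each name
dedup <- temp[, !duplicated(colnames(temp)), drop = FALSE]

# If every column shares one name, only one column is kept.
# Without drop = FALSE the result collapses to a plain vector.
same <- temp
colnames(same) <- c("A", "A", "A")
collapsed <- same[, !duplicated(colnames(same))]
kept      <- same[, !duplicated(colnames(same)), drop = FALSE]
```

Here collapsed is a vector of length 5, while kept is still a 5 x 1 matrix with its column name intact.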

Answer 1 (score: 12)

Or, assuming data.frames, you can use subset:

subset(iris, select = which(!duplicated(names(iris))))

Note that dplyr::select is not applicable here, because it requires the column names in the input data to be unique already.
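
A minimal sketch of the subset() approach on a data frame that actually contains a repeated name (iris itself has none). The data frame dat is hypothetical. It is built with check.names = FALSE, which is what allows data.frame() to keep the second "A" rather than renaming it.

```r
# Hypothetical data frame with a repeated column name;
# check.names = FALSE stops data.frame() from renaming the second "A"
dat <- data.frame(A = 1:3, A = 4:6, B = 7:9, check.names = FALSE)

# Select by position: keep the first column for each name
dedup <- subset(dat, select = which(!duplicated(names(dat))))

names(dedup)
```

Selecting by position rather than by name matters here, since a name such as "A" is ambiguous while duplicates are present.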

Answer 2 (score: 1)

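Store the positions of all duplicated columns in a vector, here called duplicates, then remove those columns with a negative index (-duplicates) in single-bracket subsetting.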

# Define vector of duplicate cols (don't change)
duplicates <- c(4, 6, 11, 13, 15, 17, 18, 20, 22,
                24, 25, 28, 32, 34, 36, 38, 40,
                44, 46, 48, 51, 54, 65, 158)

# Remove duplicates from food and assign it to food2
food2 <- food[, -duplicates]
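
The answer does not show the food data set or say how the hard-coded positions were obtained. As a sketch under that caveat, a vector like this could be derived from the column names instead of typed by hand. The small food data frame below is a made-up stand-in, not the real data.

```r
# Stand-in for `food`; the real data set is not shown, so this is hypothetical
food <- data.frame(code = 1:2, name = c("x", "y"),
                   code = 3:4, name = c("p", "q"),
                   check.names = FALSE)

# Positions of every column whose name has already appeared further left
duplicates <- which(duplicated(names(food)))

# Negative indexing drops those positions
food2 <- food[, -duplicates]
```

Deriving the positions this way keeps the code valid if the columns are later reordered, which a hand-typed vector would not survive.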

Answer 3 (score: 0)

To remove the duplicates of one specific column by name, you can do the following:

test = cbind(iris, iris) # example with multiple duplicate columns
idx = which(duplicated(names(test)) & names(test) == "Species")
test = test[,-idx]

Removing all duplicated columns is simpler:

test = cbind(iris, iris) # example with multiple duplicate columns
idx = which(duplicated(names(test)))
test = test[,-idx]

Or, with a logical index:

test = cbind(iris, iris) # example with multiple duplicate columns
test = test[,!duplicated(names(test))]
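
One difference between the index-based and logical forms is worth illustrating. When no names are duplicated, which() returns integer(0), and negating an empty index still selects nothing, so the index form drops every column. The sketch below uses plain iris, which has no repeated names. The names by_index and by_logical are illustrative.

```r
# iris has no repeated column names, so there is nothing to remove
idx <- which(duplicated(names(iris)))   # empty integer vector

# -integer(0) is still integer(0), so this selects zero columns
by_index <- iris[, -idx]

# The logical form keeps every column when nothing is duplicated
by_logical <- iris[, !duplicated(names(iris))]

ncol(by_index)
ncol(by_logical)
```

For that reason the logical form is probably the safer default when the input may or may not contain duplicates.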

Answer 4 (score: -1)

temp = matrix(seq_len(15), 5, 3)
colnames(temp) = c("A", "A", "B")

temp = as.data.frame.matrix(temp)
temp = temp[!duplicated(colnames(temp))]
temp = as.matrix(temp)