Remove duplicate records in a data frame

Time: 2016-07-26 15:12:41

Tags: r

I have created a table in R with the following values:
Row 1 as ("Cat","Cat","Cow",NA)
Row 2 as ("Cat","Cow","Cat",NA)
Row 3 as ("Cat","Cat",NA,NA)

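However, I need the final output to have all duplicate values within each row removed, along with the NA values. The output should look like this: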

Row 1 as ("Cat","Cow")
Row 2 as ("Cat","Cow")
Row 3 as ("Cat"," ")

1 Answer:

Answer 0: (score: 3)

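We can use apply to loop over the rows (MARGIN = 1) and drop the duplicates (!duplicated(x)) and the NAs (!is.na(x)). The output will be a list if the rows no longer have the same length after the removal. To convert it back to a matrix, we can use stri_list2matrix (from stringi), which pads the end of the shorter rows with blank values.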

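# Per row, keep the values that are neither NA nor repeats of an earlier value
lst <- apply(df1, 1, FUN = function(x) x[!is.na(x) & !duplicated(x)])
library(stringi)
# Pad the shorter rows with '' and bind them row-wise into a character matrix
stri_list2matrix(lst, fill='', byrow=TRUE)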
#     [,1]  [,2] 
#[1,] "Cat" "Cow"
#[2,] "Cat" "Cow"
#[3,] "Cat" ""
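
The answer assumes that df1 already exists. For anyone who wants to reproduce it, here is a minimal base-R sketch of the same idea. The construction of df1 and its column names (V1 to V4) are assumptions reconstructed from the rows in the question, and the padding step is a base-R stand-in for stri_list2matrix, not the answer's own method. It loops with lapply rather than apply, because apply simplifies its result to a matrix whenever every row happens to keep the same number of values.

```r
# Assumed reconstruction of the question's table; column names are hypothetical.
df1 <- data.frame(
  V1 = c("Cat", "Cat", "Cat"),
  V2 = c("Cat", "Cow", "Cat"),
  V3 = c("Cow", "Cat", NA),
  V4 = rep(NA_character_, 3),
  stringsAsFactors = FALSE
)

m <- as.matrix(df1)

# For each row, keep values that are not NA and not repeats of an earlier value.
# lapply always returns a list, even when all rows end up the same length.
lst <- lapply(seq_len(nrow(m)), function(i) {
  x <- m[i, ]
  unname(x[!is.na(x) & !duplicated(x)])
})

# Pad every row with "" up to the longest row, then stack the rows into a matrix.
width <- max(lengths(lst))
out <- do.call(rbind, lapply(lst, function(x) c(x, rep("", width - length(x)))))

print(out)
```

With the assumed data this should give the same 3 x 2 character matrix as the stringi version. The sketch does not handle the case where every row is entirely NA.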