Removing duplicates in R and updating the remaining rows

Time: 2018-05-17 15:35:08

Tags: r duplicates

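I have a fairly straightforward question, but I am very new to R and am struggling a bit. Basically, I need to remove duplicate rows and then change the remaining unique rows based on the number of duplicates that were removed.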

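In the original file I have directors and the company boards they sit on, with a director appearing in a new row for each company. I want each director to appear only once, but with a column giving their number of board seats (so 1 + the number of removed duplicates) and a column listing the names of the companies they sit on.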

So I want to go from this:

(image not available)

to this:

(image not available)

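It would also be great if I could get code that lists each director's "home company", i.e. the company where she/he is an executive rather than an outsider.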

Many thanks!

1 answer:

Answer 0 (score: 0)

You can use the ddply function from the plyr package.
# First, I will enter part of your original data frame

Name <- c('Abbot, F', 'Abdool-Samad, T', 'Abedian, I', 'Abrahams, F', 'Abrahams, F', 'Abrahams, F')
Position <- c('Executive Director', 'Outsider', 'Outsider', 'Executive Director','Outsider',  'Outsider')
Companies <- c('ARM', 'R', 'FREIT', 'FG', 'CG', 'LG')
NoBoards <- c(1,1,1,1,1,1)

df <- data.frame(Name, Position, Companies, NoBoards)

# Then you could concatenate the Positions and Companies for each Name
library(plyr)

sumPosition <- ddply(df, .(Name), summarize, Position = paste(Position, collapse=", "))
sumCompanies <- ddply(df, .(Name), summarize, Companies = paste(Companies, collapse=", "))

# Merge the results into one data frame, using Name to join them
df2 <- merge(sumPosition, sumCompanies, by = 'Name')

# Sum NoBoards for each Name to get the number of boards per director
names_NoBoards <-  aggregate(df$NoBoards, by = list(df$Name), sum)
names(names_NoBoards) <- c('Name', 'NoBoards')

# Merge the result with df2
df3 <- merge(df2, names_NoBoards, by = 'Name')

You get something like this:

             Name                               Position  Companies NoBoards
1        Abbot, F                     Executive Director        ARM        1
2 Abdool-Samad, T                               Outsider          R        1
3      Abedian, I                               Outsider      FREIT        1
4     Abrahams, F Executive Director, Outsider, Outsider FG, CG, LG        3
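
The same summary can also be built without plyr. The following is only a minimal sketch in base R, not the answerer's method: it uses tapply to collapse the text columns and counts the rows per director directly, so the helper column of ones is not needed. The name summary_df is made up for this illustration.

```r
# Sample rows from the answer (stringsAsFactors = FALSE keeps text as character)
df <- data.frame(
  Name      = c("Abbot, F", "Abdool-Samad, T", "Abedian, I",
                "Abrahams, F", "Abrahams, F", "Abrahams, F"),
  Position  = c("Executive Director", "Outsider", "Outsider",
                "Executive Director", "Outsider", "Outsider"),
  Companies = c("ARM", "R", "FREIT", "FG", "CG", "LG"),
  stringsAsFactors = FALSE
)

# Collapse positions and companies per director; all three calls group by
# df$Name, so their results come back in the same order
positions <- tapply(df$Position,  df$Name, paste, collapse = ", ")
companies <- tapply(df$Companies, df$Name, paste, collapse = ", ")
n_boards  <- tapply(df$Name,      df$Name, length)  # 1 + removed duplicates

# summary_df is a hypothetical name used only in this sketch
summary_df <- data.frame(
  Name      = names(companies),
  Position  = as.vector(positions),
  Companies = as.vector(companies),
  NoBoards  = as.vector(n_boards),
  stringsAsFactors = FALSE
)
print(summary_df)
```

Counting rows instead of summing a column of ones means the input does not need a NoBoards column at all, which is closer to the "1 + number of removed duplicates" wording in the question.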

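To get a list with each director's "home company", i.e. the company where he/she is an executive rather than an outsider, you can use the following code: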

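# Filter on df$Position so the result does not depend on the standalone Position vector
ExecutiveDirector <- df[df$Position == 'Executive Director', c('Name', 'Companies')]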

df4 <- merge(df3, ExecutiveDirector, by = 'Name', all.x = TRUE)

You will get the following data frame:

             Name                               Position Companies.x NoBoards Companies.y
1        Abbot, F                     Executive Director         ARM        1         ARM
2 Abdool-Samad, T                               Outsider           R        1        <NA>
3      Abedian, I                               Outsider       FREIT        1        <NA>
4     Abrahams, F Executive Director, Outsider, Outsider  FG, CG, LG        3          FG
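
The Companies.x and Companies.y names come from merge, which adds suffixes when both inputs have a non-key column with the same name. One possible way to get a more readable result is to rename the column before merging. This is a sketch only: HomeCompany and home are made-up names for this illustration, and df3 is typed in by hand from the output shown earlier so that the snippet runs on its own.

```r
# Sample rows from the answer
df <- data.frame(
  Name      = c("Abbot, F", "Abdool-Samad, T", "Abedian, I",
                "Abrahams, F", "Abrahams, F", "Abrahams, F"),
  Position  = c("Executive Director", "Outsider", "Outsider",
                "Executive Director", "Outsider", "Outsider"),
  Companies = c("ARM", "R", "FREIT", "FG", "CG", "LG"),
  stringsAsFactors = FALSE
)

# Stand-in for df3, copied by hand from the summary output in the answer
df3 <- data.frame(
  Name      = c("Abbot, F", "Abdool-Samad, T", "Abedian, I", "Abrahams, F"),
  Position  = c("Executive Director", "Outsider", "Outsider",
                "Executive Director, Outsider, Outsider"),
  Companies = c("ARM", "R", "FREIT", "FG, CG, LG"),
  NoBoards  = c(1, 1, 1, 3),
  stringsAsFactors = FALSE
)

# Keep only executive rows, then rename the company column before merging
home <- df[df$Position == "Executive Director", c("Name", "Companies")]
names(home)[names(home) == "Companies"] <- "HomeCompany"  # hypothetical name

# all.x = TRUE keeps directors with no executive role (HomeCompany is NA)
df4 <- merge(df3, home, by = "Name", all.x = TRUE)
print(df4)
```

Note that a director who is an executive at more than one company would appear once per such company after this merge, so the executive rows would need to be collapsed first if one row per director is required.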