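I have a fairly straightforward question, but I am new to R and struggling a bit. Basically, I need to remove duplicate rows and then update the remaining unique rows based on how many duplicates were removed.
In the original file I have directors and the company boards they sit on, with a director appearing on a new row for each company. I want each director to appear only once, with one column giving their number of board seats (so 1 + the number of duplicates removed) and another column listing the names of the companies they sit on.
So I want to go from this:
to this:
It would also be great if I could get code that lists a director's "home company", meaning the company where she/he is an executive rather than an outsider.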
Many thanks!
Answer 0 (score: 0)
You can use the ddply function from the plyr package.
# First, enter part of your original data frame
Name <- c('Abbot, F', 'Abdool-Samad, T', 'Abedian, I', 'Abrahams, F', 'Abrahams, F', 'Abrahams, F')
Position <- c('Executive Director', 'Outsider', 'Outsider', 'Executive Director','Outsider', 'Outsider')
Companies <- c('ARM', 'R', 'FREIT', 'FG', 'CG', 'LG')
NoBoards <- c(1,1,1,1,1,1)
df <- data.frame(Name, Position, Companies, NoBoards)
# Then you could concatenate the Positions and Companies for each Name
library(plyr)
sumPosition <- ddply(df, .(Name), summarize, Position = paste(Position, collapse=", "))
sumCompanies <- ddply(df, .(Name), summarize, Companies = paste(Companies, collapse=", "))
# Merge the results into one data frame, using Name as the join key
df2 <- merge(sumPosition, sumCompanies, by = 'Name')
# Sum NoBoards for each Name to get the number of board seats
names_NoBoards <- aggregate(df$NoBoards, by = list(df$Name), sum)
names(names_NoBoards) <- c('Name', 'NoBoards')
# Merge the result with df2
df3 <- merge(df2, names_NoBoards, by = 'Name')
You get something like this:
  Name            Position                               Companies  NoBoards
1 Abbot, F        Executive Director                     ARM        1
2 Abdool-Samad, T Outsider                               R          1
3 Abedian, I      Outsider                               FREIT      1
4 Abrahams, F     Executive Director, Outsider, Outsider FG, CG, LG 3
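If you would rather not depend on plyr, roughly the same table can be built with base R alone. The following is a minimal sketch, assuming the same sample data as above; the names collapse_text and summary_df are made up for illustration. It counts the rows per Name directly instead of summing a NoBoards column of ones.

```r
# Same sample data as above; stringsAsFactors = FALSE keeps text columns as character
df <- data.frame(
  Name = c('Abbot, F', 'Abdool-Samad, T', 'Abedian, I',
           'Abrahams, F', 'Abrahams, F', 'Abrahams, F'),
  Position = c('Executive Director', 'Outsider', 'Outsider',
               'Executive Director', 'Outsider', 'Outsider'),
  Companies = c('ARM', 'R', 'FREIT', 'FG', 'CG', 'LG'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join all values of a group into one comma-separated string
collapse_text <- function(x) paste(x, collapse = ", ")

# One row per director, with Position and Companies concatenated
summary_df <- aggregate(df[c('Position', 'Companies')],
                        by = list(Name = df$Name),
                        FUN = collapse_text)

# Board seats = number of original rows per director (1 + duplicates removed)
summary_df$NoBoards <- as.vector(table(df$Name)[summary_df$Name])

print(summary_df)
```

Looking the counts up by name, rather than relying on row order, keeps NoBoards aligned with the right director even if the grouping order changes.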
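To list each director's "home company", that is, the company where he/she is an executive rather than an outsider, you can use the following code: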
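# Keep only the executive rows; refer to the column inside df rather than the global Position vector
ExecutiveDirector <- df[df$Position == 'Executive Director', c('Name', 'Companies')]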
df4 <- merge(df3, ExecutiveDirector, by = 'Name', all.x = TRUE)
You will get the following data frame:
  Name            Position                               Companies.x NoBoards Companies.y
1 Abbot, F        Executive Director                     ARM         1        ARM
2 Abdool-Samad, T Outsider                               R           1        <NA>
3 Abedian, I      Outsider                               FREIT       1        <NA>
4 Abrahams, F     Executive Director, Outsider, Outsider FG, CG, LG  3        FG
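The Companies.x and Companies.y names come from merge disambiguating two columns that are both called Companies. A minimal sketch of one way to avoid that, assuming the same sample data, is to rename the executive company column before merging; HomeCompany, seats, home and result are illustrative names, not part of the original answer.

```r
# Same sample data as above
df <- data.frame(
  Name = c('Abbot, F', 'Abdool-Samad, T', 'Abedian, I',
           'Abrahams, F', 'Abrahams, F', 'Abrahams, F'),
  Position = c('Executive Director', 'Outsider', 'Outsider',
               'Executive Director', 'Outsider', 'Outsider'),
  Companies = c('ARM', 'R', 'FREIT', 'FG', 'CG', 'LG'),
  stringsAsFactors = FALSE
)

# Board seats per director, as a two-column data frame (Name, NoBoards)
seats <- as.data.frame(table(Name = df$Name),
                       responseName = 'NoBoards',
                       stringsAsFactors = FALSE)

# Executive rows only, with the company column renamed before the merge
home <- df[df$Position == 'Executive Director', c('Name', 'Companies')]
names(home) <- c('Name', 'HomeCompany')

# Left join: directors without an executive seat get NA in HomeCompany
result <- merge(seats, home, by = 'Name', all.x = TRUE)

print(result)
```

Note that merge returns one row per match, so a director who is an executive at more than one company would appear on several rows again; the sample data has no such case.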