My data is:
Name House Street Apt City Postal Phone
Bob Joe 954 BLUE DRIVE NA A PLACE Z5K4N2 999-495-6544
Smith Jane 555 BLUE DRIVE NA A PLACE Z5K4N5 999-435-6172
Smith Jane 555 BLUE DRIVE NA A PLACE Z5K4N5 999-450-6763
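I am trying to compare the Names dynamically (the data is sorted by House). If two names are equal AND the house numbers are equal, I want to concatenate the two corresponding phone numbers and delete the row that was not concatenated.
So it would look like this: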
Name House Street Apt City Postal Phone
Bob Joe 954 BLUE DRIVE NA A PLACE Z5K4N2 999-495-6544
Smith Jane 555 BLUE DRIVE NA A PLACE Z5K4N5 999-435-6172 OR 999-450-6763
My attempt:
for(x in 1:nrow(data)) {
if(data$Name[x] == data$Name[x+1]) {
data$NameDupes <- data$Name[x] }
}
Then using aggregate:
aggregate(Phone ~ Name + Street + City + Postal + Apt + House, data = df, paste, collapse = " OR ")
Then joining the result back onto my original df.
Open to ideas.
Thanks
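The aggregate-then-join idea in the attempt above can be sketched in base R as follows. One caveat worth stating: aggregate() omits rows whose grouping values are NA, and Apt is NA in every sample row, so this sketch leaves Apt out of the grouping keys and restores it afterwards. The names keys, phones, first_rows and result are illustrative choices, not part of the original post.

```r
# Sample data copied from the question; Apt is NA in every row.
df <- read.table(text = "Name House Street Apt City Postal Phone
'Bob Joe' 954 'BLUE DRIVE' NA 'A PLACE' Z5K4N2 '999-495-6544'
'Smith Jane' 555 'BLUE DRIVE' NA 'A PLACE' Z5K4N5 '999-435-6172'
'Smith Jane' 555 'BLUE DRIVE' NA 'A PLACE' Z5K4N5 '999-450-6763'",
header = TRUE, stringsAsFactors = FALSE)

# Group only on columns without missing values, because aggregate()
# drops rows that have NA in any grouping variable.
keys <- c("Name", "House", "Street", "City", "Postal")
phones <- aggregate(df["Phone"], by = df[keys], FUN = paste, collapse = " OR ")

# Take Apt from the first row of each group, join the collapsed phone
# numbers back on, and restore the original column order.
first_rows <- df[!duplicated(df[keys]), c(keys, "Apt")]
result <- merge(first_rows, phones, by = keys)[, names(df)]
result
```

If Apt can differ between rows that share the same name and address, it would need to be turned into a non-missing placeholder and added to keys instead.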
Answer 0: (score: 2)
A solution using dplyr.
library(dplyr)
dt2 <- dt %>%
group_by(House, Street, Apt, City, Postal) %>%
summarise(Name = first(Name), Phone = paste(Phone, collapse = " OR ")) %>%
ungroup() %>%
arrange(desc(House)) %>%
select(colnames(dt))
dt2
# A tibble: 2 x 7
Name House Street Apt City Postal Phone
<chr> <int> <chr> <lgl> <chr> <chr> <chr>
1 Bob Joe 954 BLUE DRIVE NA A PLACE Z5K4N2 999-495-6544
2 Smith Jane 555 BLUE DRIVE NA A PLACE Z5K4N5 999-435-6172 OR 999-450-6763
Data
dt <- read.table(text = "Name House Street Apt City Postal Phone
'Bob Joe' 954 'BLUE DRIVE' NA 'A PLACE' Z5K4N2 '999-495-6544'
'Smith Jane' 555 'BLUE DRIVE' NA 'A PLACE' Z5K4N5 '999-435-6172'
'Smith Jane' 555 'BLUE DRIVE' NA 'A PLACE' Z5K4N5 '999-450-6763'",
header = TRUE, stringsAsFactors = FALSE)
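The pipeline above groups on the address columns only and keeps the first Name in each group, so two different people at the same address would be merged into one row. A minimal variant, assuming dplyr 1.0.0 or later (for .groups and all_of), adds Name to the grouping so that rows are combined only when both the name and the address match, as the question asks. The name dt_by_name is an illustrative choice.

```r
library(dplyr)

# Same sample data as in the answer above.
dt <- read.table(text = "Name House Street Apt City Postal Phone
'Bob Joe' 954 'BLUE DRIVE' NA 'A PLACE' Z5K4N2 '999-495-6544'
'Smith Jane' 555 'BLUE DRIVE' NA 'A PLACE' Z5K4N5 '999-435-6172'
'Smith Jane' 555 'BLUE DRIVE' NA 'A PLACE' Z5K4N5 '999-450-6763'",
header = TRUE, stringsAsFactors = FALSE)

dt_by_name <- dt %>%
  # Name is part of the key, so only identical name + address rows collapse.
  group_by(Name, House, Street, Apt, City, Postal) %>%
  # .groups = "drop" returns an ungrouped tibble, replacing ungroup().
  summarise(Phone = paste(Phone, collapse = " OR "), .groups = "drop") %>%
  arrange(desc(House)) %>%
  # Restore the original column order.
  select(all_of(colnames(dt)))

dt_by_name
```

On the sample data this gives the same two rows as the original pipeline, because each address belongs to a single name.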
Answer 1: (score: 0)
A different answer from @ycw's ... using data.table (because I am personally a fan of that package).
Using the data
dt <- read.table(text = "Name House Street Apt City Postal Phone
'Bob Joe' 954 'BLUE DRIVE' NA 'A PLACE' Z5K4N2 '999-495-6544'
'Smith Jane' 555 'BLUE DRIVE' NA 'A PLACE' Z5K4N5 '999-435-6172'
'Smith Jane' 555 'BLUE DRIVE' NA 'A PLACE' Z5K4N5 '999-450-6763'",
header = TRUE, stringsAsFactors = FALSE)
we can do it with a nice one-liner:
library(data.table)
dt = as.data.table(dt)
dt[,.(Phone = paste(Phone,collapse = " OR ")),by = .(Name,House,Street,Apt,City,Postal)]
Output
Name House Street Apt City Postal Phone
1: Bob Joe 954 BLUE DRIVE NA A PLACE Z5K4N2 999-495-6544
2: Smith Jane 555 BLUE DRIVE NA A PLACE Z5K4N5 999-435-6172 OR 999-450-6763