Sample data
sessionid qf Office
12 3 LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3
12 4 DEL2,DEL1,LON1,DEL1
13 5 MAn1,LON1,DEL1,LON1
I want to remove the duplicate values in the "Office" column, within each row.
Expected output
sessionid qf Office
12 3 LON1,LON2,SEA2,SEA3
12 4 DEL2,DEL1,LON1
13 5 MAN1,LON1,DEL1
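Answer 0 (score: 2)
We can use tidyverse. Split 'Office' on the delimiter and expand it into 'long' format, then keep the distinct rows, group by 'sessionid' and 'qf', and paste the contents of 'Office' back together with toString (which joins with ", ").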
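library(tidyverse)
df1 %>%
  separate_rows(Office) %>%               # one row per office code
  distinct() %>%                          # drop repeated (sessionid, qf, Office) rows
  group_by(sessionid, qf) %>%
  summarise(Office = toString(Office))    # rejoin the codes with ", "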
# A tibble: 3 x 3
# Groups: sessionid [?]
# sessionid qf Office
# <int> <int> <chr>
#1 12 3 LON1, LON2, SEA2, SEA3
#2 12 4 DEL2, DEL1, LON1
#3 13 5 MAn1, LON1, DEL1
Answer 1 (score: 2)
Here is a base R way that works as you expect: first split Office on commas, remove the duplicates, then paste the pieces back together.
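# as.character() guards against Office being stored as a factor,
# which strsplit() would reject
df$Office <- sapply(
  strsplit(as.character(df$Office), ","),        # split each row on commas
  function(x) paste(unique(x), collapse = ",")   # drop duplicates, rejoin
)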
Or with %>%
library(magrittr)  # provides %>%
df$Office <- df$Office %>%
  strsplit(",") %>%
  lapply(unique) %>%
  sapply(paste, collapse = ",")
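For anyone who wants to run the base R approach end to end, here is a minimal self-contained sketch. It assumes the sample data lives in a data frame named df with Office stored as character. The helper name dedupe_offices is hypothetical and only used for illustration, and vapply is swapped in for sapply so the result is always a character vector.

```r
# Sample data from the question, rebuilt by hand (assumed layout).
df <- data.frame(
  sessionid = c(12L, 12L, 13L),
  qf        = c(3L, 4L, 5L),
  Office    = c("LON1,LON2,LON1,SEA2,SEA3,SEA3,SEA3",
                "DEL2,DEL1,LON1,DEL1",
                "MAn1,LON1,DEL1,LON1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on commas, keep the first occurrence of each
# value (unique() preserves order), then rejoin with commas.
dedupe_offices <- function(x) {
  vapply(
    strsplit(as.character(x), ",", fixed = TRUE),
    function(parts) paste(unique(parts), collapse = ","),
    character(1)
  )
}

df$Office <- dedupe_offices(df$Office)
print(df)
```

Note that unique() is case-sensitive, so "MAn1" is kept exactly as it was typed in the input.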