I have data like this:
ID=c(rep("ID1",3), rep("ID2",2), "ID3", rep("ID4",2))
sex=c(rep("male",3), rep("female",2), "female", rep("male",2))
item=c("a","b","c","a","c","a","b","a")
df1 <- data.frame(ID,sex,item)
df1
   ID    sex item
1 ID1   male    a
2 ID1   male    b
3 ID1   male    c
4 ID2 female    a
5 ID2 female    c
6 ID3 female    a
7 ID4   male    b
8 ID4   male    a
I need the edges to look like this:
head(nodes)
   ID    sex V1 V2
1 ID1   male  a  b
2 ID1   male  b  c
3 ID1   male  a  c
4 ID2 female  a  c
5 ID4   male  b  a
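With @akrun's help, I can get the V1 and V2 columns:
# split the items by ID and list every pair of items within each ID (NULL if only one item)
lst <- lapply(split(item, ID), function(x) if (length(x) >= 2) t(combn(x, 2)) else NULL)
nodes <- as.data.frame(do.call(rbind, lst[!sapply(lst, is.null)]))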
But how can I "carry along" the ID and some other variables (sex, age, etc.) from the original df and include them as "sex", etc., in "nodes"?
Answer 0 (score: 3)
I feel like this has been solved before, but here is a possible solution using data.table's new (v >= 1.9.5) tstrsplit function:
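library(data.table)
# for each ID/sex group with more than one item, paste every pair of items as "x;y",
# then split the strings back into two columns (V1, V2)
setDT(df1)[, if (.N > 1) tstrsplit(combn(as.character(item), 2, paste, collapse = ";"), ";"),
           by = .(ID, sex)]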
#     ID    sex V1 V2
# 1: ID1   male  a  b
# 2: ID1   male  a  c
# 3: ID1   male  b  c
# 4: ID2 female  a  c
# 5: ID4   male  b  a
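To make the intermediate step of the data.table answer concrete, the sketch below reproduces it in base R for the items of ID1 (a, b, c). combn() with paste and collapse = ";" turns each pair into a single string, and splitting those strings gives the two edge columns. Here strsplit() plus do.call(rbind, ...) merely stands in for tstrsplit() to keep the example free of the data.table dependency (tstrsplit returns a list of columns rather than a matrix). The paste/split round trip assumes that no item value itself contains ";".

```r
items <- c("a", "b", "c")  # the items of ID1 in the question's data

# combn() calls paste(..., collapse = ";") on every 2-element combination
pairs <- combn(items, 2, paste, collapse = ";")
pairs
# [1] "a;b" "a;c" "b;c"

# splitting on ";" and binding the pieces row-wise gives one edge per row,
# with the first item in column 1 and the second in column 2
edges <- do.call(rbind, strsplit(pairs, ";", fixed = TRUE))
edges
```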
Answer 1 (score: 2)
Try:
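res <- do.call(rbind, lapply(split(df1, df1$ID), function(x) {
  # all pairs of items for this ID, one pair per row (NULL if the ID has only one item)
  m1 <- if (length(x$item) >= 2) t(combn(as.character(x$item), 2)) else NULL
  # attach the ID-level columns to every pair; IDs without pairs return NULL and are dropped by rbind
  if (!is.null(m1))
    data.frame(ID = unique(x$ID), sex = unique(x$sex), m1)
}))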
row.names(res) <- NULL
res
#    ID    sex X1 X2
#1  ID1   male  a  b
#2  ID1   male  a  c
#3  ID1   male  b  c
#4  ID2 female  a  c
#5  ID4   male  b  a
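The question also asks about carrying further variables such as age. One possible variation (a sketch, not part of either answer) is to build the item pairs per ID only, then join every ID-level column back on by ID with merge(), so no column names need to be listed by hand. The age column and its values below are hypothetical, added only for illustration, and the sketch assumes sex and age are constant within each ID.

```r
# Question's data plus a hypothetical per-person 'age' column (values made up)
df1 <- data.frame(
  ID   = c(rep("ID1", 3), rep("ID2", 2), "ID3", rep("ID4", 2)),
  sex  = c(rep("male", 3), rep("female", 2), "female", rep("male", 2)),
  age  = c(rep(30, 3), rep(25, 2), 41, rep(52, 2)),
  item = c("a", "b", "c", "a", "c", "a", "b", "a"),
  stringsAsFactors = FALSE
)

# 1. Item pairs per ID; an ID with a single item (ID3) yields NULL
groups <- split(df1$item, df1$ID)
pair_list <- Map(function(id, x) {
  if (length(x) < 2) return(NULL)
  m <- t(combn(x, 2))
  data.frame(ID = id, V1 = m[, 1], V2 = m[, 2], stringsAsFactors = FALSE)
}, names(groups), groups)
edges <- do.call(rbind, unname(Filter(Negate(is.null), pair_list)))

# 2. One row of ID-level attributes per ID: every column except 'item'
#    (assumes these columns do not vary within an ID)
attrs <- unique(df1[, setdiff(names(df1), "item")])

# 3. Inner join on ID: columns come out as ID, sex, age, V1, V2,
#    and IDs without any pair (ID3) are dropped
nodes <- merge(attrs, edges, by = "ID")
nodes
```

Because step 2 keeps every column except item, adding more person-level variables to df1 requires no change to the code.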