I have a data frame:
test <- structure(list(
y2002 = c("freshman","freshman","freshman","sophomore","sophomore","senior"),
y2003 = c("freshman","junior","junior","sophomore","sophomore","senior"),
y2004 = c("junior","sophomore","sophomore","senior","senior",NA),
y2005 = c("senior","senior","senior",NA, NA, NA)),
.Names = c("2002","2003","2004","2005"),
row.names = c(c(1:6)),
class = "data.frame")
> test
       2002      2003      2004   2005
1  freshman  freshman    junior senior
2  freshman    junior sophomore senior
3  freshman    junior sophomore senior
4 sophomore sophomore    senior   <NA>
5 sophomore sophomore    senior   <NA>
6    senior    senior      <NA>   <NA>
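I need to create a vertex/edge list (for igraph) with one edge every time a student's category changes between consecutive years, ignoring the years where nothing changes, like this: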
testvertices <- structure(list(
vertex =
c("freshman","junior", "freshman","junior","sophomore","freshman",
"junior","sophomore","sophomore","sophomore"),
edge =
c("junior","senior","junior","sophomore","senior","junior",
"sophomore","senior","senior","senior"),
id =
c("1","1","2","2","2","3","3","3","4","5")),
.Names = c("vertex","edge", "id"),
row.names = c(1:10),
class = "data.frame")
> testvertices
      vertex      edge id
1   freshman    junior  1
2     junior    senior  1
3   freshman    junior  2
4     junior sophomore  2
5  sophomore    senior  2
6   freshman    junior  3
7     junior sophomore  3
8  sophomore    senior  3
9  sophomore    senior  4
10 sophomore    senior  5
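For now I am ignoring the id; in my graph the edges should be weighted by count (i.e. freshman -> junior = 3). The idea is to make a tree graph. I know this is beside the main point, but just in case you ask...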
Answer 0 (score: 3)
If I understand correctly, you need something like this:
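elist <- lapply(seq_len(nrow(test)), function(i) {
  x <- as.character(test[i, ])  # row i as a character vector
  x <- unique(na.omit(x))       # drop NAs and repeated categories
  x <- rep(x, each = 2)         # a a b b c c
  x <- x[-1]                    # drop the first vertex
  x <- x[-length(x)]            # drop the last vertex: a b b c
  r <- matrix(x, ncol = 2, byrow = TRUE)  # one (from, to) pair per row
  # add the row id as a third column; keep three columns even with no edges
  if (nrow(r) > 0) { r <- cbind(r, i) } else { r <- cbind(r, numeric()) }
  r
})
do.call(rbind, elist)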
#                              i
# [1,] "freshman"  "junior"    "1"
# [2,] "junior"    "senior"    "1"
# [3,] "freshman"  "junior"    "2"
# [4,] "junior"    "sophomore" "2"
# [5,] "sophomore" "senior"    "2"
# [6,] "freshman"  "junior"    "3"
# [7,] "junior"    "sophomore" "3"
# [8,] "sophomore" "senior"    "3"
# [9,] "sophomore" "senior"    "4"
#[10,] "sophomore" "senior"    "5"
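This is not the most efficient solution, but I think it is fairly didactic. We create the edges for each row of the input data frame separately, hence the lapply. To create the edges from a row, we first remove NAs and duplicates, then include each vertex twice. Finally, we drop the first and the last vertex and format what is left as a two-column matrix, which is the edge list for that row. (Leaving it as a vector would actually be more efficient, but never mind.)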
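To make the pairing trick concrete, here is a minimal trace for a single student, using the values of row 2 of test typed in by hand so the snippet runs on its own. The second part is a side note under my own assumption: the vector `back` is a hypothetical student who returns to an earlier category, which does not occur in the data above.

```r
# Values of row 2 of `test`, copied by hand
x <- c("freshman", "junior", "sophomore", "senior")

x <- unique(na.omit(x))   # nothing to drop for this row
x <- rep(x, each = 2)     # freshman freshman junior junior sophomore sophomore senior senior
x <- x[-1]                # drop the first element
x <- x[-length(x)]        # drop the last element
edges <- matrix(x, ncol = 2, byrow = TRUE)
edges                     # one (from, to) pair per row

# Hypothetical student who goes back to an earlier category
back <- c("freshman", "junior", "junior", "freshman")
u <- unique(back)         # drops every repeat, including the return to freshman
v <- rle(back)$values     # collapses only consecutive repeats
```

For the data in the question both approaches give the same edges; the difference only matters if a category can reappear after a change.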
When adding the extra column, we have to be careful and check whether the edge list matrix has zero rows.
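The zero-row case shows up for a student whose category never changes, such as row 6. The sketch below replays the steps for that row by hand and shows that binding a zero-length column keeps the piece at three columns, so it still stacks with the others. The names r0, r4 and combined are only illustrative.

```r
# Row 6 of `test` is senior, senior, NA, NA: one distinct category, so no edges
x <- unique(na.omit(c("senior", "senior", NA, NA)))
x <- rep(x, each = 2)
x <- x[-1]
x <- x[-length(x)]
r <- matrix(x, ncol = 2, byrow = TRUE)
dim(r)                       # 0 rows, 2 columns

# A zero-length third column keeps the shape consistent with the other pieces
r0 <- cbind(r, numeric())
dim(r0)                      # 0 rows, 3 columns

# A non-empty piece, built the same way as for rows that do have edges
i <- 4
r4 <- cbind(matrix(c("sophomore", "senior"), ncol = 2, byrow = TRUE), i)

combined <- rbind(r4, r0)    # the empty piece contributes no rows
dim(combined)
```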
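The do.call function just glues everything together. The result is a matrix, which you can convert to a data frame with as.data.frame(); after that you can also convert the third column to numeric. You can change the column names as well, if you like.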
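As a rough sketch of that last step, the matrix can be turned into a data frame and the edges counted to get the weights asked for in the question. The matrix here is typed in from the printed output so the snippet is self-contained, and the column names from, to, id and weight are my own choice, not something required by igraph or by the code above.

```r
# Edge list as printed earlier (stand-in for do.call(rbind, elist))
m <- matrix(c(
  "freshman",  "junior",    "1",
  "junior",    "senior",    "1",
  "freshman",  "junior",    "2",
  "junior",    "sophomore", "2",
  "sophomore", "senior",    "2",
  "freshman",  "junior",    "3",
  "junior",    "sophomore", "3",
  "sophomore", "senior",    "3",
  "sophomore", "senior",    "4",
  "sophomore", "senior",    "5"
), ncol = 3, byrow = TRUE)

edges <- as.data.frame(m, stringsAsFactors = FALSE)
names(edges) <- c("from", "to", "id")   # illustrative names
edges$id <- as.integer(edges$id)

# Ignore the id and count how often each (from, to) pair occurs
edges$weight <- 1L
weighted <- aggregate(weight ~ from + to, data = edges, FUN = sum)
weighted
```

If igraph is installed, a data frame like weighted can be passed to graph_from_data_frame(), which takes the first two columns as the edge list and keeps further columns as edge attributes.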
Answer 1 (score: 1)
Is this what you want?
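# "from" values: every year column except the last
test1 <- c(test[[1]], test[[2]], test[[3]])
# "to" values: the following year, i.e. every column except the first
test2 <- c(test[[2]], test[[3]], test[[4]])
df <- data.frame(vertex = test1, edge = test2, stringsAsFactors = FALSE)
df1 <- df[complete.cases(df), ]          # drop pairs that involve an NA
result <- df1[df1$vertex != df1$edge, ]  # keep only actual changes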
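The same pairing can be written without hard-coding column positions. This is only a sketch, assuming that every column of test is a year and that the columns are in chronological order. The data frame is rebuilt here so the snippet runs on its own, and the names from, to, pairs and keep are illustrative.

```r
test <- data.frame(
  `2002` = c("freshman", "freshman", "freshman", "sophomore", "sophomore", "senior"),
  `2003` = c("freshman", "junior", "junior", "sophomore", "sophomore", "senior"),
  `2004` = c("junior", "sophomore", "sophomore", "senior", "senior", NA),
  `2005` = c("senior", "senior", "senior", NA, NA, NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

n <- ncol(test)
from <- unlist(test[-n], use.names = FALSE)  # all years but the last
to   <- unlist(test[-1], use.names = FALSE)  # the year that follows each of them

pairs <- data.frame(vertex = from, edge = to, stringsAsFactors = FALSE)
# Keep pairs with no NA where the category actually changed
keep <- complete.cases(pairs) & pairs$vertex != pairs$edge
result <- pairs[keep, ]
result
```

The rows come out ordered by year rather than by student, and there is no id column, which is fine when the ids are ignored anyway.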