I have a data frame:
test <- structure(list(
y2002 = c("freshman","freshman","freshman","sophomore","sophomore","senior"),
y2003 = c("freshman","junior","junior","sophomore","sophomore","senior"),
y2004 = c("junior","sophomore","sophomore","senior","senior",NA),
y2005 = c("senior","senior","senior",NA, NA, NA)),
.Names = c("2002","2003","2004","2005"),
row.names = c(c(1:6)),
class = "data.frame")
> test
2002 2003 2004 2005
1 freshman freshman junior senior
2 freshman junior sophomore senior
3 freshman junior sophomore senior
4 sophomore sophomore senior <NA>
5 sophomore sophomore senior <NA>
6 senior senior <NA> <NA>
I want to mine the data to get the distinct steps for each row, like this:
result <- structure(list(
y2002 = c("freshman","freshman","freshman","sophomore","sophomore","senior"),
y2003 = c("junior","junior","junior","senior","senior",NA),
y2004 = c("senior","sophomore","sophomore",NA,NA,NA),
y2005 = c(NA,"senior","senior",NA, NA, NA)),
.Names = c("1","2","3","4"),
row.names = c(c(1:6)),
class = "data.frame")
> result
1 2 3 4
1 freshman junior senior <NA>
2 freshman junior sophomore senior
3 freshman junior sophomore senior
4 sophomore senior <NA> <NA>
5 sophomore senior <NA> <NA>
6 senior <NA> <NA> <NA>
I know that if I treat each row as a vector, I can do something like
careerrow <- c(1,2,3,3,4)
pairz <- lapply(careerrow,function(i){c(careerrow[i],careerrow[i+1])})
uniquepairz <- careerrow[sapply(pairz,function(x){x[1]!=x[2]})]
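My difficulty is applying this to the rows of my data table. I think lapply is the way to go, but so far I have not been able to solve it.
Answer 0 (score: 3)
If your goal is to count the total number for each pathway,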
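you can use something like this (using data.table, because it has a nice way of handling lists as elements in a data.table (data.frame-like) object).
I use !duplicated(...) to remove the duplicates, because this is slightly more efficient than unique.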
library(data.table)
library(reshape2)
# make the rownames a column
test$id <- rownames(test)
# put in long format
DT <- as.data.table(melt(test,id='id'))
# get the unique steps and concatenate into a unique identifier for each pathway
DL <- DT[!is.na(value), {.steps <- value[!duplicated(value)]
stepid <- paste(.steps, sep ='.',collapse = '.')
list(steps = list(.steps), stepid =stepid)}, by=id]
## id steps stepid
## 1: 1 freshman,junior,senior freshman.junior.senior
## 2: 2 freshman,junior,sophomore,senior freshman.junior.sophomore.senior
## 3: 3 freshman,junior,sophomore,senior freshman.junior.sophomore.senior
## 4: 4 sophomore,senior sophomore.senior
## 5: 5 sophomore,senior sophomore.senior
## 6: 6 senior senior
# count the number per path
DL[, .N, by = stepid]
## stepid N
## 1: freshman.junior.senior 1
## 2: freshman.junior.sophomore.senior 2
## 3: sophomore.senior 2
## 4: senior 1
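As a quick check of the de-duplication step used above, here is a minimal sketch on a made-up vector (`steps` is a hypothetical name, not part of the question's data). It shows that `x[!duplicated(x)]` keeps the first occurrence of each value in its original order, which for a plain vector matches `unique(x)`.

```r
# Hypothetical example vector, not taken from the question's data
steps <- c("freshman", "freshman", "junior", "senior", "senior")

# Keep the first occurrence of each value, preserving order
dedup <- steps[!duplicated(steps)]

# For an atomic vector this gives the same result as unique()
same_as_unique <- identical(dedup, unique(steps))
```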
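Answer 1 (score: 2)
lapply, when applied to a data.frame, operates on its columns. That is because a data.frame is a list whose elements are the columns. Instead of lapply, you can use apply with MARGIN=1: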
unique.padded <- function(x) {
uniq <- unique(x)
out <- c(uniq, rep(NA, length(x) - length(uniq)))
}
t(apply(test, 1, unique.padded))
# [,1] [,2] [,3] [,4]
# 1 "freshman" "junior" "senior" NA
# 2 "freshman" "junior" "sophomore" "senior"
# 3 "freshman" "junior" "sophomore" "senior"
# 4 "sophomore" "senior" NA NA
# 5 "sophomore" "senior" NA NA
# 6 "senior" NA NA NA
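Note that unique() drops every repeated value, not only consecutive ones. If a "step" is meant as a change from one year to the next (an assumption, since the question does not say), a run-length approach would keep a status that is revisited later. A minimal sketch, using a hypothetical helper `collapse.runs` and a made-up row:

```r
# Hypothetical helper (not from the original answer): collapse only
# consecutive repeats, then pad with NA to the original length
collapse.runs <- function(x) {
  n <- length(x)
  x <- x[!is.na(x)]          # drop missing years first
  runs <- rle(x)$values      # one entry per run of identical values
  c(runs, rep(NA, n - length(runs)))
}

# Made-up row in which "freshman" is revisited after "junior"
row <- c("freshman", "junior", "freshman", "freshman", NA)
collapsed <- collapse.runs(row)
```

It would be applied row-wise in the same way, t(apply(test, 1, collapse.runs)). For the data in the question no status is revisited, so both helpers give the same result there.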
Edit: I saw your comment about your final goal. I would do something like this:
table(sapply(apply(test, 1, function(x)unique(na.omit(x))),
paste, collapse = "_"))
# freshman_junior_senior freshman_junior_sophomore_senior
# 1 2
# senior sophomore_senior
# 1 2