Suppose I have the following data frame:
firstname <- c('Doug','Tom','Glenn','Billy','Angelo')
city <- c('Tulsa','Unknown','Miami','Houston','Unknown')
state <- c('OK','CA','FL','Unknown','Unknown')
job <- c('Unknown','Plumber','Professor','Unknown','Unknown')
list_test <- data.frame(firstname, city, state, job)
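I want to extract each first name together with the name of every column in which that person's value is Unknown. In other words, I want a table that looks like this: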
firstname attribute
Doug job
Tom city
Billy state
Billy job
Angelo city
Angelo state
Angelo job
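Answer 0 (score: 1)
You can loop over the names of the columns you want to process, building for each one a data frame of all first names that are missing that attribute. You can then combine them all with do.call and rbind: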
do.call(rbind, lapply(tail(names(list_test), -1), function(x) {
data.frame(firstname=list_test$firstname[list_test[,x] == "Unknown"], attribute=x)
}))
# firstname attribute
# 1 Tom city
# 2 Angelo city
# 3 Billy state
# 4 Angelo state
# 5 Doug job
# 6 Billy job
# 7 Angelo job
Answer 1 (score: 1)
library(reshape2)
library(dplyr)
# Each %>% must end a line; a line that starts with %>% is a syntax error
list_test %>%
  melt(id.vars = 'firstname', variable.name = 'attribute') %>%
  filter(value == 'Unknown') %>%
  select(-value)
firstname attribute
1 Tom city
2 Angelo city
3 Billy state
4 Angelo state
5 Doug job
6 Billy job
7 Angelo job
Answer 2 (score: 1)
A solution without loops; probably better suited to larger datasets.
library(reshape2)
# transform to long format
m_l <- melt(list_test, id.vars = "firstname", factorsAsStrings = TRUE)
# ignore warning; expected
# make selection (drop the third column, value)
res <- m_l[m_l$value == "Unknown", -3]
# order (for completeness' sake)
res[order(res$firstname), ]
firstname variable
5 Angelo city
10 Angelo state
15 Angelo job
9 Billy state
14 Billy job
11 Doug job
2 Tom city
Answer 3 (score: 1)
Another simple option, using gather from tidyr and subset from base R.
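The code for this answer did not survive in the text above, so what follows is only a minimal sketch of how gather and subset could be combined, not necessarily what the answerer wrote. The column name attribute and the stringsAsFactors = FALSE argument are assumptions made here for illustration.

```r
library(tidyr)

# Data from the question. stringsAsFactors = FALSE is an assumption added here
# so the columns are plain character vectors on any R version.
list_test <- data.frame(
  firstname = c('Doug', 'Tom', 'Glenn', 'Billy', 'Angelo'),
  city      = c('Tulsa', 'Unknown', 'Miami', 'Houston', 'Unknown'),
  state     = c('OK', 'CA', 'FL', 'Unknown', 'Unknown'),
  job       = c('Unknown', 'Plumber', 'Professor', 'Unknown', 'Unknown'),
  stringsAsFactors = FALSE
)

# Reshape to long format: one row per (firstname, attribute) pair
long <- gather(list_test, attribute, value, -firstname)

# Keep only the Unknown rows and drop the now-redundant value column
res <- subset(long, value == "Unknown", select = -value)
res
```

The result keeps the row names from the long table, so they are not consecutive; that is harmless and can be reset with rownames(res) <- NULL if desired.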
Answer 4 (score: 1)
Adding a tidyr and dplyr solution. I find it a bit more elegant:
library(dplyr)
library(tidyr)
list_test %>%
gather(field, value, -firstname) %>%
filter(value == "Unknown") %>%
select(-value) %>%
arrange(firstname)
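The last two lines are purely cosmetic touches. You can ignore the warning about attributes being dropped; it is only telling you that the factors are being converted to character vectors.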
Answer 5 (score: 0)
A data.table example:
library(data.table)
list_test <- data.table(firstname, city, state, job)
varlist <- names(list_test)[2:4]
do.call(rbind,sapply(varlist, function(x) list_test[get(x)=='Unknown',list(firstname,col = x)], simplify=FALSE))
It's a bit messy; I hope someone can come up with a better data.table approach.
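One possibly tidier variant, offered only as a sketch, is to use data.table's own melt method and then filter in a single bracket call. The column name attribute is chosen here to match the table requested in the question; it is not part of the answer above.

```r
library(data.table)

# Data from the question, built directly as a data.table
list_test <- data.table(
  firstname = c('Doug', 'Tom', 'Glenn', 'Billy', 'Angelo'),
  city      = c('Tulsa', 'Unknown', 'Miami', 'Houston', 'Unknown'),
  state     = c('OK', 'CA', 'FL', 'Unknown', 'Unknown'),
  job       = c('Unknown', 'Plumber', 'Professor', 'Unknown', 'Unknown')
)

# Long format: one row per (firstname, attribute) pair
long <- melt(list_test, id.vars = "firstname", variable.name = "attribute")

# Filter rows and pick columns in one step
res <- long[value == "Unknown", .(firstname, attribute)]
res
```

This avoids both get() and the do.call(rbind, ...) step, at the cost of materializing the full long table first.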