Extract column names and specific values based on a condition

Time: 2015-08-21 16:25:49

Tags: r dataframe

Suppose I have the following data frame:

firstname <- c('Doug','Tom','Glenn','Billy','Angelo')
city <- c('Tulsa','Unknown','Miami','Houston','Unknown')
state <- c('OK','CA','FL','Unknown','Unknown')
job <- c('Unknown','Plumber','Professor','Unknown','Unknown')

list_test <- data.frame(firstname, city, state, job)

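I want to extract each first name together with the name of every column in which that person's value is Unknown. In other words, I want a table that looks like this: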

  firstname  attribute
       Doug        job
        Tom       city
      Billy      state
      Billy        job
     Angelo       city
     Angelo      state
     Angelo        job
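For comparison, here is a minimal base R sketch that produces this table in exactly this row order. It assumes the columns are plain character vectors (hence `stringsAsFactors = FALSE`, which is the default only since R 4.0.0), and the names `attrs`, `hits` and `result` are illustrative. Comparing the attribute columns with "Unknown" yields a logical matrix, and `which(..., arr.ind = TRUE)` turns its TRUE cells into (row, column) index pairs.

```r
# Example data from the question, kept as plain character columns
list_test <- data.frame(
  firstname = c('Doug', 'Tom', 'Glenn', 'Billy', 'Angelo'),
  city      = c('Tulsa', 'Unknown', 'Miami', 'Houston', 'Unknown'),
  state     = c('OK', 'CA', 'FL', 'Unknown', 'Unknown'),
  job       = c('Unknown', 'Plumber', 'Professor', 'Unknown', 'Unknown'),
  stringsAsFactors = FALSE
)

attrs <- list_test[, -1]                            # attribute columns only
hits  <- which(attrs == "Unknown", arr.ind = TRUE)  # matrix with columns "row" and "col"
# Sort by original row, then by column, to match the desired table
hits  <- hits[order(hits[, "row"], hits[, "col"]), , drop = FALSE]

result <- data.frame(
  firstname = list_test$firstname[hits[, "row"]],
  attribute = names(attrs)[hits[, "col"]],
  stringsAsFactors = FALSE
)
print(result)
```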

6 answers:

Answer 0 (score: 1)

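You can loop over the names of the columns you want to process, building for each one a data frame of all the first names that are missing that attribute. Then you can combine them all with do.call and rbind: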

do.call(rbind, lapply(tail(names(list_test), -1), function(x) {
  data.frame(firstname=list_test$firstname[list_test[,x] == "Unknown"], attribute=x)
}))
#   firstname attribute
# 1       Tom      city
# 2    Angelo      city
# 3     Billy     state
# 4    Angelo     state
# 5      Doug       job
# 6     Billy       job
# 7    Angelo       job
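The `tail(names(list_test), -1)` call assumes firstname is the first column. A sketch of the same loop that selects the attribute columns by name instead, and builds the attribute column with `rep()` so that a column with no Unknown values contributes zero rows rather than triggering data.frame()'s "differing number of rows" error. It assumes character columns; `attr_cols` and `pieces` are illustrative names.

```r
list_test <- data.frame(
  firstname = c('Doug', 'Tom', 'Glenn', 'Billy', 'Angelo'),
  city      = c('Tulsa', 'Unknown', 'Miami', 'Houston', 'Unknown'),
  state     = c('OK', 'CA', 'FL', 'Unknown', 'Unknown'),
  job       = c('Unknown', 'Plumber', 'Professor', 'Unknown', 'Unknown'),
  stringsAsFactors = FALSE
)

# Every column except firstname, chosen by name rather than by position
attr_cols <- setdiff(names(list_test), "firstname")

pieces <- lapply(attr_cols, function(x) {
  idx <- which(list_test[[x]] == "Unknown")    # rows where this attribute is Unknown
  data.frame(firstname = list_test$firstname[idx],
             attribute = rep(x, length(idx)),  # zero-length when nothing matches
             stringsAsFactors = FALSE)
})
result <- do.call(rbind, pieces)
print(result)
```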

Answer 1 (score: 1)

library(reshape2)
library(dplyr)

# %>% must end a line for the pipeline to continue onto the next one
list_test %>%
  melt(id.vars = 'firstname', variable.name = 'attribute') %>%
  filter(value == 'Unknown') %>%
  select(-3)


  firstname attribute
1       Tom      city
2    Angelo      city
3     Billy     state
4    Angelo     state
5      Doug       job
6     Billy       job
7    Angelo       job

Answer 2 (score: 1)

A solution without loops; it may be better suited to larger data sets.

library(reshape2)

# transform to long format
m_l <- melt(list_test, id.vars = "firstname", factorsAsStrings = TRUE)
# ignore the warning about dropped attributes; it is expected

# make the selection (column 3 is the value column)
res <- m_l[m_l$value == "Unknown", -3]
# order (for completeness' sake)

res[order(res$firstname), ]
   firstname variable
5     Angelo     city
10    Angelo    state
15    Angelo      job
9      Billy    state
14     Billy      job
11      Doug      job
2        Tom     city
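If you would rather not depend on reshape2 (which is retired), base R's `stack()` produces a similar long format. A minimal sketch, assuming character columns: `stack()` on a data frame keeps only columns for which `is.vector()` is TRUE, so factor columns would be dropped with a warning. Because `stack()` concatenates the columns one after another, firstname is repeated once per attribute column.

```r
list_test <- data.frame(
  firstname = c('Doug', 'Tom', 'Glenn', 'Billy', 'Angelo'),
  city      = c('Tulsa', 'Unknown', 'Miami', 'Houston', 'Unknown'),
  state     = c('OK', 'CA', 'FL', 'Unknown', 'Unknown'),
  job       = c('Unknown', 'Plumber', 'Professor', 'Unknown', 'Unknown'),
  stringsAsFactors = FALSE
)

attr_cols <- list_test[-1]        # city, state, job
long <- stack(attr_cols)          # columns: values, ind (source column name as a factor)
long$firstname <- rep(list_test$firstname, times = ncol(attr_cols))

# Keep the Unknown rows, rename ind, and order by first name
res <- long[long$values == "Unknown", c("firstname", "ind")]
names(res)[names(res) == "ind"] <- "attribute"
res <- res[order(res$firstname), ]
print(res)
```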

Answer 3 (score: 1)

Another simple option uses tidyr's gather together with base R's subset.
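Only the description of this answer remains here. One way the two functions could be combined, as a sketch assuming tidyr is installed and the columns are character (`long` and `res` are illustrative names): `gather()` stacks the attribute columns into key/value pairs, and `subset()` both filters the rows and drops the value column through its `select` argument.

```r
library(tidyr)

list_test <- data.frame(
  firstname = c('Doug', 'Tom', 'Glenn', 'Billy', 'Angelo'),
  city      = c('Tulsa', 'Unknown', 'Miami', 'Houston', 'Unknown'),
  state     = c('OK', 'CA', 'FL', 'Unknown', 'Unknown'),
  job       = c('Unknown', 'Plumber', 'Professor', 'Unknown', 'Unknown'),
  stringsAsFactors = FALSE
)

# Long format: one row per (firstname, attribute) pair
long <- gather(list_test, attribute, value, -firstname)

# Keep the "Unknown" rows and drop the value column
res <- subset(long, value == "Unknown", select = -value)
print(res)
```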

Answer 4 (score: 1)

Adding a tidyr and dplyr solution. I find it more elegant:

library(dplyr)
library(tidyr)

list_test %>% 
  gather(field, value, -firstname) %>% 
  filter(value == "Unknown") %>% 
  select(-value) %>% 
  arrange(firstname)

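The last two lines are mostly cosmetic. You can ignore the warning about attributes being dropped; it just tells you that the factors are being converted to character vectors.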
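Since tidyr 1.0.0, `gather()` is superseded by `pivot_longer()`. A sketch of the same pipeline with `pivot_longer()`, assuming character columns; with character columns there are no factor attributes to drop, so the warning mentioned above does not arise.

```r
library(dplyr)
library(tidyr)

list_test <- data.frame(
  firstname = c('Doug', 'Tom', 'Glenn', 'Billy', 'Angelo'),
  city      = c('Tulsa', 'Unknown', 'Miami', 'Houston', 'Unknown'),
  state     = c('OK', 'CA', 'FL', 'Unknown', 'Unknown'),
  job       = c('Unknown', 'Plumber', 'Professor', 'Unknown', 'Unknown'),
  stringsAsFactors = FALSE
)

res <- list_test %>%
  pivot_longer(-firstname, names_to = "attribute", values_to = "value") %>%
  filter(value == "Unknown") %>%   # keep only the missing attributes
  select(-value) %>%
  arrange(firstname)
print(res)
```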

Answer 5 (score: 0)

A data.table example:

library(data.table)
list_test <- data.table(firstname, city, state, job)
varlist <- names(list_test)[2:4]
do.call(rbind, sapply(varlist, function(x)
  list_test[get(x) == 'Unknown', list(firstname, col = x)],
  simplify = FALSE))

It's a bit messy; I hope someone can suggest a better data.table approach.
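One arguably tidier data.table route is data.table's own `melt()` method, which reshapes to long format and can be filtered in the same `[` chain. A sketch, assuming the data.table package is installed; `variable.name` and `value.name` are arguments of `melt.data.table`, and the resulting attribute column is a factor by default.

```r
library(data.table)

list_test <- data.table(
  firstname = c('Doug', 'Tom', 'Glenn', 'Billy', 'Angelo'),
  city      = c('Tulsa', 'Unknown', 'Miami', 'Houston', 'Unknown'),
  state     = c('OK', 'CA', 'FL', 'Unknown', 'Unknown'),
  job       = c('Unknown', 'Plumber', 'Professor', 'Unknown', 'Unknown')
)

# Long format, then keep the Unknown rows and the two columns of interest
res <- melt(list_test, id.vars = "firstname",
            variable.name = "attribute", value.name = "value")[
  value == "Unknown", .(firstname, attribute)]
print(res)
```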