demodf <- data.frame(
name = c("Mike","Mike","Mike","Mike","Mike","Joe","Joe","Joe","Joe","Joe"),
Field = c("EDUCATION","Degree","Title","WORK", "Title", "EDUCATION","Degree","Title", "WORK","Title"),
Values = c("EDUCATION", "Masters", "Student", "WORK", "VP Sales", "EDUCATION", "Bachelors","Student", "WORK", "Analyst"))
name Field Values
1 Mike EDUCATION EDUCATION
2 Mike Degree Masters
3 Mike Title Student
4 Mike WORK WORK
5 Mike Title VP Sales
6 Joe EDUCATION EDUCATION
7 Joe Degree Bachelors
8 Joe Title Student
9 Joe WORK WORK
10 Joe Title Analyst
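I would like tidyr::spread or reshape2::dcast to produce a wide format in which Field becomes the column headers. The code would look something like dcast(demodf, name ~ Values) or demodf %>% spread(Field, Values). However, dcast coerces the values to numeric, and spread throws an error.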
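The problem is that "Title" is repeated. Because of a quirk in the data, EDUCATION and WORK act as "false" headers inside the data itself. Is it possible to tag each Field entry with its upper-case header so that dcast works (i.e. Title_EDUCATION and Title_WORK)? Ideally the transformation would apply to the whole Field column, so that "EDUCATION" and "WORK" disappear altogether and we are left with Degree_EDUCATION, Title_EDUCATION, and so on.
Note that the real data has many more headers, so it would be best to identify the "false headers" either as all-caps entries or as rows where Field == Values.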
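As a quick check of the two detection rules mentioned above, the following base R sketch flags the false headers both ways and lists the (name, Field) pairs that occur more than once, which is why Field alone cannot serve as a column key. The variable names (is_header_eq, is_header_caps, dups) are made up for illustration.

```r
demodf <- data.frame(
  name   = c("Mike","Mike","Mike","Mike","Mike","Joe","Joe","Joe","Joe","Joe"),
  Field  = c("EDUCATION","Degree","Title","WORK","Title",
             "EDUCATION","Degree","Title","WORK","Title"),
  Values = c("EDUCATION","Masters","Student","WORK","VP Sales",
             "EDUCATION","Bachelors","Student","WORK","Analyst"),
  stringsAsFactors = FALSE)

# Rule 1: a false header repeats its label in both columns
is_header_eq <- demodf$Field == demodf$Values

# Rule 2: a false header is written entirely in upper case
is_header_caps <- demodf$Field == toupper(demodf$Field)

# On this sample both rules flag the same rows
identical(is_header_eq, is_header_caps)

# Rows whose (name, Field) pair has already appeared earlier
dups <- demodf[duplicated(demodf[c("name", "Field")]), ]
dups
```

On real data the two rules may disagree (for example, an all-caps value that is not a header), so it is worth comparing them before picking one.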
Desired output:
output <- data.frame(
Name=c("Mike", "Joe"),
Degree_EDUCATION =c("Masters", "Bachelors"),
Title_EDUCATION = c("Student", "Student"),
Title_WORK= c("VP Sales", "Analyst"))
Name Degree_EDUCATION Title_EDUCATION Title_WORK
1 Mike Masters Student VP Sales
2 Joe Bachelors Student Analyst
Answer 0 (score: 3)
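The key is to turn the repeated category rows into a new column; after that the reshape is straightforward.
First, add stringsAsFactors=FALSE so that Field and Values can be compared: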
demodf <- data.frame(
name = c("Mike","Mike","Mike","Mike","Mike","Joe","Joe","Joe","Joe","Joe"),
Field = c("EDUCATION","Degree","Title","WORK", "Title", "EDUCATION","Degree","Title", "WORK","Title"),
Values = c("EDUCATION", "Masters", "Student", "WORK", "VP Sales", "EDUCATION", "Bachelors","Student", "WORK", "Analyst"),
stringsAsFactors=FALSE)
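Now use tidyr and dplyr to add two columns, one flagging whether the row is a category and one holding that category's name, fill the missing category values downward, and then drop the extra rows and columns.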
library(tidyr)
library(dplyr)
d2 <- demodf %>% mutate(IsCategory=Field==Values,
Category=ifelse(IsCategory, Field, NA)) %>%
fill(Category) %>% subset(!IsCategory, select=-IsCategory)
d2
## name Field Values Category
## 2 Mike Degree Masters EDUCATION
## 3 Mike Title Student EDUCATION
## 5 Mike Title VP Sales WORK
## 7 Joe Degree Bachelors EDUCATION
## 8 Joe Title Student EDUCATION
## 10 Joe Title Analyst WORK
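If tidyr and dplyr are not available, the same "carry the header down" step can be sketched in base R by numbering the header blocks with cumsum. This is an alternative to the fill() call, not the code used in this answer, and it assumes the first row of the data is a header row. The names is_header, category and d2_base are made up for illustration.

```r
demodf <- data.frame(
  name   = c("Mike","Mike","Mike","Mike","Mike","Joe","Joe","Joe","Joe","Joe"),
  Field  = c("EDUCATION","Degree","Title","WORK","Title",
             "EDUCATION","Degree","Title","WORK","Title"),
  Values = c("EDUCATION","Masters","Student","WORK","VP Sales",
             "EDUCATION","Bachelors","Student","WORK","Analyst"),
  stringsAsFactors = FALSE)

# TRUE on the false-header rows
is_header <- demodf$Field == demodf$Values

# cumsum() numbers the header blocks 1, 2, 3, ...; indexing the header
# labels by that number repeats each label on the rows beneath it.
# Assumption: row 1 is a header, otherwise the index would start at 0.
category <- demodf$Field[is_header][cumsum(is_header)]

# Keep only the data rows and attach their category
d2_base <- demodf[!is_header, ]
d2_base$Category <- category[!is_header]
d2_base
```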
Then dcast works as you hoped!
library(reshape2)
dcast(d2, name ~ Field+Category, value.var="Values")
## name Degree_EDUCATION Title_EDUCATION Title_WORK
## 1 Joe Bachelors Student Analyst
## 2 Mike Masters Student VP Sales
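Since the question also asked about tidyr, here is a sketch of the same reshape with pivot_wider, which supersedes spread in tidyr 1.0.0 and later. It is a variation on the dcast call above, not part of the original answer; d2 is rebuilt by hand here only so the snippet runs on its own.

```r
library(tidyr)

# d2 as produced in the previous step: header rows removed,
# Category carried down onto each data row
d2 <- data.frame(
  name     = c("Mike", "Mike", "Mike", "Joe", "Joe", "Joe"),
  Field    = c("Degree", "Title", "Title", "Degree", "Title", "Title"),
  Values   = c("Masters", "Student", "VP Sales", "Bachelors", "Student", "Analyst"),
  Category = c("EDUCATION", "EDUCATION", "WORK", "EDUCATION", "EDUCATION", "WORK"),
  stringsAsFactors = FALSE)

# names_from takes both columns; names_sep joins them as Field_Category
wide <- pivot_wider(d2,
                    names_from  = c(Field, Category),
                    values_from = Values,
                    names_sep   = "_")
wide
```

Because each (name, Field, Category) combination is now unique, no aggregation is needed and the values stay as character strings.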
Answer 1 (score: 0)
Here is an attempt with data.table. It requires the data to be built with stringsAsFactors = FALSE.
library(data.table)
# get groupings by titles (all caps)
setDT(demodf)[, head := cumsum(Field == toupper(Field))]
# merge titles onto full dataset and paste title to Field
demodf[demodf[Field == toupper(Field), .(Field, head)], on="head",
Field := paste(Field, i.Field, sep="_"), by=.EACHI]
# now reshape wide
dcast(demodf[Values != toupper(Values),], name~Field, value.var="Values")
This returns:
name Degree_EDUCATION Title_EDUCATION Title_WORK
1: Joe Bachelors Student Analyst
2: Mike Masters Student VP Sales
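A possible simplification of the data.table approach, offered as a sketch rather than as this answer's method: once the header blocks are numbered, the first Field value in each block is the header, so it can be assigned by group without a self-join and without overwriting Field. The column names block and Category are made up for illustration, and headers are detected here with Field == Values instead of the all-caps test.

```r
library(data.table)

dt <- data.table(
  name   = c("Mike","Mike","Mike","Mike","Mike","Joe","Joe","Joe","Joe","Joe"),
  Field  = c("EDUCATION","Degree","Title","WORK","Title",
             "EDUCATION","Degree","Title","WORK","Title"),
  Values = c("EDUCATION","Masters","Student","WORK","VP Sales",
             "EDUCATION","Bachelors","Student","WORK","Analyst"))

# Number each header block: the counter increases on every false header
dt[, block := cumsum(Field == Values)]

# Within a block the first row is the header, so its Field is the category
dt[, Category := Field[1L], by = block]

# Drop the header rows and reshape wide on the combined key
wide <- dcast(dt[Field != Values], name ~ Field + Category, value.var = "Values")
wide
```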
Data
demodf <-
structure(list(name = c("Mike", "Mike", "Mike", "Mike", "Mike",
"Joe", "Joe", "Joe", "Joe", "Joe"), Field = c("EDUCATION", "Degree",
"Title", "WORK", "Title", "EDUCATION", "Degree", "Title", "WORK",
"Title"), Values = c("EDUCATION", "Masters", "Student", "WORK",
"VP Sales", "EDUCATION", "Bachelors", "Student", "WORK", "Analyst"
)), .Names = c("name", "Field", "Values"), row.names = c(NA,
-10L), class = "data.frame")