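Edit to make the question clearer: I started from a messy CSV file, so I need to work out which values are the ID and which are the time variable, or rather assign an ID and a time to the data in the variable column. This has now been answered below.
Here is my data: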
col1<-c("ID", "Date","var1","var2","ID","Date","var1","var2","ID","Date","var1","var2")
col2<-c("1","21-11-2015 14:20", "4.8","3.8", "1","21-11-2015 15:30", "3.5","5.9","2","21-11-2015 14:20","3.0","6.7")
df<-cbind(col1,col2)
I tried dcast(), without luck:
dcast(ID+Date~var1+var2, data = df, value.var = col1 )
I would like the output to be in a real long format, like this:
ID<-c(1,1,2)
Date<-c("21-11-2015 14:20","21-11-2015 15:30","21-11-2015 14:20")
var1<-c("4.8","3.5","3.0")
var2<-c("3.8","5.9","6.7")
df.clean<-cbind(ID,Date, var1,var2)
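Thanks for your help.
Answer 0 (score: 5)
I don't think this is a reshaping problem. You have the values in one column and the names in the other, so you can fill the values row by row into a matrix with one column per unique name, and then attach the names with setNames: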
with(df, setNames(data.frame(matrix(col2,
ncol = length(unique(col1)), byrow = TRUE)), unique(col1)))
# ID Date var1 var2
#1 1 21-11-2015 14:20 4.8 3.8
#2 1 21-11-2015 15:30 3.5 5.9
#3 2 21-11-2015 14:20 3.0 6.7
Data
col1<-c("ID", "Date","var1","var2","ID","Date","var1","var2","ID",
"Date","var1","var2")
col2<-c("1","21-11-2015 14:20", "4.8","3.8", "1","21-11-2015 15:30",
"3.5","5.9","2","21-11-2015 14:20","3.0","6.7")
df<- data.frame(col1,col2)
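As a possible follow-up to the matrix approach above: every column of the result is still text (or a factor, depending on the R version), because all values came from one character column. The sketch below rebuilds the same table and then converts each column to a more useful type. The name `wide`, the date format string and the `tz = "UTC"` choice are assumptions for illustration, not part of the original answer; it also assumes the names in col1 repeat in the same order with no missing entries.

```r
# Same example data as in the question
col1 <- c("ID", "Date", "var1", "var2", "ID", "Date", "var1", "var2",
          "ID", "Date", "var1", "var2")
col2 <- c("1", "21-11-2015 14:20", "4.8", "3.8", "1", "21-11-2015 15:30",
          "3.5", "5.9", "2", "21-11-2015 14:20", "3.0", "6.7")

# Fill the values row by row: one row per record, one column per name
vars <- unique(col1)
wide <- setNames(
  data.frame(matrix(col2, ncol = length(vars), byrow = TRUE),
             stringsAsFactors = FALSE),
  vars
)

# Convert the text columns to proper types
# (the time zone is an assumption; adjust to where the data was recorded)
wide$ID   <- as.integer(wide$ID)
wide$Date <- as.POSIXct(wide$Date, format = "%d-%m-%Y %H:%M", tz = "UTC")
wide$var1 <- as.numeric(wide$var1)
wide$var2 <- as.numeric(wide$var2)

str(wide)
```

If a record in the real CSV is missing one of its fields, the row-by-row fill would silently shift values into the wrong columns, so it is worth checking that `length(col2)` is a multiple of `length(vars)` first.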
Answer 1 (score: 1)
This is not a reshaping problem. Here is some simple code showing how to do it by hand:
Data
col1<-c("ID",
"Date","var1","var2","ID","Date","var1","var2","ID","Date","var1","var2")
col2<-c("1","21-11-2015 14:20", "4.8","3.8", "1","21-11-2015 15:30",
"3.5","5.9","2","21-11-2015 14:20","3.0","6.7")
df<-data.frame(col1,col2, stringsAsFactors = FALSE)
Code
uniquevars <- unique(df$col1)
Res <- list()
# Collect the col2 values that belong to each variable name
for (v in uniquevars) {
  Res[[v]] <- df$col2[df$col1 == v]
}
# Each list element becomes one column (the matrix is filled column by column)
dfRes <- data.frame(matrix(unlist(Res), ncol = length(Res)), stringsAsFactors = FALSE)
colnames(dfRes) <- uniquevars
dfRes
ID Date var1 var2
1 1 21-11-2015 14:20 4.8 3.8
2 1 21-11-2015 15:30 3.5 5.9
3 2 21-11-2015 14:20 3.0 6.7
I hope this code helps you understand the steps you are interested in.
Cheers!
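The manual loop in this answer groups the values by variable name, which is what base R's split() does in one call. The following is a minimal sketch of that variant, not the answer author's code; the names `groups` and `dfRes2` are made up for illustration, and it assumes every name occurs the same number of times (otherwise as.data.frame() stops with an error about differing numbers of rows).

```r
# Same example data as in the question
col1 <- c("ID", "Date", "var1", "var2", "ID", "Date", "var1", "var2",
          "ID", "Date", "var1", "var2")
col2 <- c("1", "21-11-2015 14:20", "4.8", "3.8", "1", "21-11-2015 15:30",
          "3.5", "5.9", "2", "21-11-2015 14:20", "3.0", "6.7")
df <- data.frame(col1, col2, stringsAsFactors = FALSE)

# Explicit factor levels keep the columns in order of first appearance
# instead of sorting them alphabetically
groups <- factor(df$col1, levels = unique(df$col1))

# split() returns one character vector per name; each becomes a column
dfRes2 <- as.data.frame(split(df$col2, groups), stringsAsFactors = FALSE)
dfRes2
```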
Answer 2 (score: 1)
Here is a tidyverse approach:
library(tidyverse)
df %>% # your original (cbind) object
data.frame() %>% # set as dataframe
group_by(col1) %>% # for each col1 value
mutate(index = row_number()) %>% # set a row index (useful for reshaping)
spread(col1, col2) %>% # reshape
select(-index) # remove index
# # A tibble: 3 x 4
# Date ID var1 var2
# <fct> <fct> <fct> <fct>
# 1 21-11-2015 14:20 1 4.8 3.8
# 2 21-11-2015 15:30 1 3.5 5.9
# 3 21-11-2015 14:20 2 3.0 6.7
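In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). The sketch below shows the same idea with the newer function, assuming the dplyr and tidyr packages are installed; the name `wide` is made up for illustration. The per-name row index is still needed, since it is what tells pivot_wider() which values belong to the same record.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
col1 <- c("ID", "Date", "var1", "var2", "ID", "Date", "var1", "var2",
          "ID", "Date", "var1", "var2")
col2 <- c("1", "21-11-2015 14:20", "4.8", "3.8", "1", "21-11-2015 15:30",
          "3.5", "5.9", "2", "21-11-2015 14:20", "3.0", "6.7")
df <- data.frame(col1, col2, stringsAsFactors = FALSE)

wide <- df %>%
  group_by(col1) %>%                    # for each col1 value
  mutate(index = row_number()) %>%      # number its occurrences: 1, 2, 3
  ungroup() %>%
  pivot_wider(names_from = col1,        # names become column headers
              values_from = col2) %>%   # values fill the cells
  select(-index)                        # drop the helper index

wide
```

Unlike spread(), pivot_wider() keeps the new columns in order of first appearance, so ID comes before Date here.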