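I am trying to read the following .csv file into R. As you can see from the image below, row 2 holds the unique variable names and row 3 holds the values of those variables, so rows 2 and 3 together represent one observation. The pattern repeats: row 4 is a row of variable names and row 5 holds the corresponding values. It continues so that each pair of rows (2/3, 4/5, 6/7, ..., 999/1000) represents one observation. The data set contains 1,000 observations in total.
The problem I am having is reading this into R in a form that gives me a more usable data set. My goal is a standard set of variable names in the top row, with every row below it representing one observation.
Do any expert R coders have suggestions?
Thanks,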
Answer 0 (score: 0)
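Here is a solution for a simple test case I made up. You first need to import your data into a data.frame: x = read.csv(file="your-file.csv"). With the default header=TRUE, read.csv uses the first line of the file as column names, so row 1 of x is line 2 of the file, which matches the layout in the question where the pairs start at line 2. If your file starts directly with a row of variable names, pass header=FALSE so that row is kept as data.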
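As a minimal sketch of the import step, the snippet below reads a tiny made-up CSV from a string instead of a file. The contents of csv_text and the names name_rows and value_rows are assumptions for illustration only. It uses header=FALSE with skip=1, which drops the leading line explicitly rather than treating it as a header, and stringsAsFactors=FALSE so the cells come in as plain character strings.

```r
# Hypothetical file contents: one leading line, then name/value row pairs.
csv_text <- "title,,\na,b,c\n1,2,3\nh,i,a\n4,5,6"

# skip = 1 drops the leading line; header = FALSE keeps the name rows as data.
x <- read.csv(text = csv_text, header = FALSE, skip = 1,
              stringsAsFactors = FALSE)

# Odd rows of x hold the variable names, even rows hold the values.
name_rows  <- x[seq(1, nrow(x), 2), ]
value_rows <- x[seq(2, nrow(x), 2), ]
```

For a real file, replace text = csv_text with file = "your-file.csv".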
To test this, I used the following test data.frame, x:
x=structure(list(V1 = structure(c(2L, 1L, 4L, 3L), .Label = c("1",
"a", "ab", "h"), class = "factor"), V2 = structure(c(2L, 1L,
4L, 3L), .Label = c("2", "b", "cd", "i"), class = "factor"),
V3 = structure(c(3L, 1L, 2L, 4L), .Label = c("3", "a", "c",
"ef"), class = "factor"), V4 = structure(c(3L, 1L, 2L, 4L
), .Label = c("4", "b", "d", "gh"), class = "factor"), V5 = structure(c(3L,
1L, 2L, 4L), .Label = c("5", "c", "e", "ij"), class = "factor"),
V6 = structure(c(3L, 1L, 2L, 4L), .Label = c("6", "d", "f",
"kl"), class = "factor"), V7 = structure(c(3L, 1L, 2L, 4L
), .Label = c("7", "e", "g", "mno"), class = "factor")), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7"), class = "data.frame", row.names = c(NA,
-4L))
The solution turns this table (rows 1 and 3 are your labels):
V1 V2 V3 V4 V5 V6 V7
1 a b c d e f g
2 1 2 3 4 5 6 7
3 h i a b c d e
4 ab cd ef gh ij kl mno
using this script, which builds the final data.frame dat:
library(plyr) # provides rbind.fill()
variables = x[seq(1, nrow(x), 2), ] # odd rows: df of all variable-name rows
values = x[seq(2, nrow(x), 2), ] # even rows: df of all value rows
dat = data.frame() # start from an empty data.frame
for (i in seq_len(nrow(variables))) {
  dat.temp = data.frame(values[i, ]) # temporary df holding row i of the values
  colnames(dat.temp) = as.matrix(variables[i, ]) # name its columns from row i of the variables
  # print(dat.temp) # uncomment to check each row as it is built
  dat = rbind.fill(dat, dat.temp) # append the row; columns missing on either side become NA
  rm(dat.temp) # remove the temporary df
}
into this final table (the variables are now the column names):
a b c d e f g h i
1 1 2 3 4 5 6 7 <NA> <NA>
2 ef gh ij kl mno <NA> <NA> ab cd
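If you would rather not depend on plyr, the same reshaping can be sketched in base R. This is an alternative to the script above, not a replacement for it: it assumes every cell is a character string (for example, data read with stringsAsFactors=FALSE) and that no variable name repeats within a single name row. The names nms, vals, all_names, out and dat2 are made up for this sketch.

```r
# All-character stand-in for the test data.frame x (same cell contents).
x <- data.frame(
  V1 = c("a", "1", "h", "ab"),
  V2 = c("b", "2", "i", "cd"),
  V3 = c("c", "3", "a", "ef"),
  V4 = c("d", "4", "b", "gh"),
  V5 = c("e", "5", "c", "ij"),
  V6 = c("f", "6", "d", "kl"),
  V7 = c("g", "7", "e", "mno"),
  stringsAsFactors = FALSE
)

nms  <- as.matrix(x[seq(1, nrow(x), 2), ])  # odd rows: variable names
vals <- as.matrix(x[seq(2, nrow(x), 2), ])  # even rows: values

# Every distinct variable name, in order of first appearance.
all_names <- unique(as.vector(t(nms)))

# One row per observation, one column per variable, NA where a variable is absent.
out <- matrix(NA_character_, nrow = nrow(nms), ncol = length(all_names),
              dimnames = list(NULL, all_names))
for (i in seq_len(nrow(nms))) {
  out[i, nms[i, ]] <- vals[i, ]  # place each value under its own name
}

dat2 <- as.data.frame(out, stringsAsFactors = FALSE)
# Let R guess a type per column; columns with mixed content stay character.
dat2[] <- lapply(dat2, type.convert, as.is = TRUE)
```

Filling a preallocated matrix avoids growing a data.frame inside the loop, which may matter more as the number of observations grows.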
I hope it works.