Question

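I am trying to read the following .csv file into R. As the image below shows, row 2 holds a unique set of variable names and row 3 holds the values for those variables, so rows 2 and 3 together represent one observation. The pattern then repeats: row 4 is a row of variable names and row 5 holds the corresponding values. Each pair of rows (2/3, 4/5, 6/7, ..., 999/1000) therefore represents one observation. The data set contains 1,000 observations in total.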

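My problem is reading this into R in a form that gives me a more usable data set. My goal is a single standard set of variable names in the top row, with each row below it representing one observation.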

Do any expert R coders have suggestions?

Thanks,

CSV Image

Answer 1

Here is a solution that works for a simple test case I made up. First, you need to import your data into a data.frame: x = read.csv(file="your-file.csv")
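
As a minimal sketch of that import step, the snippet below reads a small inline CSV instead of a real file. The `csv_text` contents are made up for illustration. It assumes `header = FALSE`, so that label rows stay in the data as in the test data.frame used in this answer, and `stringsAsFactors = FALSE`, so that values arrive as plain character strings; neither setting comes from the original question.

```r
# Hypothetical inline CSV standing in for "your-file.csv":
# label rows and value rows alternate.
csv_text <- "a,b,c
1,2,3
c,d,e
4,5,6"

# header = FALSE keeps the first label row as data (columns become V1, V2, ...).
# stringsAsFactors = FALSE keeps every cell as a character string.
x <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)
print(x)
```

If your file has an extra line before the first label row, as the question's file appears to, the `skip` argument of `read.csv` may be one way to drop it.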

To test this, I used the following test data.frame, x:

x=structure(list(V1 = structure(c(2L, 1L, 4L, 3L), .Label = c("1", 
"a", "ab", "h"), class = "factor"), V2 = structure(c(2L, 1L, 
4L, 3L), .Label = c("2", "b", "cd", "i"), class = "factor"), 
    V3 = structure(c(3L, 1L, 2L, 4L), .Label = c("3", "a", "c", 
    "ef"), class = "factor"), V4 = structure(c(3L, 1L, 2L, 4L
    ), .Label = c("4", "b", "d", "gh"), class = "factor"), V5 = structure(c(3L, 
    1L, 2L, 4L), .Label = c("5", "c", "e", "ij"), class = "factor"), 
    V6 = structure(c(3L, 1L, 2L, 4L), .Label = c("6", "d", "f", 
    "kl"), class = "factor"), V7 = structure(c(3L, 1L, 2L, 4L
    ), .Label = c("7", "e", "g", "mno"), class = "factor")), .Names = c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7"), class = "data.frame", row.names = c(NA, 
-4L))

The script below transforms this table (rows 1 and 3 are your labels):

  V1 V2 V3 V4 V5 V6  V7
1  a  b  c  d  e  f   g
2  1  2  3  4  5  6   7
3  h  i  a  b  c  d   e
4 ab cd ef gh ij kl mno

Use this script to generate the final data.frame, dat:

library(plyr)  # provides rbind.fill()
variables <- x[seq(1, nrow(x), 2), ]  # df of all variable (label) rows
values <- x[seq(2, nrow(x), 2), ]     # df of all value rows
dat <- data.frame()                   # start with an empty data.frame
for (i in seq_len(nrow(variables))) {
    dat.temp <- data.frame(values[i, ])  # temporary df for row i of your values
    colnames(dat.temp) <- as.matrix(variables[i, ])  # name its columns from row i of your variables
    print(dat.temp)  # check that rows are coming out right (comment this out as necessary)
    dat <- rbind.fill(dat, dat.temp)  # append to the final data.frame, filling missing columns with NA
    rm(dat.temp)  # remove the temporary df
}

This produces the final table (the variables are now the column names):

   a  b  c  d   e    f    g    h    i
1  1  2  3  4   5    6    7 <NA> <NA>
2 ef gh ij kl mno <NA> <NA>   ab   cd
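
The same reshaping can also be sketched in base R without plyr. This is an alternative under stated assumptions, not the method used above: the three-column `x` is a cut-down stand-in for the test data.frame, and the helper names (`label_rows`, `value_rows`, `obs`, `all_names`) are made up for illustration.

```r
# Cut-down stand-in for x: label rows and value rows alternate.
x <- data.frame(V1 = c("a", "1", "h", "ab"),
                V2 = c("b", "2", "i", "cd"),
                V3 = c("c", "3", "a", "ef"),
                stringsAsFactors = FALSE)

label_rows <- seq(1, nrow(x), by = 2)  # rows holding variable names
value_rows <- label_rows + 1           # rows holding the matching values

# Build one named character vector per observation.
obs <- lapply(seq_along(label_rows), function(i) {
  setNames(as.character(unlist(x[value_rows[i], ])),
           as.character(unlist(x[label_rows[i], ])))
})

# Union of all variable names, in order of first appearance.
all_names <- unique(unlist(lapply(obs, names)))

# Index each observation by the full name set; absent names yield NA.
rows <- lapply(obs, function(o) unname(o[all_names]))
dat <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(dat) <- all_names
print(dat)
```

Building all rows first and binding them once avoids growing `dat` inside a loop, which may matter for larger files.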

Hope it works.

Reading a complex CSV into R
