How to read multiple lines of a file into a single row of a data frame

Date: 2013-01-29 12:02:43

Tags: r file

I have a data file in which individual samples are separated by blank lines and each field is on its own line:

age 20
weight 185
height 72

age 87
weight 109
height 60

age 15
weight 109
height 58

...

How can I read this file into a data frame so that each row represents one sample, with columns for age, weight, height, etc.?

  age weight height
1  20    185     72
2  87    109     60
3  15    109     58
...

6 answers:

Answer 0 (score: 3)

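@user1317221_G has shown the approach I would take, but it requires loading an extra package and explicitly generating the groups. The groups (an ID variable) are the key to getting any reshape-type answer to work. The matrix answers don't have this limitation.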

Here are some closely related approaches in base R:

mydf <- read.table(header = FALSE, stringsAsFactors=FALSE, 
                   text = "age 20
                   weight 185
                   height 72

                   age 87
                   weight 109
                   height 60

                   age 15
                   weight 109
                   height 58
                   ")

# Create your id variable
mydf <- within(mydf, {
  id <- ave(V1, V1, FUN = seq_along)
})
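To make the id step concrete, here is a small sketch on a hypothetical toy vector v1 (not part of the original answer). ave() applies seq_along() within each group of identical field names, so the k-th "age", the k-th "weight" and the k-th "height" all receive the same number k. Passing seq_along(v1) as the first argument, rather than the character column itself as in ave(V1, V1, FUN = seq_along), is a small variation that returns integer ids instead of the strings "1", "2", "3".

```r
# Hypothetical toy vector of field names, in file order
v1 <- c("age", "weight", "height", "age", "weight", "height")

# Number each occurrence of a name within its own group:
# first "age" -> 1, second "age" -> 2, and likewise for the other fields
id <- ave(seq_along(v1), v1, FUN = seq_along)
print(id)
# [1] 1 1 1 2 2 2
```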

With the id variable in place, the transformation is straightforward:

reshape(mydf, direction = "wide", 
        idvar = "id", timevar="V1")
#   id V2.age V2.weight V2.height
# 1  1     20       185        72
# 4  2     87       109        60
# 7  3     15       109        58

Or:

# Your ids become the "rownames" with this approach
as.data.frame.matrix(xtabs(V2 ~ id + V1, mydf))
#   age height weight
# 1  20     72    185
# 2  87     60    109
# 3  15     58    109
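If you would rather not build the id at all, base R's unstack() may be an alternative (my suggestion, not part of the answer above). This sketch assumes every field name occurs the same number of times and that records are in file order; the columns come back in alphabetical order because the names are grouped as a factor.

```r
mydf <- read.table(header = FALSE, stringsAsFactors = FALSE, text = "
age 20
weight 185
height 72

age 87
weight 109
height 60

age 15
weight 109
height 58")

# Split V2 by V1; since all groups have equal length, unstack()
# returns them as the columns of a data frame (one row per record)
wide <- unstack(mydf, V2 ~ V1)
print(wide)
```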

Answer 1 (score: 2)

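To expand on @BlueMagister's answer, you can use scan with a few options to read the data directly into a list, and then convert the list into a data frame: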

tmp <- scan(text = "
age     20
weight  185
height  72

age     87
weight  109
height  60

age     15
weight  109
height  58", multi.line=TRUE, 
  what=list('',0,'',0,'',0), 
  blank.lines.skip=TRUE)

mydf <- as.data.frame( tmp[ c(FALSE,TRUE) ] )
names(mydf) <- sapply( tmp[ c(TRUE,FALSE) ], '[', 1 )

This assumes that the variables always appear in the same order within each record.
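Because the approach relies on that fixed order, a quick sanity check may be worthwhile. The helper names read_records() and fields_in_same_order() below are hypothetical; the check simply verifies that each name column of the scan() result (the odd-numbered list elements) holds a single distinct value.

```r
# Read records of three name/value pairs, as in the answer above
read_records <- function(txt) {
  scan(text = txt, what = list('', 0, '', 0, '', 0),
       multi.line = TRUE, blank.lines.skip = TRUE, quiet = TRUE)
}

# TRUE when every record lists its field names in the same order:
# odd elements of `rec` hold names, even elements hold values
fields_in_same_order <- function(rec) {
  name_cols <- rec[c(TRUE, FALSE)]
  all(vapply(name_cols, function(v) length(unique(v)) == 1L, logical(1)))
}

good <- read_records("age 20\nweight 185\nheight 72\n\nage 87\nweight 109\nheight 60")
bad  <- read_records("age 20\nweight 185\nheight 72\n\nweight 109\nage 87\nheight 60")
fields_in_same_order(good)  # TRUE
fields_in_same_order(bad)   # FALSE
```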

Answer 2 (score: 1)

df <- read.table(text ="
age     1
weight  1
height  6

age     2
weight  7
height  2

age     4
weight  8
height  9", header=FALSE) 

library(reshape2)
df$ID <- rep(1:3, each = 3)  # one ID per block of three lines
newdf <- dcast(df, ID ~ V1, value.var = "V2")

#   ID age height weight
# 1  1   1      6      1
# 2  2   2      2      7
# 3  3   4      9      8
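The rep(1:3, each=3) line hard-codes the number of records. A sketch that avoids this, assuming every record starts with an age line, numbers the records with a running count of age lines; the rest of the answer stays the same.

```r
df <- read.table(text = "
age     1
weight  1
height  6

age     2
weight  7
height  2

age     4
weight  8
height  9", header = FALSE, stringsAsFactors = FALSE)

# Assumption: each record begins with "age", so cumsum() over the
# logical "is this an age line?" increments once per record
df$ID <- cumsum(df$V1 == "age")
print(df$ID)
# [1] 1 1 1 2 2 2 3 3 3
```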

Answer 3 (score: 1)

Another solution:

data <- readLines('c:\\relatorios\\bla.txt')        # Read the data
data <- data[trimws(data) != '']                    # Remove the blank lines
names <- unique(gsub('[ 0-9]*', '', data))          # Get the names (without trailing space)
data <- matrix(as.numeric(gsub('[^0-9]*', '', data)),
               ncol = length(names), byrow = TRUE)  # Create matrix
colnames(data) <- names                             # Set the names
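One caveat with the regular expressions above: gsub('[^0-9]*', '', ...) also strips decimal points and minus signs, so a value such as 185.5 would silently become 1855. A sketch that instead splits each line into its first token (the name) and the remainder (the value); the in-line txt_lines vector is a hypothetical stand-in for the readLines() call.

```r
# Hypothetical stand-in for readLines('c:\\relatorios\\bla.txt')
txt_lines <- c("age 20", "weight 185.5", "height 72", "",
               "age 87", "weight 109", "height 60")

txt_lines <- trimws(txt_lines[trimws(txt_lines) != ""])  # drop blank lines
key   <- sub("\\s.*$", "", txt_lines)                    # first token: field name
value <- as.numeric(sub("^\\S+\\s+", "", txt_lines))     # remainder: the number

fields <- unique(key)
m <- matrix(value, ncol = length(fields), byrow = TRUE,
            dimnames = list(NULL, fields))
print(m)
```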

Answer 4 (score: 1)

Here is my attempt using scan:

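##example input; replace text = x with file = "yourfile.txt" to read from a file
x <- "age 20
weight 185
height 72

age 87
weight 109
height 60

age 15
weight 109
height 58"
##read three newline-separated fields per record; records span multiple lines
y <- scan(text = x, what = list(character(), character(), character()),
          sep = "\n", multi.line = TRUE)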
##combine into a matrix of strings
y <- do.call(cbind,y)
#     [,1]     [,2]         [,3]       
#[1,] "age 20" "weight 185" "height 72"
#[2,] "age 87" "weight 109" "height 60"
#[3,] "age 15" "weight 109" "height 58"
##set column names based on text from the first row
colnames(y) <- regmatches(y[1,],regexpr("^\\w+",y[1,]))
##remove non-numeric characters
y <- gsub("\\D+","",y)
##convert to number format, preserving matrix structure
y <- apply(y,2,as.numeric)
##convert to data frame (if necessary)
y <- data.frame(y)

Answer 5 (score: 0)

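If your source file always has these three variables, a simple approach is to read the file as two columns (the name first, the number second) and then turn the second column into a matrix. If I steal df from user1317221_G's answer: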

matrix(df$V2,ncol=3,byrow=TRUE)
     [,1] [,2] [,3]
[1,]    1    1    6
[2,]    2    7    2
[3,]    4    8    9

Adding row and/or column names is trivial. Sorry for getting the column order "age, weight, height" :-)
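As a sketch of the naming step, assuming every record has the same fields in the same order, the column names can be taken from the first record; df is rebuilt here so the snippet runs on its own.

```r
# Same two-column layout as df in user1317221_G's answer
df <- data.frame(V1 = rep(c("age", "weight", "height"), 3),
                 V2 = c(1, 1, 6, 2, 7, 2, 4, 8, 9),
                 stringsAsFactors = FALSE)

k <- length(unique(df$V1))              # number of fields per record
m <- matrix(df$V2, ncol = k, byrow = TRUE)
colnames(m) <- df$V1[seq_len(k)]        # names from the first record
out <- as.data.frame(m)
print(out)
```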