I have a data file in which individual samples are separated by a blank line and each field is on its own line:
age 20
weight 185
height 72

age 87
weight 109
height 60

age 15
weight 109
height 58

...
How can I read this file into a data frame so that each row represents one sample, with columns for age, weight, height, and so on?
  age weight height
1  20    185     72
2  87    109     60
3  15    109     58
...
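Answer 0 (score: 3)
@user1317221_G has shown the approach I would take, but it relies on loading an extra package and generating the groups explicitly. The groups (an ID variable) are the key to making any reshape-type answer work. The matrix answers do not have this limitation.
Here is a closely related approach in base R: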
mydf <- read.table(header = FALSE, stringsAsFactors=FALSE,
text = "age 20
weight 185
height 72
age 87
weight 109
height 60
age 15
weight 109
height 58
")
# Create your id variable
mydf <- within(mydf, {
id <- ave(V1, V1, FUN = seq_along)
})
With the id variable in place, the transformation is straightforward:
reshape(mydf, direction = "wide",
idvar = "id", timevar="V1")
# id V2.age V2.weight V2.height
# 1 1 20 185 72
# 4 2 87 109 60
# 7 3 15 109 58
Alternatively:
# Your ids become the "rownames" with this approach
as.data.frame.matrix(xtabs(V2 ~ id + V1, mydf))
# age height weight
# 1 20 72 185
# 2 87 60 109
# 3 15 58 109
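If the `V2.` prefixes and the leftover row names from reshape get in the way, they can be cleaned up afterwards. The following is only a sketch of one way to do that: the name `wide` is my own choice, and the id is built from `seq_along(V1)` so that it comes out as an integer rather than a character column.

```r
# Same sample data as above
mydf <- read.table(header = FALSE, stringsAsFactors = FALSE,
                   text = "age 20
weight 185
height 72
age 87
weight 109
height 60
age 15
weight 109
height 58")

# Number the occurrences of each field name: 1st age, 2nd age, ...
mydf$id <- ave(seq_along(mydf$V1), mydf$V1, FUN = seq_along)

wide <- reshape(mydf, direction = "wide", idvar = "id", timevar = "V1")

# Strip the "V2." prefix that reshape() adds and reset the row names
names(wide) <- sub("^V2\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

The column order follows the order in which the field names first appear in the data, so here it stays age, weight, height.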
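Answer 1 (score: 2)
To expand on @BlueMagister's answer, you can use scan with a few options to read the data directly into a list, and then convert the list into a data frame: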
tmp <- scan(text = "
age 20
weight 185
height 72
age 87
weight 109
height 60
age 15
weight 109
height 58", multi.line=TRUE,
what=list('',0,'',0,'',0),
blank.lines.skip=TRUE)
mydf <- as.data.frame( tmp[ c(FALSE,TRUE) ] )
names(mydf) <- sapply( tmp[ c(TRUE,FALSE) ], '[', 1 )
This assumes that the variables always appear in the same order within each record.
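If the order of the fields cannot be relied on, one option is to derive a record number from the blank lines and match values by field name instead of by position. This is a rough sketch and not part of the scan approach above: the `lines` vector is made-up input standing in for the result of readLines() on the real file, and the names `rec`, `long` and `wide` are arbitrary.

```r
# Hypothetical input: blank-line separated records, fields in varying order
lines <- c("age 20", "weight 185", "height 72", "",
           "height 60", "age 87", "weight 109", "",
           "weight 109", "height 58", "age 15")

rec  <- cumsum(lines == "") + 1    # record number for every line
keep <- lines != ""                # drop the separator lines

# Split each remaining line into field name and value
parts <- strsplit(lines[keep], " +")
long <- data.frame(id    = rec[keep],
                   key   = sapply(parts, "[", 1),
                   value = as.numeric(sapply(parts, "[", 2)),
                   stringsAsFactors = FALSE)

# One row per record, one column per field name
wide <- as.data.frame.matrix(xtabs(value ~ id + key, long))
wide <- wide[c("age", "weight", "height")]   # put columns in the desired order
wide
```

Be aware that xtabs sums the values in each cell, so a field that is missing from a record shows up as 0 rather than NA.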
Answer 2 (score: 1)
df <- read.table(text ="
age 1
weight 1
height 6
age 2
weight 7
height 2
age 4
weight 8
height 9", header=FALSE)
df$ID <- rep(1:3, each=3)
library(reshape2)
newdf <- dcast(df, ID~V1, value.var="V2")
# ID age height weight
#1 1 1 6 1
#2 2 2 2 7
#3 3 4 9 8
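The `rep(1:3, each=3)` line hard-codes three records of three fields. A sketch of how the ID could be derived from the data instead, assuming (my assumption, not something stated above) that every record starts with the same field, here `age`:

```r
df <- read.table(text = "
age 1
weight 1
height 6
age 2
weight 7
height 2
age 4
weight 8
height 9", header = FALSE, stringsAsFactors = FALSE)

# A new record begins whenever the first field name shows up again,
# so a running count of those rows gives the record ID
df$ID <- cumsum(df$V1 == df$V1[1])
df$ID
```

The dcast call then works unchanged, whatever the number of records.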
Answer 3 (score: 1)
Another solution:
data <- readLines('c:\\relatorios\\bla.txt') # Read the data
data <- data[data != ''] # Remove the blank lines
col_names <- unique(sub(' .*$', '', data)) # Get the field names (text before the first space)
data <- matrix(as.numeric(gsub('[^0-9]*', '', data)), ncol = 3, byrow = TRUE) # Create the matrix
colnames(data) <- col_names # Set the column names
Answer 4 (score: 1)
Here is my attempt using scan:
##x is assumed to hold the raw text; use file= instead of text= to read from a file
##read in three strings separated by spaces, multi-line input
y <- scan(text=x,what=list(character(),character(),character())
,sep="\n",multi.line=TRUE)
##combine into a matrix of strings
y <- do.call(cbind,y)
# [,1] [,2] [,3]
#[1,] "age 20" "weight 185" "height 72"
#[2,] "age 87" "weight 109" "height 60"
#[3,] "age 15" "weight 109" "height 58"
##set column names based on text from the first row
colnames(y) <- regmatches(y[1,],regexpr("^\\w+",y[1,]))
##remove non-numeric characters
y <- gsub("\\D+","",y)
##convert to number format, preserving matrix structure
y <- apply(y,2,as.numeric)
##convert to data frame (if necessary)
y <- data.frame(y)
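Answer 5 (score: 0)
If your source file always has these same three variables, a simple approach is to read the file as two columns (names first, numbers second) and then turn the second column into a matrix. If I steal df from user1317221_G's answer,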
matrix(df$V2,ncol=3,byrow=TRUE)
[,1] [,2] [,3]
[1,] 1 1 6
[2,] 2 7 2
[3,] 4 8 9
Adding row and/or column names is trivial. Note that this keeps the columns in the order "age, weight, height" :-)
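For completeness, a sketch of adding the column names. It assumes, as above, that every record lists the same three fields in the same order, so the first three entries of V1 can serve as the names; `m` and `result` are arbitrary names.

```r
df <- read.table(text = "
age 1
weight 1
height 6
age 2
weight 7
height 2
age 4
weight 8
height 9", header = FALSE, stringsAsFactors = FALSE)

# Fill the matrix row by row and take the column names from the first record
m <- matrix(df$V2, ncol = 3, byrow = TRUE,
            dimnames = list(NULL, df$V1[1:3]))

result <- as.data.frame(m)
result
```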