I have a dataset like the following:
structure(list(Info = c("Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042 ",
"Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148 ",
"Acalypha macrostachya vin 0.0657 0.0621 0.0441 0.0522 0.0473 -0.0173 ",
"Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349 ",
"Aegiphila panamensis 0.0437 0.0312 0.0166 0.0148 0.0194 -0.0497 ",
"Alchornea costaricensis 0.0568 0.0781 0.0502 0.0221 0.0734 -0.0153 "
)), .Names = "Info", row.names = c(NA, 6L), class = "data.frame")
It currently has only one column and looks like this:
Info
1 Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042
2 Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148
3 Acalypha macrostachya vin 0.0657 0.0621 0.0441 0.0522 0.0473 -0.0173
4 Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349
5 Aegiphila panamensis 0.0437 0.0312 0.0166 0.0148 0.0194 -0.0497
6 Alchornea costaricensis 0.0568 0.0781 0.0502 0.0221 0.0734 -0.0153
I would like it to have 7 columns and look like this:
Species V1 V2 V3 V4 V5 V6
1 Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042
2 Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148
3 Acalypha macrostachya vin 0.0657 0.0621 0.0441 0.0522 0.0473 -0.0173
4 Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349
5 Aegiphila panamensis 0.0437 0.0312 0.0166 0.0148 0.0194 -0.0497
6 Alchornea costaricensis 0.0568 0.0781 0.0502 0.0221 0.0734 -0.0153
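This is giving me trouble because the species names are not always two words long. The original text file is not delimited, so I cannot read it in as a delimited file; I can only get it as a single column of strings. Does anyone have a suggestion?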
Answer 0 (score: 6)
Try using gsub to put a comma in front of each number in "Info". Assuming the data frame is named "dat", then re-read the result with read.csv:
> read.csv(text=gsub("( [-[:digit:].])", ",\\1", dat$Info), header=FALSE)
V1 V2 V3 V4 V5 V6 V7
1 Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042
2 Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148
3 Acalypha macrostachya vin 0.0657 0.0621 0.0441 0.0522 0.0473 -0.0173
4 Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349
5 Aegiphila panamensis 0.0437 0.0312 0.0166 0.0148 0.0194 -0.0497
6 Alchornea costaricensis 0.0568 0.0781 0.0502 0.0221 0.0734 -0.0153
Thanks for describing your use case. I may well use this myself in the future.
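To get the column names asked for in the question (Species, V1 to V6), the same idea can be combined with the col.names and strip.white arguments of read.csv. This is only a sketch: it uses a two-row excerpt of the data, the names csv_text and res are made up for illustration, and it assumes no word in a species name starts with a digit or a minus sign followed by a digit.

```r
# Two-row excerpt of the question's data, for illustration only.
dat <- data.frame(Info = c(
  "Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042 ",
  "Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148 "),
  stringsAsFactors = FALSE)

# Replace the space in front of each number (optionally negative) with a comma.
csv_text <- gsub(" (-?[0-9])", ",\\1", dat$Info)

# Parse the comma-separated text, naming the columns directly and
# stripping the trailing whitespace left at the end of each line.
res <- read.csv(text = csv_text, header = FALSE,
                col.names = c("Species", paste0("V", 1:6)),
                strip.white = TRUE, stringsAsFactors = FALSE)
str(res)
```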
Answer 1 (score: 4)
Assuming ds is your data:
ds <-
structure(list(Info = c("Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042 ",
"Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148 ",
"Acalypha macrostachya vin 0.0657 0.0621 0.0441 0.0522 0.0473 -0.0173 ",
"Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349 ",
"Aegiphila panamensis 0.0437 0.0312 0.0166 0.0148 0.0194 -0.0497 ",
"Alchornea costaricensis 0.0568 0.0781 0.0502 0.0221 0.0734 -0.0153 "
)), .Names = "Info", row.names = c(NA, 6L), class = "data.frame")
Then you can do something like
ds$Info <- gsub(" (-?[0-9])", ", \\1", ds$Info)
do.call(rbind, strsplit(ds$Info, ", "))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] "Acacia melanoceras" "0.0369" "0.0427" "0.0267" "0.0298" "0.0501" "0.0042 "
#[2,] "Acalypha diversifolia van" "0.0670" "0.0439" "0.0281" "0.0427" "0.0464" "-0.0148 "
#[3,] "Acalypha macrostachya vin" "0.0657" "0.0621" "0.0441" "0.0522" "0.0473" "-0.0173 "
#[4,] "Adelia triloba" "0.0481" "0.0350" "0.0202" "0.0174" "0.0286" "-0.0349 "
#[5,] "Aegiphila panamensis" "0.0437" "0.0312" "0.0166" "0.0148" "0.0194" "-0.0497 "
#[6,] "Alchornea costaricensis" "0.0568" "0.0781" "0.0502" "0.0221" "0.0734" "-0.0153 "
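where ds is your data from above, and you are almost done. The gsub call finds each space followed by a digit (optionally preceded by a minus sign) and inserts a comma in front of it. Then we split the strings on the commas and bind the resulting vectors into a matrix. After that you can convert the object to a data.frame, convert the relevant columns to numeric, and add colnames. Note that the last column still carries a trailing space.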
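The remaining steps (data.frame, numeric conversion, column names) could look roughly like this. It is a sketch on a two-row excerpt of the data; the names with_commas, m, nums, and out are placeholders chosen here, and trimws (available since R 3.2.0) removes the trailing space from the last column.

```r
# Two-row excerpt of the question's data, for illustration only.
ds <- data.frame(Info = c(
  "Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042 ",
  "Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148 "),
  stringsAsFactors = FALSE)

# Insert ", " in front of every number, then split into a character matrix.
with_commas <- gsub(" (-?[0-9])", ", \\1", ds$Info)
m <- do.call(rbind, strsplit(with_commas, ", "))

# Convert everything except the first column to numeric,
# trimming the trailing whitespace first.
nums <- matrix(as.numeric(trimws(m[, -1])), nrow = nrow(m))
colnames(nums) <- paste0("V", 1:6)

# Assemble the final data.frame with the species names in front.
out <- data.frame(Species = m[, 1], nums, stringsAsFactors = FALSE)
str(out)
```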
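Edit:
As BondedDust's answer shows, using read.csv is more elegant. Since ds$Info already contains the commas inserted by the gsub step above, it can be parsed directly: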
read.csv(text = ds$Info, header = FALSE)
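Answer 2 (score: 1)
Here is my suggestion:
1) split on ' ',
2) paste the genus and species names back together (I assume you have 6 numeric columns),
3) make a (character) data.frame,
4) finally convert the columns to numeric, and
5) set Species as the colname of the first column.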
df <- structure(list(Info = c("Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042 ",
"Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148 ",
"Acalypha macrostachya vin 0.0657 0.0621 0.0441 0.0522 0.0473 -0.0173 ",
"Adelia triloba 0.0481 0.0350 0.0202 0.0174 0.0286 -0.0349 ",
"Aegiphila panamensis 0.0437 0.0312 0.0166 0.0148 0.0194 -0.0497 ",
"Alchornea costaricensis 0.0568 0.0781 0.0502 0.0221 0.0734 -0.0153 "
)), .Names = "Info", row.names = c(NA, 6L), class = "data.frame")
df
# split
sp <- strsplit(df$Info, ' ')
sp
# make (character) data.frame
library(plyr)  # provides ldply
newdf <- ldply(sp, function(x) {
l <- length(x)
dta <- x[(l-5):l]
spec <- paste(x[1:(l-6)], collapse = ' ')
out <- c(spec, dta)
return(out)
})
# make numeric cols
newdf[ , 2:7] <- apply(newdf[ , 2:7], 2, function(x) as.numeric(x))
names(newdf)[1] <- 'Species'
str(newdf)
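If you would rather avoid the plyr dependency, the same steps can be sketched in base R. This is an illustration on a two-row excerpt, not a drop-in replacement: n_num, rows, and the use of do.call(rbind, ...) are choices made here, and the code still assumes the last 6 tokens of every line are the numbers.

```r
# Two-row excerpt of the question's data, for illustration only.
df <- data.frame(Info = c(
  "Acacia melanoceras 0.0369 0.0427 0.0267 0.0298 0.0501 0.0042 ",
  "Acalypha diversifolia van 0.0670 0.0439 0.0281 0.0427 0.0464 -0.0148 "),
  stringsAsFactors = FALSE)

n_num <- 6  # assumed number of trailing numeric fields per line

# Split each line on runs of spaces after trimming the trailing blank.
sp <- strsplit(trimws(df$Info), " +")

# Build one single-row data.frame per line: everything before the last
# n_num tokens is the species name, the rest are the numeric values.
rows <- lapply(sp, function(x) {
  l <- length(x)
  data.frame(Species = paste(x[seq_len(l - n_num)], collapse = " "),
             t(as.numeric(x[(l - n_num + 1):l])),
             stringsAsFactors = FALSE)
})

# Stack the rows and give the numeric columns the requested names.
newdf <- do.call(rbind, rows)
names(newdf)[-1] <- paste0("V", seq_len(n_num))
str(newdf)
```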