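I'm trying to join certain lines of a file into a single line, but which lines get joined depends on the content and varies throughout the file.
A simplified version of my data file: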
>xy|number|Name
ABCABCABC
ABCABCABC
ABCABCABC
ABC
>xy|number2|Name2
ABCABCABC
ABCABC
>xy|number3|Name3
ABCABCABC
ABCABCABC
ABCABCABC
ABCAB
I would like it to end up like this (a space means a different column):
xy number Name ABCABCABCABCABCABCABCABCABCABC
xy number2 Name2 ABCABCABCABCABC
xy number3 Name3 ABCABCABCABCABCABCABCABCABCABCAB
Answer 0 (score: 4)
Here is a solution similar to @MatthewLundberg's, but using cumsum to split the vector.
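# Read every whitespace-separated token (here: every line) as a character vector
file <- scan('~/Desktop/data.txt', 'character')
h <- grepl('^>', file)                          # TRUE on header lines
# Drop the leading '>' and append '|' so the sequence becomes a 4th field
file[h] <- sub('^>', '', paste0(file[h], '|'))
l <- split(file, cumsum(h))                     # one group per header
do.call(rbind, strsplit(sapply(l, paste, collapse=''), '[|]'))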
#   [,1] [,2]      [,3]    [,4]
# 1 "xy" "number"  "Name"  "ABCABCABCABCABCABCABCABCABCABC"
# 2 "xy" "number2" "Name2" "ABCABCABCABCABC"
# 3 "xy" "number3" "Name3" "ABCABCABCABCABCABCABCABCABCABCAB"
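To see why cumsum works for the split step: grepl returns TRUE on each header line, and the running sum of those flags goes up by one at every header, so each line ends up labelled with the index of the record it belongs to. A minimal sketch on a made-up five-line vector (the names `lines`, `is_header`, `group_id` and `records` are only for illustration):

```r
# Toy input: two records, each a header followed by its sequence lines
lines <- c(">a|1|x", "ABC", "AB", ">b|2|y", "ABCABC")

is_header <- grepl("^>", lines)   # TRUE FALSE FALSE TRUE FALSE
group_id  <- cumsum(is_header)    # 1 1 1 2 2

# split() gathers each record's lines under its group id
records <- split(lines, group_id)
```

Each element of `records` then holds one header plus its sequence lines, which is what the paste(..., collapse='') step above glues together.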
Answer 1 (score: 2)
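# `file` is the path to the data file
dat <- read.table(file, header=FALSE, stringsAsFactors=FALSE)
h <- grep('^>', dat$V1)                                  # header positions
# One row per record: header line index, last sequence line index
m <- matrix(c(h, c(h[-1]-1, length(dat$V1))), ncol=2)
gsub('[|]', ' ',
     sub('>', '',
         apply(m, 1, function(x)
             paste(dat$V1[x[1]], paste(dat$V1[(x[1]+1):x[2]], collapse=''))
         )
     )
)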
## [1] "xy number Name ABCABCABCABCABCABCABCABCABCABC"
## [2] "xy number2 Name2 ABCABCABCABCABC"
## [3] "xy number3 Name3 ABCABCABCABCABCABCABCABCABCABCAB"
Answer 2 (score: 0)
If you want a data.frame with the results, consider this:
raw <- ">xy|number|Name
ABCABCABC
ABCABCABC
ABCABCABC
ABC
>xy|number2|Name2
ABCABCABC
ABCABC
>xy|number3|Name3
ABCABCABC
ABCABCABC
ABCABCABC
ABCAB"
s <- readLines(textConnection(raw)) # s is vector of strings
first.line <- which(substr(s,1,1) == ">") # find first line of set
N <- length(first.line)
first.line <- c(first.line, length(s)+1) # add first line past end
# Preallocate data.frame (good idea if large)
d <- data.frame(X1=rep("",N), X2=rep("",N), X3=rep("",N), X4=rep("",N),
stringsAsFactors=FALSE)
for (i in seq_len(N))  # seq_len() is safe when N is 0, unlike 1:N
{
w <- unlist(strsplit(s[first.line[i]],">|\\|")) # Parse 1st line
d$X1[i] <- w[2]
d$X2[i] <- w[3]
d$X3[i] <- w[4]
d$X4[i] <- paste(s[ (first.line[i]+1) : (first.line[i+1]-1) ], collapse="")
}
d
  X1      X2    X3                               X4
1 xy  number  Name   ABCABCABCABCABCABCABCABCABCABC
2 xy number2 Name2                  ABCABCABCABCABC
3 xy number3 Name3 ABCABCABCABCABCABCABCABCABCABCAB
I wish R left-justified strings by default when displaying them in a data.frame.
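For what it's worth, the print method for data.frames takes a `right` argument, so left-justified display can at least be requested per call. A small sketch, using a cut-down stand-in for `d` rather than the full result above:

```r
# Cut-down stand-in for the data.frame built above (illustration only)
d <- data.frame(X1 = c("xy", "xy"),
                X4 = c("ABCABCABC", "ABC"),
                stringsAsFactors = FALSE)

# right = FALSE left-justifies the strings; the default is right = TRUE
print(d, right = FALSE)
```

This only changes how the data.frame is displayed; the stored strings are untouched.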