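I have a file that I read with readLines in R. Between the indices sndx and endx there is a table of space-separated numbers, and I would like to convert it to a matrix. For example, a toy file looks like this: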
======
3 5 # this is how I know sndx and endx
Some text
1 123. 456. 789.
2 345. 678. 123.
3 235. 123. 345.
More text
======
The desired output is the matrix:
1 123. 456. 789.
2 345. 678. 123.
3 235. 123. 345.
Is there a way to extract the numeric lines like this?
Answer 0 (score: 0)
Example:
"Some text
endx
1 123. 456. 789.
2 345. 678. 123.
3 235. 123. 345.
sndx
More text"
Using strsplit:
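# readClipboard() is Windows-only; copy the example text to the clipboard first
char_vec <- trimws(readClipboard())
# Need the string after 'endx'
str_start <- grep('endx', char_vec)+1
# And the string before 'sndx'
str_end <- grep('sndx', char_vec)-1
# gsub() below drops every '.', which is safe for values like "123." but
# would corrupt real decimals such as "1.5"; the result is a character matrix
# The output here is a matrix but we need the transpose of the output
t(sapply(str_start:str_end, function(z){
  u <- char_vec[z]
  ret <- strsplit(x = gsub('\\.', "", u), split = '[[:space:]]{1,5}')[[1]]
  return(ret)
}))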
Output:
> t(sapply(str_start:str_end, function(z){
+   u <- char_vec[z]
+   ret <- strsplit(x = gsub('\\.', "", u), split = '[[:space:]]{1,5}')[[1]]
+   return(ret)
+ }))
     [,1] [,2]  [,3]  [,4]
[1,] "1"  "123" "456" "789"
[2,] "2"  "345" "678" "123"
[3,] "3"  "235" "123" "345"
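Note that the result above is a character matrix. If a numeric matrix is needed and the line indices really are given on the first line of the file, as in the question's toy example, something along these lines may be closer to what was asked. This is only a sketch under assumptions: the vector `lines` stands in for the result of readLines, the first line is assumed to begin with sndx and then endx, and the indices are assumed to count that first line as line 1.

```r
# Toy file contents as readLines() would return them (one element per line).
# `lines` is a stand-in for the real readLines() result.
lines <- c(
  "3 5 # this is how I know sndx and endx",
  "Some text",
  "1 123. 456. 789.",
  "2 345. 678. 123.",
  "3 235. 123. 345.",
  "More text"
)

# Assumption: the first line starts with the two indices, sndx then endx.
# n = 2 stops after two values; comment.char = "#" ignores the trailing note.
idx <- scan(text = lines[1], what = integer(), n = 2,
            comment.char = "#", quiet = TRUE)
sndx <- idx[1]
endx <- idx[2]

# read.table() splits on whitespace and parses "123." as the number 123,
# so no manual removal of the trailing dots is needed.
tab <- read.table(text = lines[sndx:endx], header = FALSE)

# Convert the data frame to a plain numeric matrix without dimnames.
mat <- unname(as.matrix(tab))
```

Here `mat` is a 3 x 4 numeric matrix, so for example `mat[2, 3]` is 678. Unlike the gsub approach, this keeps real decimals such as "1.5" intact.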