我有一个文件“ prf003.out”,
150 lines of blah....~tables that report other things in this text file deleted.....
Aboveground Live Belowground Forest Total Total Carbon
----------------- ----------------- Stand ------------------------- Stand Removed Released
YEAR Total Merch Live Dead Dead DDW Floor Shb/Hrb Carbon Carbon from Fire
--------------------------------------------------------------------------------------------------------------
2000 15.6 15.6 6.0 0.5 0.0 4.5 2.6 0.0 29.1 0.0 0.0
2001 15.6 15.6 6.0 0.4 0.0 4.2 2.5 0.0 28.7 0.0 0.0
2002 15.6 15.6 6.0 0.4 0.0 3.9 2.5 0.0 28.4 0.0 0.0
2003 15.6 15.6 6.0 0.4 0.0 3.7 2.5 0.0 28.1 0.0 0.0
2004 15.6 15.6 6.0 0.4 0.0 3.5 2.5 0.0 27.9 0.0 0.0
2005 16.6 16.6 6.0 1.0 1.3 3.6 2.5 0.0 30.9 0.0 0.0
2006 16.6 16.6 6.0 0.9 0.8 3.8 2.4 0.0 30.6 0.0 0.0
2007 16.6 16.6 6.0 0.9 0.6 3.8 2.4 0.0 30.3 0.0 0.0
2008 16.6 16.6 6.0 0.9 0.4 3.7 2.4 0.0 30.0 0.0 0.0
2009 16.6 16.6 6.0 0.8 0.2 3.7 2.4 0.0 29.8 0.0 0.0
2010 18.1 18.1 6.3 1.2 1.0 3.8 2.4 0.0 32.8 0.0 0.0
2011 18.1 18.1 6.3 1.1 0.6 4.0 2.4 0.0 32.5 0.0 0.0
2012 18.1 18.1 6.3 1.1 0.4 3.9 2.4 0.0 32.2 0.0 0.0
2013 18.1 18.1 6.3 1.0 0.3 3.9 2.4 0.0 31.9 0.0 0.0
2014 18.1 18.1 6.3 1.0 0.2 3.8 2.4 0.0 31.7 0.0 0.0
2015 19.1 19.1 6.5 1.4 1.1 3.9 2.4 0.0 34.3 0.0 0.0
2016 19.1 19.1 6.5 1.3 0.7 4.1 2.4 0.0 34.0 0.0 0.0
2017 19.1 19.1 6.5 1.3 0.5 4.0 2.4 0.0 33.8 0.0 0.0
2018 19.1 19.1 6.5 1.2 0.3 4.0 2.4 0.0 33.5 0.0 0.0
2019 19.1 19.1 6.5 1.2 0.2 3.9 2.4 0.0 33.2 0.0 0.0
2020 19.0 19.0 6.3 1.9 1.8 4.2 2.4 0.0 35.6 0.0 0.0
2021 19.0 19.0 6.3 1.8 1.3 4.5 2.4 0.0 35.3 0.0 0.0
2022 19.0 19.0 6.3 1.7 1.0 4.6 2.4 0.0 35.0 0.0 0.0
2023 19.0 19.0 6.3 1.6 0.7 4.6 2.4 0.0 34.7 0.0 0.0
2024 19.0 19.0 6.3 1.6 0.5 4.6 2.4 0.0 34.4 0.0 0.0
2025 19.0 19.0 6.3 2.2 2.0 4.9 2.4 0.0 36.7 0.0 0.0
2026 19.0 19.0 6.3 2.1 1.3 5.3 2.4 0.0 36.4 0.0 0.0
2027 19.0 19.0 6.3 2.0 1.0 5.4 2.4 0.0 36.0 0.0 0.0
2028 19.0 19.0 6.3 1.9 0.7 5.4 2.4 0.0 35.7 0.0 0.0
2029 19.0 19.0 6.3 1.9 0.5 5.4 2.4 0.0 35.4 0.0 0.0
2030 19.4 19.4 6.5 2.2 1.4 5.6 2.4 0.0 37.5 0.0 0.0
2031 19.4 19.4 6.5 2.1 0.8 5.9 2.4 0.0 37.2 0.0 0.0
2032 19.4 19.4 6.5 2.0 0.6 5.9 2.4 0.0 36.8 0.0 0.0
2033 19.4 19.4 6.5 1.9 0.4 5.8 2.4 0.0 36.5 0.0 0.0
2034 19.4 19.4 6.5 1.9 0.3 5.7 2.4 0.0 36.1 0.0 0.0
2035 18.6 18.6 6.3 2.6 2.1 6.0 2.4 0.0 38.0 0.0 0.0
2036 18.6 18.6 6.3 2.5 1.5 6.4 2.4 0.0 37.6 0.0 0.0
2037 18.6 18.6 6.3 2.4 1.1 6.4 2.4 0.0 37.2 0.0 0.0
2038 18.6 18.6 6.3 2.3 0.8 6.5 2.4 0.0 36.9 0.0 0.0
2039 18.6 18.6 6.3 2.2 0.6 6.5 2.4 0.0 36.5 0.0 0.0
2040 19.4 19.4 6.7 2.3 1.0 6.6 2.4 0.0 38.3 0.0 0.0
2041 19.4 19.4 6.7 2.2 0.6 6.6 2.4 0.0 38.0 0.0 0.0
2042 19.4 19.4 6.7 2.1 0.5 6.5 2.4 0.0 37.6 0.0 0.0
2043 19.4 19.4 6.7 2.0 0.4 6.4 2.4 0.0 37.3 0.0 0.0
2044 19.4 19.4 6.7 2.0 0.3 6.3 2.4 0.0 36.9 0.0 0.0
2045 17.9 17.9 6.3 2.8 2.5 6.6 2.4 0.0 38.5 0.0 0.0
2046 17.9 17.9 6.3 2.7 1.8 7.0 2.4 0.0 38.1 0.0 0.0
2047 17.9 17.9 6.3 2.6 1.4 7.1 2.4 0.0 37.7 0.0 0.0
2048 17.9 17.9 6.3 2.5 1.0 7.2 2.4 0.0 37.3 0.0 0.0
2049 17.9 17.9 6.3 2.4 0.7 7.2 2.4 0.0 36.9 0.0 0.0
blah.....a few more tables
我试图从中提取此特定表的。如您所见,顶部的“ blah”代表此.txt文件中生成的其他所有表。 之后,我在同一文件中输出了一堆其他表。
我试图做的事情与此问题类似,但是现在我陷入了困境:Extracting Data from Text Files
这是我所做的:
data <- readLines("prf003.out")
data
#VALUE=TRUE RETURNS EXACT MATCH OF TEXT.
cline <- grep("YEAR Total Merch Live Dead Dead DDW Floor Shb/Hrb Carbon Carbon from Fire", data, value= FALSE)
cline
#dont use str_extract, use str_extract_all
numstr <- sapply(str_extract_all(data[cline+1:51],"[0-9]"),as.numeric)
numstr
但是,我得到的输出很奇怪,并且没有按照我想要的方式格式化数据(即,只给我一个原始表的副本,以便我可以在R中处理它)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39]
[1,] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[2,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3
[4,] 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8
[5,] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[6,] 5 5 5 5 5 6 6 6 6 6 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 8 8 8 8
[7,] 6 6 6 6 6 6 6 6 6 6 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 4 4 4 4 4 6 6 6 6
[8,] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[9,] 5 5 5 5 5 6 6 6 6 6 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 8 8 8 8
[10,] 6 6 6 6 6 6 6 6 6 6 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 4 4 4 4 4 6 6 6 6
[11,] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
[12,] 0 0 0 0 0 0 0 0 0 0 3 3 3 3 3 5 5 5 5 5 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 3 3 3 3
[13,] 0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 2 2 2 1 1 2 2 2 2
[14,] 5 4 4 4 4 0 9 9 9 8 2 1 1 0 0 4 3 3 2 2 9 8 7 6 6 2 1 0 9 9 2 1 0 9 9 6 5 4 3
[15,] 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 2 1 1 0 0 1 0 0 0 0 2 1 1 0
[16,] 0 0 0 0 0 3 8 6 4 2 0 6 4 3 2 1 7 5 3 2 8 3 0 7 5 0 3 0 7 5 4 8 6 4 3 1 5 1 8
[17,] 4 4 3 3 3 3 3 3 3 3 3 4 3 3 3 3 4 4 4 3 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 6 6 6 6
[18,] 5 2 9 7 5 6 8 8 7 7 8 0 9 9 8 9 1 0 0 9 2 5 6 6 6 9 3 4 4 4 6 9 9 8 7 0 4 4 5
[19,] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[20,] 6 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[21,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[22,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[23,] 2 2 2 2 2 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[24,] 9 8 8 8 7 0 0 0 0 9 2 2 2 1 1 4 4 3 3 3 5 5 5 4 4 6 6 6 5 5 7 7 6 6 6 8 7 7 6
[25,] 1 7 4 1 9 9 6 3 0 8 8 5 2 9 7 3 0 8 5 2 6 3 0 7 4 7 4 0 7 4 5 2 8 5 1 0 6 2 9
[26,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[27,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[28,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[29,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
如您所见,它采用每个数字的每个值并将其放在新坐标中。我只想要原始表。
答案 0 :(得分:0)
这样的事情
# figure out where the headers are & where the data starts
dataHeader1 <- which(grepl("Aboveground", txtFile))
dataHeader2 <- dataHeader1 + 2
dataStart <- dataHeader2 + 2
# extract the data
txtDat <- txtFile[dataStart:length(txtFile)]
txtDat <- do.call(rbind, strsplit(txtDat, split = "\\s{1,}", perl = TRUE))
class(txtDat) <- "numeric"
txtDat
# returns
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 2000 15.6 15.6 6.0 0.5 0.0 4.5 2.6 0 29.1 0 0
[2,] 2001 15.6 15.6 6.0 0.4 0.0 4.2 2.5 0 28.7 0 0
[3,] 2002 15.6 15.6 6.0 0.4 0.0 3.9 2.5 0 28.4 0 0
[4,] 2003 15.6 15.6 6.0 0.4 0.0 3.7 2.5 0 28.1 0 0
[5,] 2004 15.6 15.6 6.0 0.4 0.0 3.5 2.5 0 27.9 0 0
[6,] 2005 16.6 16.6 6.0 1.0 1.3 3.6 2.5 0 30.9 0 0
....
请注意,您可以对regex
进行锐化以便确定数据的起始位置,例如。
dataHeader1 <- which(grepl("(?=.*Aboveground)(?=.*Carbon)", txtFile, perl = TRUE))
# this can be pursued arbitrarily
。我通过txtFile <- readLines("Path/To/test.txt")
读取数据,原始数据本身看起来像这样
[1] "asdsalkjdaskldas+"
[2] "jsafhnjadfnhdjkasfafdajfbnjasbfjads.kbnjdasnfadsnf"
[3] "45453342542542kj ijholijfkqaef45435314"
[4] ""
[5] "dasfjasikedfnha4454 "
[6] "a"
[7] "a"
[8] "fdgfd"
[9] "\t\t6546346343"
[10] ""
[11] ""
[12] " Aboveground Live Belowground Forest Total Total Carbon"
[13] " ----------------- ----------------- Stand ------------------------- Stand Removed Released"
[14] "YEAR Total Merch Live Dead Dead DDW Floor Shb/Hrb Carbon Carbon from Fire"
[15] "--------------------------------------------------------------------------------------------------------------"
[16] "2000 15.6 15.6 6.0 0.5 0.0 4.5 2.6 0.0 29.1 0.0 0.0"
...