I have been trying to write a regular expression that extracts the following from the data below:
[4] "00010131 DistanceToPith=15.0; YearsToPith=3; Radius=50.128; CalcRadius=Yes; "
[5] "00010131 PithCoordinates=60.919,6.071; SiteId=KO31; "
[6] "00010131 Location=Djerdap, GJ \"Kožica\" odeljenje 31; State=Srbija; "
[7] "00010131 SpeciesCode=QUPE; SpeciesName=Kitnjak, Quercus petrea; "
[8] "00010131 Personal_ID=Marko Kazimirovic; DateOfSampling=jesen 2013; "
[9] "00010131 Name=00010131; Written=2018-05-04 16:53:09; "
[10] "00010131 EarthCoord=E 44 35 N 21 58; Elev=450-465; "
[11] "00010131 Project=Radakovicev magistarski; "
[12] "00010132 DistanceToPith=6.7; YearsToPith=3; Radius=104.927; CalcRadius=Yes; "
[13] "00010132 PithCoordinates=108.974,27.022; Written=2018-05-04 17:09:35; "
[14] "00010132 SiteId=KO31; Location=Djerdap, GJ \"Kožica\" odeljenje 31; "
[15] "00010132 EarthCoord=E 44 35 N 21 58; Elev=450-465; State=Srbija; "
[16] "00010132 SpeciesCode=QUPE; SpeciesName=Kitnjak, Quercus petrea; "
[17] "00010132 Project=Radakovicev magistarski; Personal_ID=Marko Kazimirovic; "
[18] "00010132 DateOfSampling=jesen 2013; Name=00010132; "
only the first eight digits, the YearsToPith field and the Radius field:
(^\\d{8}), (YearsToPith=\\d+;) and (Radius=\\d+;)
and nothing else.
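Beyond that, I am looking for suggestions and links to comprehensive literature on regular expressions in R, because the manuals I have read are very narrow in scope and only explain the basics with very simple examples.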
Answer 0 (score: 1)
library(stringr)
# s is assumed to be one raw line of the data as printed above, including its [n] prefix
res<-sapply(str_split(s," "),"[")[c(2,4,5)]
str_remove_all(res,"(\\D(?=\\d{4,}))")
[1] "00010131" "YearsToPith=3;" "Radius=50.128;"
Without the field names:
res1<-str_remove_all(res,"(\\D(?=\\d{4,}))")
str_remove(res1,"\\w{3,}=")
[1] "00010131" "3;" "50.128;"
Only the first eight digits:
str_extract_all(s,"\\d{8,}(?=\\s)")
[[1]]
[1] "00010131"
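For readers without stringr, the same two steps can be sketched in base R. This is only an illustration: it assumes s holds one data line without the printed [n] prefix, and the names id, fields and values are made up for the example.

```r
# Assumed input: one data line from the question, without the printed "[4]" prefix
s <- "00010131 DistanceToPith=15.0; YearsToPith=3; Radius=50.128; CalcRadius=Yes; "

# Eight digits at the start of the line, followed by whitespace.
# The lookahead (?=\\s) needs the PCRE engine, hence perl = TRUE.
id <- regmatches(s, regexpr("^\\d{8}(?=\\s)", s, perl = TRUE))

# Drop the "key=" prefix and the trailing semicolon from already extracted fields
fields <- c("YearsToPith=3;", "Radius=50.128;")
values <- sub(";$", "", sub("^\\w+=", "", fields))

print(id)
print(values)
```

The lookahead only checks that whitespace follows the digits; it does not consume it, so the match contains the digits alone.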
Answer 1 (score: 1)
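When gregexpr is called with perl=TRUE and a pattern that contains capture groups, its output carries the attributes "capture.start" and "capture.length". These give you the positions in the input that match each group of your pattern.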
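To see what these attributes look like, here is a small sketch on a single string. It uses regexpr, which sets the same attributes for the first match only; the variable names and the shortened two-group pattern are assumptions made for the illustration.

```r
# One line from the question's data, used as a hypothetical test input
x <- "00010131 DistanceToPith=15.0; YearsToPith=3; Radius=50.128; CalcRadius=Yes; "

# Two capture groups: the eight-digit ID and the YearsToPith value
m <- regexpr("^(\\d{8}).*YearsToPith=(\\d+);", x, perl = TRUE)

# Both attributes are matrices with one column per capture group
starts <- as.vector(attr(m, "capture.start"))
lens   <- as.vector(attr(m, "capture.length"))

# Cut each group out of the input by start and end position
ids <- substring(x, starts, starts + lens - 1)
print(ids)
```

When a string does not match, both attributes contain -1, which is why non-matching lines come out as empty strings once the positions are passed to substr.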
ind <- lapply(gregexpr(PATTERN, INPUT, perl=TRUE),
              function(r) rbind(attr(r,"capture.start"),
                                attr(r,"capture.length")-1))
# one row per input line, one column per capture group
OUTPUT <- t(sapply(seq_along(INPUT), function(i)
  apply(ind[[i]],2, function(y) substr(INPUT[i],y[1],y[1]+y[2]))))
With your data as the input and
PATTERN <- "^(\\d{8}).*(YearsToPith=\\d+;).*(Radius=[\\d\\.]+;)"
the output is:
[1,] "00010131" "YearsToPith=3;" "Radius=50.128;"
[2,] "" "" ""
[3,] "" "" ""
[4,] "" "" ""
[5,] "" "" ""
[6,] "" "" ""
[7,] "" "" ""
[8,] "" "" ""
[9,] "00010132" "YearsToPith=3;" "Radius=104.927;"
[10,] "" "" ""
[11,] "" "" ""
[12,] "" "" ""
[13,] "" "" ""
[14,] "" "" ""
[15,] "" "" ""
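If the empty rows are not wanted, one possible variation is to let regexec and regmatches return the groups directly and keep only the lines that matched. This is a sketch rather than part of the answer above: the three input lines are copied from the question, the names input, pattern, hits and result are hypothetical, and the groups here capture just the values rather than the whole key=value; text.

```r
# Three sample lines from the question; the second one has no YearsToPith or Radius
input <- c(
  "00010131 DistanceToPith=15.0; YearsToPith=3; Radius=50.128; CalcRadius=Yes; ",
  "00010131 PithCoordinates=60.919,6.071; SiteId=KO31; ",
  "00010132 DistanceToPith=6.7; YearsToPith=3; Radius=104.927; CalcRadius=Yes; "
)

pattern <- "^(\\d{8}).*YearsToPith=(\\d+);.*Radius=([0-9.]+);"

# regexec records the full match plus every capture group;
# regmatches turns that into one character vector per input line
m <- regmatches(input, regexec(pattern, input, perl = TRUE))

# Non-matching lines yield character(0), so drop them
hits <- m[lengths(m) > 0]

# Element 1 of each hit is the full match; keep only the groups
result <- do.call(rbind, lapply(hits, function(h) h[-1]))
colnames(result) <- c("id", "years_to_pith", "radius")
print(result)
```

The values come back as character strings; as.numeric can be applied to the radius column if numbers are needed.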
Answer 2 (score: 0)
As I understand it, you need the eight-digit ID, followed by the values of YearsToPith and Radius from the text.
Try this Perl one-liner:
$ perl -ne ' @x=$_=~m/\S+\s"(\S+)\s+.+?YearsToPith=(\d+).+\s+Radius=(\S+)/ ; print "$x[0] $x[1] $x[2]\n" if (@x) ' marko.txt
00010131 3 50.128;
00010132 3 104.927;
$
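A rough R counterpart of this one-liner, for anyone who prefers to stay in R, could use grepl to select the matching lines and sub with backreferences to rewrite them. It is a sketch under assumptions: the sample lines are given without the printed [n] prefix that marko.txt appears to contain, and the names lines, pattern, keep and out are hypothetical.

```r
# Sample lines from the question, without the printed "[n]" prefix
lines <- c(
  "00010131 DistanceToPith=15.0; YearsToPith=3; Radius=50.128; CalcRadius=Yes; ",
  "00010131 Project=Radakovicev magistarski; ",
  "00010132 DistanceToPith=6.7; YearsToPith=3; Radius=104.927; CalcRadius=Yes; "
)

# The trailing .*$ makes the pattern cover the whole line,
# so sub() replaces the line with just the three groups
pattern <- "^(\\d{8}).*?YearsToPith=(\\d+);.*?Radius=([0-9.]+);.*$"

keep <- grepl(pattern, lines, perl = TRUE)
out <- sub(pattern, "\\1 \\2 \\3", lines[keep], perl = TRUE)
print(out)
```

Unlike the Perl version, this pattern leaves the trailing semicolon out of the Radius group.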