Regex to match between <blocks with angle bracket> via egrep

Asked: 2014-03-03 23:09:30

Tags: regex perl grep

Given the following data, how can I extract the numbers between the physical-blocks angle-bracket tags?

Raw data:

"6917: <physical-blocks> 573653840</physical-blocks>"
"8954: <physical-blocks>573653841</physical-blocks>"
"8991: <physical-blocks>573653842</physical-blocks>"
"9028: <physical-blocks>573653843</physical-blocks>"
"9065: <physical-blocks>573653844</physical-blocks>"
"9102: <physical-blocks>573653845</physical-blocks>"

Desired output (array):

573653840 573653841 573653842 573653843 573653844 573653845 

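I just want to extract the data between <physical-blocks> and </physical-blocks>. Note: the full data set contains many other strings in angle brackets; I specifically need the data between this particular pair of tags.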

3 Answers:

Answer 0: (score: 0)

An awk version:

$ awk '{sub(/[^>]*>/,"");sub(/<.*/,"");$1=$1}1' file
573653840
573653841
573653842
573653843
573653844
573653845
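
For readers less familiar with awk, the three steps in that one-liner can be sketched in Python. This is only an illustrative equivalent, not part of the original answer; the function name `strip_to_number` is made up for the example. Note that, like the awk command, it strips around the first pair of angle brackets on the line and does not check the tag name.

```python
import re

def strip_to_number(line):
    """Mirror the awk one-liner step by step (hypothetical helper)."""
    # Step 1: drop everything up to and including the first '>',
    # like sub(/[^>]*>/, "").
    line = re.sub(r"[^>]*>", "", line, count=1)
    # Step 2: drop everything from the next '<' to the end of the line,
    # like sub(/<.*/, "").
    line = re.sub(r"<.*", "", line, count=1)
    # Step 3: trim and collapse whitespace, the effect of awk's $1=$1.
    return " ".join(line.split())

print(strip_to_number('"6917: <physical-blocks> 573653840</physical-blocks>"'))
```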

Answer 1: (score: 0)

Using GNU awk:

gawk 'RT=="</physical-blocks>"' RS='</?physical-blocks>' ORS=' ' file

If you want a newline after the output, see below:

$ cat file
"6917: <physical-blocks>573653840</physical-blocks>"
"8954: <physical-blocks>573653841</physical-blocks>"
"8991: <physical-blocks>573653842</physical-blocks>"
"9028: <physical-blocks>573653843</physical-blocks>"
"9065: <physical-blocks>573653844</physical-blocks>"
"9102: <physical-blocks>573653845</physical-blocks>"

$ gawk 'RT=="</physical-blocks>";END{print "\n"}' RS='</?physical-blocks>' ORS=' ' file
573653840 573653841 573653842 573653843 573653844 573653845
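
The idea here is to split the input on the opening or closing tag and keep only the records that were ended by the closing tag (gawk exposes the matched separator as RT). A rough Python sketch of the same idea follows. It is not from the original answer: `blocks_by_separator` is a hypothetical name, and the `strip()` call is an extra step added to tidy stray whitespace, which the gawk command does not do.

```python
import re

def blocks_by_separator(text):
    """Split on the tags and keep records terminated by the closing tag."""
    # A capturing group makes re.split keep the separators, so the result is
    # [record, separator, record, separator, ..., last_record].
    parts = re.split(r"(</?physical-blocks>)", text)
    records, separators = parts[0::2], parts[1::2]
    # zip() drops the final record, which has no separator after it.
    return [
        record.strip()  # assumption: trimming whitespace is wanted
        for record, separator in zip(records, separators)
        if separator == "</physical-blocks>"
    ]

sample = (
    '"6917: <physical-blocks> 573653840</physical-blocks>"\n'
    '"8954: <physical-blocks>573653841</physical-blocks>"\n'
)
print(" ".join(blocks_by_separator(sample)))
```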

Answer 2: (score: -1)

You can use a simple lookahead and lookbehind:

(?<=\>)(\s*)(\d*)(?=\<)
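
As written, the pattern above matches digits between any `>` and `<`, so it would also pick up numbers inside other tags. Lookarounds are also not part of POSIX extended regular expressions, so plain egrep will not accept it; a PCRE-style engine is needed. Below is a minimal sketch in Python of a variant anchored to the physical-blocks tag. The pattern and the name `extract_blocks` are illustrative assumptions, not the answerer's code.

```python
import re

# Lookbehind and lookahead pin the match to this specific tag pair.
# Optional whitespace is consumed outside the capture group.
PATTERN = re.compile(r"(?<=<physical-blocks>)\s*(\d+)(?=</physical-blocks>)")

def extract_blocks(text):
    """Return every number found between the physical-blocks tags."""
    return PATTERN.findall(text)

sample = (
    '"6917: <physical-blocks> 573653840</physical-blocks>"\n'
    '"8954: <physical-blocks>573653841</physical-blocks>"\n'
    '"8955: <logical-blocks>42</logical-blocks>"\n'
)
print(extract_blocks(sample))
```

The `logical-blocks` line is invented test data to show that other tags are skipped.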