Given the following data, how can I extract the numbers between the physical-blocks tags?
Raw data:
"6917: <physical-blocks> 573653840</physical-blocks>"
"8954: <physical-blocks>573653841</physical-blocks>"
"8991: <physical-blocks>573653842</physical-blocks>"
"9028: <physical-blocks>573653843</physical-blocks>"
"9065: <physical-blocks>573653844</physical-blocks>"
"9102: <physical-blocks>573653845</physical-blocks>"
Desired output (array):
573653840 573653841 573653842 573653843 573653844 573653845
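I just want to extract the data between <physical-blocks> and </physical-blocks>. Note: the full data set contains many strings with angle brackets; I specifically need the data between this particular pair of tags.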
Answer 0 (score: 0)
An awk version:
awk '{sub(/[^>]*>/,"");sub(/<.*/,"");$1=$1}1' file
573653840
573653841
573653842
573653843
573653844
573653845
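Since the question says the full data set contains other angle-bracket strings, a tag-specific variant may be safer than stripping up to the first `>`. The following is a minimal sketch, not part of the original answer: `extract_blocks` is a hypothetical helper name, the `<logical-blocks>` line is a made-up decoy, and it assumes at most one physical-blocks element per line (only the leftmost match on each line is printed).

```shell
# Hypothetical helper (not from the answers above): print only the digits that
# sit between <physical-blocks> and </physical-blocks>, ignoring other tags.
extract_blocks() {
    awk '
    match($0, /<physical-blocks>[ \t]*[0-9]+<\/physical-blocks>/) {
        s = substr($0, RSTART, RLENGTH)   # the whole matched element
        gsub(/[^0-9]/, "", s)             # keep only the digits
        print s
    }' "$@"
}

# Usage: reads the named files, or standard input when none are given.
# The <logical-blocks> line is an invented decoy and should be skipped.
extract_blocks <<'EOF'
"6917: <physical-blocks> 573653840</physical-blocks>"
"7001: <logical-blocks>12345</logical-blocks>"
"8954: <physical-blocks>573653841</physical-blocks>"
EOF
# prints:
# 573653840
# 573653841
```

It uses only `match`, `substr` and `gsub`, so it should not depend on GNU extensions.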
Answer 1 (score: 0)
Using GNU awk:
gawk 'RT=="</physical-blocks>"' RS='</?physical-blocks>' ORS=' ' file
If you want a newline after the output, see below:
$ cat file
"6917: <physical-blocks>573653840</physical-blocks>"
"8954: <physical-blocks>573653841</physical-blocks>"
"8991: <physical-blocks>573653842</physical-blocks>"
"9028: <physical-blocks>573653843</physical-blocks>"
"9065: <physical-blocks>573653844</physical-blocks>"
"9102: <physical-blocks>573653845</physical-blocks>"
$ gawk 'RT=="</physical-blocks>";END{print "\n"}' RS='</?physical-blocks>' ORS=' ' file
573653840 573653841 573653842 573653843 573653844 573653845
Answer 2 (score: -1)
You can use a simple lookahead and lookbehind:
(?<=\>)(\s*)(\d*)(?=\<)
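As written, that pattern matches between any `>` and `<`, and `\d*` can also match an empty string. One way to try the lookaround idea from the shell while restricting it to this tag is sketched below. It assumes GNU grep built with PCRE support (`-P`); `extract_blocks_pcre` is a hypothetical name, and `\K` is used to discard any leading whitespace from the reported match.

```shell
# Hypothetical helper; assumes GNU grep with -P (PCRE) available.
# (?<=<physical-blocks>)   lookbehind: the opening tag must precede the match
# \s*\K                    skip optional whitespace, then reset the match start
# \d+                      the digits that -o will print
# (?=</physical-blocks>)   lookahead: the closing tag must follow
extract_blocks_pcre() {
    grep -oP '(?<=<physical-blocks>)\s*\K\d+(?=</physical-blocks>)' "$@"
}

# Usage: one number per line, from the named files or standard input.
extract_blocks_pcre <<'EOF'
"6917: <physical-blocks> 573653840</physical-blocks>"
"8954: <physical-blocks>573653841</physical-blocks>"
EOF
```

In bash 4 or later, the result could be loaded into an array with `mapfile -t blocks < <(extract_blocks_pcre file)`.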