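Using sed to get text between specific delimiters

Time: 2011-04-02 14:09:50

Tags: sed awk

This is my first time using sed, so it may not even be the right tool for the job, but after extensive googling it looks like it can do it. My problem: I have a data file from which I need to extract some of the data and discard the rest.

Sample data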

    #Ranger (62:0)
Device pose: (0, 0, 0), (0, 0, 0)
Device size: (0, 0, 0)
Configuration: 
Minimum angle: 0    Maximum angle: 0    Angular resolution: 0
Minimum range: 0    Maximum range: 0Range resolution: 0
Scanning frequency: 0
682 range readings:
  [0.529, 0.524, 0.511, 0.506, 0.505, 0.505, 0.505, 0.505, 0.505, 0.503, 0.495, 0.483, 0.471, 0.469, 0.469, 0.469, 0.469, 0.465, 0.458, 0.458, 0.454, 0.454, 0.454, 0.45, 0.443, 0.442, 0.442, 0.443]
    #Ranger (62:0)
Device pose: (0, 0, 0), (0, 0, 0)
Device size: (0, 0, 0)
Configuration: 
Minimum angle: 0    Maximum angle: 0    Angular resolution: 0
Minimum range: 0    Maximum range: 0Range resolution: 0
Scanning frequency: 0
682 range readings:
  [0.529, 0.524, 0.511, 0.506, 0.505, 0.505, 0.505, 0.505, 0.505, 0.503, 0.495, 0.483, 0.471, 0.469, 0.469, 0.469, 0.469, 0.465, 0.458, 0.458, 0.454, 0.454, 0.454, 0.45, 0.443, 0.442, 0.442, 0.443]

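This pattern repeats many times. The data I want is between [ and ]; I want everything between every [ ] pair. I have tried a few sed scripts, one of which was posted as the solution to a very similar question, but to no avail. The script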

sed -n -e '/\[[^]]/s/^[^[]*\[\([^]]*\)].*$/\1/p' <a.txt >b.txt

produces an empty b.txt. Then I tried

sed -e '1s/#/<rem>\n&/g'  -e 's/\]/\n<rem>/g' -e 's/\[/<\/rem>\n/g' -e '/^$/d' -e 's/[ ]*//g' <a.txt > b.txt

which produces nicely delimited blocks of data marked with <rem></rem> tags

<rem>
#Ranger(62:0)
Devicepose:(0,0,0),(0,0,0)
Devicesize:(0,0,0)
Configuration:
Minimumangle:0  Maximumangle:0  Angularresolution:0
Minimumrange:0  Maximumrange:0Rangeresolution:0
Scanningfrequency:0
682rangereadings:
</rem>
0.529,0.524,0.511,0.506,0.505,0.505,0.505,0.505,0.505,0.503,0.495,0.483,0.471,0.469,0.469,0.469,0.469,0.465,0.458,0.458,0.454,0.454,0.454,0.45,0.443,0.442,0.442,0.443,0.451,0.459,0.459
<rem>
#Ranger(62:0)
Devicepose:(0,0,0),(0,0,0)
Devicesize:(0,0,0)

After that I tried

sed -e '/<rem>/,/<\/rem>/d' <b.txt >c.txt

and I got

#Ranger(62:0)
Devicepose:(0,0,0),(0,0,0)
Devicesize:(0,0,0)
Configuration:
Minimumangle:0  Maximumangle:0  Angularresolution:0
Minimumrange:0  Maximumrange:0Rangeresolution:0
Scanningfrequency:0
682rangereadings:
#Ranger(62:0)
Devicepose:(0,0,0),(0,0,0)
Devicesize:(0,0,0)

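which is the exact opposite of what I am trying to achieve. Can anyone help? Sorry for the long explanation.

3 answers:

Answer 0: (score: 2)

Maybe this is what you want: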

sed -nr 's/\s*\[([^]]+)\]/\1/p'
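
For seds without GNU's `-r` and `\s`, the same extraction can be sketched with basic regular expressions only. The function name `extract_brackets` and the shortened sample input are made up for illustration, and the sketch assumes each bracketed list sits on a single line.

```shell
#!/bin/sh
# Hypothetical helper: print the text between the first [ and the next ]
# on every line that contains such a pair; all other lines are dropped.
extract_brackets() {
    # -n     : do not print lines by default
    # [^[]*  : skip everything before the first [
    # [^]]*  : capture everything up to the next ]
    # p      : print only lines where the substitution happened
    sed -n 's/^[^[]*\[\([^]]*\)].*$/\1/p' "$@"
}

# Shortened stand-in for the Ranger data in the question.
printf '%s\n' \
    '682 range readings:' \
    '  [0.529, 0.524, 0.511]' \
    'Scanning frequency: 0' \
    '  [0.443, 0.442]' | extract_brackets
```

With a real file this would be called as `extract_brackets a.txt > b.txt`.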

Answer 1: (score: 1)

This is easy to do with grep:

user@machine:~/tmp$ grep '\[' data
  [0.529, 0.524, 0.511, 0.506, 0.505, 0.505, 0.505, 0.505, 0.505, 0.503, 0.495, 0.483, 0.471, 0.469, 0.469, 0.469, 0.469, 0.465, 0.458, 0.458, 0.454, 0.454, 0.454, 0.45, 0.443, 0.442, 0.442, 0.443]
  [0.529, 0.524, 0.511, 0.506, 0.505, 0.505, 0.505, 0.505, 0.505, 0.503, 0.495, 0.483, 0.471, 0.469, 0.469, 0.469, 0.469, 0.465, 0.458, 0.458, 0.454, 0.454, 0.454, 0.45, 0.443, 0.442, 0.442, 0.443]

To store the output in a new file, just redirect it:

user@machine:~/tmp$ grep '\[' data > extracted_values.dat
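
If only the bracketed part is wanted rather than the whole line, something like the following might do. It is a sketch that relies on `grep -o`, which GNU and BSD grep support but POSIX does not specify; the function name `bracket_contents` and the sample input are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: print the contents of every [ ... ] pair,
# one pair per output line, with the brackets removed.
bracket_contents() {
    # grep -o prints each match on its own line instead of the whole line;
    # tr -d then deletes the [ and ] characters themselves.
    grep -o '\[[^]]*]' "$@" | tr -d '[]'
}

# Made-up input with two pairs on one line and a line without brackets.
printf '%s\n' \
    'readings: [1, 2] and [3, 4]' \
    'no brackets here' | bracket_contents
```

Unlike plain `grep '\['`, this also copes with several pairs on one line.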

Answer 2: (score: 1)

Judging by your data, you only need the content inside [], and no other lines contain []. Here are a few other ways to do it

$ awk '/\[/{ gsub(/\[|\]/,"");print}' file

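This searches for [ and prints the matching line with the brackets removed (if you want to keep the brackets, drop the gsub statement).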
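
The awk one-liner keeps the leading blanks of the indented data lines. A small variation that also trims them could look like this; it is only a sketch, and the wrapper name `strip_brackets` and the two-line sample are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk approach: keep only lines that
# contain [, remove the brackets, and trim leading whitespace.
strip_brackets() {
    awk '/\[/ {
        gsub(/\[|\]/, "")     # delete every [ and ] on the line
        sub(/^[ \t]+/, "")    # drop the leading blanks of the data line
        print
    }' "$@"
}

# Shortened stand-in for the data in the question.
printf '%s\n' \
    'Configuration:' \
    '  [0.529, 0.524, 0.511]' | strip_brackets
```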

$ ruby -ne 'print if /\[/' file 

$ ruby -ne 'puts $_.scan(/\[(.*?)\]/) if /\[/' file #no brackets
0.529, 0.524, 0.511, 0.506, 0.505, 0.505, 0.505, 0.505, 0.505, 0.503, 0.495, 0.483, 0.471, 0.469, 0.469, 0.469, 0.469, 0.465, 0.458, 0.458, 0.454, 0.454, 0.454, 0.45, 0.443, 0.442, 0.442, 0.443
0.529, 0.524, 0.511, 0.506, 0.505, 0.505, 0.505, 0.505, 0.505, 0.503, 0.495, 0.483, 0.471, 0.469, 0.469, 0.469, 0.469, 0.465, 0.458, 0.458, 0.454, 0.454, 0.454, 0.45, 0.443, 0.442, 0.442, 0.443

grep

$ grep "\[" file

If you insist on sed

$ sed -n '/\[/p' file
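
The question's own attempt also stripped the spaces, so the target format appears to be a compact comma-separated line. Assuming that is what is wanted, a sed sketch along these lines selects the bracket lines and deletes brackets and spaces in one pass (`compact_readings` is a made-up name):

```shell
#!/bin/sh
# Hypothetical helper: keep only lines containing [, then delete every
# ], [ and space character so that only "v1,v2,..." remains.
compact_readings() {
    # /\[/      : act only on lines that contain [
    # s/[][ ]//g: the bracket expression matches ], [ or a space
    # p         : print the cleaned line (-n suppresses everything else)
    sed -n '/\[/{s/[][ ]//g;p;}' "$@"
}

# Shortened stand-in for the data in the question.
printf '%s\n' \
    '682 range readings:' \
    '  [0.529, 0.524, 0.511]' | compact_readings
```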