Problem with a sed command

Date: 2015-01-28 10:33:58

Tags: regex shell sed

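I've run into a strange problem. I want to extract the content between two strings. The files have almost the same structure but differ in size. The command I'm using works for one file, temp, but not for another, tmp2.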

-bash-3.2# cat temp
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Body><dp:response xmlns:dp="http://www.datapower.com/schemas/management"><dp:timestamp>2015-01-22T13:38:04Z</dp:timestamp><dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file><dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file></dp:response></env:Body></env:Envelope>

The following commands produce the expected output:

-bash-3.2# sed -n 's_<env:Envelope\(.*\)<dp:file name="temporary://test.txt">\([^>]*\)</dp:file>\(.*\)_\2_p' temp

XJzLXJlc3VsdHMtYWN0aW9uX18i

-bash-3.2# sed -n 's_<env:Envelope\(.*\)<dp:file name="temporary://test1.txt">\([^>]*\)</dp:file>\(.*\)_\2_p' temp

lc3VsdHMtYWN0aW9uX18i

-bash-3.2# cat tmp2
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Body><dp:response xmlns:dp="http://www.datapower.com/schemas/management"><dp:timestamp>2015-01-
27T11:10:38Z</dp:timestamp><dp:file name="temporary://BackUpDir/backupmanifest.xml">PFNlY3VyZUJhY2t1cE1hbmlmZXN0Pg0KPGJhY2t1cG1hbmlmZXN0Pg0KIDx2ZXJzaW9uPlhJNTAuNi4wLjAu
MTwvdmVyc2lvbj4NCiA8dGltZXpvbmU+R01UMEJTVDwvdGltZXpvbmU+DQogPGNvbmZpZz5hdXRvY29uZmlnLmNmZzwvY29uZmlnPg0KIDx0aW1lPjIwMTUtMDEtMjdUMTE6MDI6NTZaPC90aW1lPg0KIDxidWlsZD4yMzI3
Nzc8L2J1aWxkPg0KIDxidWlsZGRhdGU+MjAxMy8wOC8wMSAxOTo0MzozNjwvYnVpbGRkYXRlPg0KIDxjb21tb25jcml0ZXJpYT5vZmY8L2NvbW1vbmNyaXRlcmlhPg0KIDxzZXJpYWxudW1iZXI+NjhBNTkyNjwvc2VyaWFs
bnVtYmVyPg0KIDxjcnlwdG9DZXJ0aWZpY2F0ZT5zZWN1cmVfYmFja3VwPEZpbGVOYW1lPmNlcnQ6Ly8vc2VjdXJlX2JhY2t1cC1zc2NlcnQucGVtPC9GaWxlTmFtZT48L2NyeXB0b0NlcnRpZmljYXRlPg0KIDxlcGhlbWVy
Ukhwc3Bxb0V0YlU0SDBtOVkNCkJzVHEwRFhiTUk4WGNMc1NiUGc5WktRdlBzY2Y5Q0sxRDhwdUJjODM0akNOaDJCQnhlWWdMTzhnUWg5NXVjNHENCjVtMTlWNnhNYVBPNnpZZkM5Tk1XQmR5MVhIWDhwc2txdTVJeGdnSm5N
SUlDWXdJQkFUQm5NR0l4Q3pBSkJnTlYNCkJBWVRBbFZUTVF3d0NnWURWUVFLRXdOSlFrMHhKekFsQmdOVkJBc1RIbGRsWWxOd2FHVnlaU0JFWVhSaFVHOTMNClpYSWdRWEJ3YkdsaGJtTmxjekVjTUJvR0ExVUVBeE1UUTNW
emRHOXRaWElnVW1Wc1pXRnpaU0JEUVFJQkZqQUgNCkJnVXJEZ01DR3FDQjJEQVlCZ2txaGtpRzl3MEJDUU14Q3dZSktvWklodmNOQVFjQk1Cd0dDU3FHU0liM0RRRUoNCkJURVBGdzB4TlRBeE1qY3hNVEF5TlRaYU1DTUdD
U3FHU0liM0RRRUpCREVXQkJRbDc1cUJ3MWlWRHhkN0NjY1gNCjZ0UlNoVUJLblRCNUJna3Foa2lHOXcwQkNROHhiREJxTUFzR0NXQ0dTQUZsQXdRQktqQUxCZ2xnaGtnQlpRTUUNCkFSWXdDd1lKWUlaSUFXVURCQUVDTUFv
R0NDcUdTSWIzRFFNSE1BNEdDQ3FHU0liM0RRTUNBZ0lBZ0RBTkJnZ3ENCmhraUc5dzBEQWdJQlFEQUhCZ1VyRGdNQ0J6QU5CZ2dxaGtpRzl3MERBZ0lCS0RBTkJna3Foa2lHOXcwQkFRRUYNCkFBU0NBUUF0NldRM2lzeExU
WFA4S2FyaThhOVZQUlVIeFgza3U4ZHNvOVk3dVBjMmdaZHZNWHZJWEhXL3RhR0oNCk8wdjBRdm54OHpOdU5NTnpOMjdUalVhN1E2NUt5OXJrVllJRHY4aGdOM2NwemhLZmI2N0plQ0s5S1NjMVllQTMNCmY3TTdhUXcrV0ps
WlpSTXVlZ2ZDK1BpMFNxZ1dXUTNVY1BIQlZvMFAzUDBRcXd2Mk1lQWJUZ1ROa1FMWm9pcU8NCkR4cVEvTjNaMzZrN25ORW85MUMvdks0SytmaklRWXplU09YbThJemd0NjlKd1BvYlhoUFhHZjBCRDNzUVVwTUENCm9QZ3E1
WExXM2lzMi9pamd4RVA1a1ZQR2E5dFNPd1dEYkJ1RzBNTDNkVkhsQ2lidndBSkdyTVlWR3l2Q2o4UHANCmx1WmpFdWk3cEhkV2laSGZWSGlXajdHY3Z3SVUNCjwvc2lnbmF0dXJlPg0KPC9TZWN1cmVCYWNrdXBNYW5pZmVz
dD4NCg==</dp:file><dp:file name="temporary://BackUpDir/cert.tgz">p6605/jI2ntpNM2jt0L0el8aq/fo+9OD2NsmfEF+P+whGQ/V1Bv94ph4FLcSm520piXl9krMYlwYnnWQl9uDNi25EIENdLHUHsnQFyJ
ykYN4k2YwpZJRIp8M6cYQX1fEzfdW2rpZrvprgT85ncSrVZC66oTxE37qZxqPyJJAHfOTld0hYt2</dp:file></dp:response></env:Body></env:Envelope>

The following command, however, produces no output. I expect it to print the content between temporary://BackUpDir/backupmanifest.xml"> and the first occurrence of </dp:file>:

sed -n 's_<env:Envelope\(.*\)<dp:file name="temporary://BackUpDir/backupmanifest.xml">\([^>]*\)</dp:file>\(.*\)_\2_p' tmp2

Where am I going wrong? Sorry for pasting so much of the file, but I couldn't see an option to attach a file here.

2 answers:

Answer 0 (score: 4)

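The best approach is to use a parser. Here is an example with xmlstarlet. Use -N to declare the namespace and -v to supply the expression, for example: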

xmlstarlet sel \
  -N 'dp=http://www.datapower.com/schemas/management' \
  -t \
  -v '//dp:file/text()' \
temp

Output:

XJzLXJlc3VsdHMtYWN0aW9uX18i
lc3VsdHMtYWN0aW9uX18i

The same for the second file:

xmlstarlet sel \
  -N 'dp=http://www.datapower.com/schemas/management' \
  -t \
  -v '//dp:file/text()' \
tmp2

Output:

PFNlY3VyZUJhY2t1cE1hbmlmZXN0Pg0KPGJhY2t1cG1hbmlmZXN0Pg0KIDx2ZXJzaW9uPlhJNTAuNi4wLjAu
MTwvdmVyc2lvbj4NCiA8dGltZXpvbmU+R01UMEJTVDwvdGltZXpvbmU+DQogPGNvbmZpZz5hdXRvY29uZmlnLmNmZzwvY29uZmlnPg0KIDx0aW1lPjIwMTUtMDEtMjdUMTE6MDI6NTZaPC90aW1lPg0KIDxidWlsZD4yMzI3
Nzc8L2J1aWxkPg0KIDxidWlsZGRhdGU+MjAxMy8wOC8wMSAxOTo0MzozNjwvYnVpbGRkYXRlPg0KIDxjb21tb25jcml0ZXJpYT5vZmY8L2NvbW1vbmNyaXRlcmlhPg0KIDxzZXJpYWxudW1iZXI+NjhBNTkyNjwvc2VyaWFs
bnVtYmVyPg0KIDxjcnlwdG9DZXJ0aWZpY2F0ZT5zZWN1cmVfYmFja3VwPEZpbGVOYW1lPmNlcnQ6Ly8vc2VjdXJlX2JhY2t1cC1zc2NlcnQucGVtPC9GaWxlTmFtZT48L2NyeXB0b0NlcnRpZmljYXRlPg0KIDxlcGhlbWVy
Ukhwc3Bxb0V0YlU0SDBtOVkNCkJzVHEwRFhiTUk4WGNMc1NiUGc5WktRdlBzY2Y5Q0sxRDhwdUJjODM0akNOaDJCQnhlWWdMTzhnUWg5NXVjNHENCjVtMTlWNnhNYVBPNnpZZkM5Tk1XQmR5MVhIWDhwc2txdTVJeGdnSm5N
SUlDWXdJQkFUQm5NR0l4Q3pBSkJnTlYNCkJBWVRBbFZUTVF3d0NnWURWUVFLRXdOSlFrMHhKekFsQmdOVkJBc1RIbGRsWWxOd2FHVnlaU0JFWVhSaFVHOTMNClpYSWdRWEJ3YkdsaGJtTmxjekVjTUJvR0ExVUVBeE1UUTNW
emRHOXRaWElnVW1Wc1pXRnpaU0JEUVFJQkZqQUgNCkJnVXJEZ01DR3FDQjJEQVlCZ2txaGtpRzl3MEJDUU14Q3dZSktvWklodmNOQVFjQk1Cd0dDU3FHU0liM0RRRUoNCkJURVBGdzB4TlRBeE1qY3hNVEF5TlRaYU1DTUdD
U3FHU0liM0RRRUpCREVXQkJRbDc1cUJ3MWlWRHhkN0NjY1gNCjZ0UlNoVUJLblRCNUJna3Foa2lHOXcwQkNROHhiREJxTUFzR0NXQ0dTQUZsQXdRQktqQUxCZ2xnaGtnQlpRTUUNCkFSWXdDd1lKWUlaSUFXVURCQUVDTUFv
R0NDcUdTSWIzRFFNSE1BNEdDQ3FHU0liM0RRTUNBZ0lBZ0RBTkJnZ3ENCmhraUc5dzBEQWdJQlFEQUhCZ1VyRGdNQ0J6QU5CZ2dxaGtpRzl3MERBZ0lCS0RBTkJna3Foa2lHOXcwQkFRRUYNCkFBU0NBUUF0NldRM2lzeExU
WFA4S2FyaThhOVZQUlVIeFgza3U4ZHNvOVk3dVBjMmdaZHZNWHZJWEhXL3RhR0oNCk8wdjBRdm54OHpOdU5NTnpOMjdUalVhN1E2NUt5OXJrVllJRHY4aGdOM2NwemhLZmI2N0plQ0s5S1NjMVllQTMNCmY3TTdhUXcrV0ps
WlpSTXVlZ2ZDK1BpMFNxZ1dXUTNVY1BIQlZvMFAzUDBRcXd2Mk1lQWJUZ1ROa1FMWm9pcU8NCkR4cVEvTjNaMzZrN25ORW85MUMvdks0SytmaklRWXplU09YbThJemd0NjlKd1BvYlhoUFhHZjBCRDNzUVVwTUENCm9QZ3E1
WExXM2lzMi9pamd4RVA1a1ZQR2E5dFNPd1dEYkJ1RzBNTDNkVkhsQ2lidndBSkdyTVlWR3l2Q2o4UHANCmx1WmpFdWk3cEhkV2laSGZWSGlXajdHY3Z3SVUNCjwvc2lnbmF0dXJlPg0KPC9TZWN1cmVCYWNrdXBNYW5pZmVz
dD4NCg==
p6605/jI2ntpNM2jt0L0el8aq/fo+9OD2NsmfEF+P+whGQ/V1Bv94ph4FLcSm520piXl9krMYlwYnnWQl9uDNi25EIENdLHUHsnQFyJ
ykYN4k2YwpZJRIp8
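
Since the question asks for the content of one specific file only, an XPath predicate on the name attribute can narrow the selection. The following is a minimal sketch, assuming xmlstarlet is installed; the sample file, its file names and its short payloads are made up for illustration and stand in for tmp2:

```shell
#!/bin/sh
# Hypothetical sample: a shortened stand-in for the SOAP response in the question,
# with a line break inside the first dp:file element.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"><env:Body><dp:response xmlns:dp="http://www.datapower.com/schemas/management"><dp:file name="temporary://a.xml">QUJD
REVG</dp:file><dp:file name="temporary://b.tgz">R0hJ</dp:file></dp:response></env:Body></env:Envelope>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # The [@name="..."] predicate keeps only the dp:file element with that name.
  one_file=$(xmlstarlet sel \
    -N 'dp=http://www.datapower.com/schemas/management' \
    -t \
    -v '//dp:file[@name="temporary://a.xml"]/text()' \
    "$sample")
  printf '%s\n' "$one_file"
else
  echo 'xmlstarlet is not installed' >&2
fi
rm -f "$sample"
```

Because the parser reads the whole document rather than one line at a time, the line break inside the element does not prevent the match; it is simply part of the returned text.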

Answer 1 (score: 2)

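As noted in my comment (and another one), there doesn't seem to be anything wrong with your sed command (though it may not be the most maintainable or readable solution in the long run); it looks like a line-break problem. sed processes its input one line at a time, so the pattern cannot match when the opening tag and the closing </dp:file> are on different lines.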

So, to verify that the file is on a single line:

wc -l tmp2

should report 1 ... and to make sure either way, strip the newlines before running sed:

tr -d '\n' < tmp2 | sed -n ...
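
To make the effect concrete, here is a small self-contained sketch. The two-line sample file and its short payloads are hypothetical stand-ins for tmp2, and the sed expression is the one from the question with only the file name changed:

```shell
#!/bin/sh
# Hypothetical sample: the first dp:file element is split across two lines.
sample=$(mktemp)
printf '%s\n' \
  '<env:Envelope><dp:file name="temporary://a.xml">QUJD' \
  'REVG</dp:file><dp:file name="temporary://b.tgz">R0hJ</dp:file></env:Envelope>' > "$sample"

pattern='s_<env:Envelope\(.*\)<dp:file name="temporary://a.xml">\([^>]*\)</dp:file>\(.*\)_\2_p'

# Count the newlines; anything above 1 means sed sees the data in pieces.
lines=$(wc -l < "$sample" | tr -d ' ')

# Line by line: no single line holds both the opening tag and </dp:file>,
# so the substitution never matches and nothing is printed.
per_line=$(sed -n "$pattern" "$sample")

# Joined: tr only reads standard input, so the file is redirected into it.
joined=$(tr -d '\n' < "$sample" | sed -n "$pattern")

printf 'lines:    %s\n' "$lines"
printf 'per line: [%s]\n' "$per_line"
printf 'joined:   [%s]\n' "$joined"
rm -f "$sample"
```

The per-line run leaves the brackets empty, while the joined run prints the payload with the line break removed.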