Question

I need to use sed to extract the string between '[[' and ']]' from a file, response.txt:

x-content-type-options: nosniff
x-server-response-time: 63
x-dropbox-request-id: 84e52618f83eda15cb6d96eb4f601f45
pragma: no-cache
cache-control: no-cache
x-dropbox-http-protocol: None
x-frame-options: SAMEORIGIN

{"has_more": false, "cursor": "AAEynx2q5KMgkcOwL2dKZ4MCYxNTtsdA950A5kYOdjWFln_RYuAokMnJCOb85B7idOHjycS8LJye3BhWfezTkkoprVxhgMNni_Bg04A-JO9fLmqIGO3CYInBQPmNUXL57S32ECWwA-CYu1CiLi5ujTDz", "entries": [["/test", {"rev": "b1e9026cf6f4", "thumb_exists": false, "path": "/TEST", "is_dir": true, "icon": "folder", "read_only": false, "modifier": null, "bytes": 0, "modified": "Fri, 22 May 2015 05:53:27 +0000", "size": "0 bytes", "root": "dropbox", "revision": 45545}], ["/TEST/test-file-01", {"rev": "b1ed026cf6f4", "thumb_exists": false, "path": "/test/test-file-01", "is_dir": true, "icon": "folder", "read_only": false, "modifier": null, "bytes": 0, "modified": "Fri, 22 May 2015 06:15:33 +0000", "size": "0 bytes", "root": "dropbox", "revision": 45549}]], "reset": true}

I want sed to output the following string:

[["/test", {"rev": "b1e9026cf6f4", "thumb_exists": false, "path": "/TEST", "is_dir": true, "icon": "folder", "read_only": false, "modifier": null, "bytes": 0, "modified": "Fri, 22 May 2015 05:53:27 +0000", "size": "0 bytes", "root": "dropbox", "revision": 45545}], ["/TEST/test-file-01", {"rev": "b1ed026cf6f4", "thumb_exists": false, "path": "/test/test-file-01", "is_dir": true, "icon": "folder", "read_only": false, "modifier": null, "bytes": 0, "modified": "Fri, 22 May 2015 06:15:33 +0000", "size": "0 bytes", "root": "dropbox", "revision": 45549}]]

I ran this command in the terminal:

$ sed -n 's/.*"entries": *\(\[\[.*\]\]\)/\1/p' /tmp/response.txt

and got this result:

[["/test", {"rev": "b1e9026cf6f4", "thumb_exists": false, "path": "/TEST", "is_dir": true, "icon": "folder", "read_only": false, "modifier": null, "bytes": 0, "modified": "Fri, 22 May 2015 05:53:27 +0000", "size": "0 bytes", "root": "dropbox", "revision": 45545}], ["/TEST/test-file-01", {"rev": "b1ed026cf6f4", "thumb_exists": false, "path": "/test/test-file-01", "is_dir": true, "icon": "folder", "read_only": false, "modifier": null, "bytes": 0, "modified": "Fri, 22 May 2015 06:15:33 +0000", "size": "0 bytes", "root": "dropbox", "revision": 45549}]], "reset": true}

Then I ran this command in the terminal:

$ sed -n 's/.*"entries": *\(\[\[(?!\]\].)*\]\]\)/\1/p' /tmp/response.txt

It returns nothing.

Did I write the regular expression incorrectly? What can I do? Thanks!

Answer 1

Avoid parsing JSON with regular expressions. Use a proper parser.

If you have jq installed:

awk -v RS="" "END {print}" response.txt | jq -c '.["entries"]'

[["/test",{"revision":45545,"root":"dropbox","size":"0 bytes","modified":"Fri, 22 May 2015 05:53:27 +0000","rev":"b1e9026cf6f4","thumb_exists":false,"path":"/TEST","is_dir":true,"icon":"folder","read_only":false,"modifier":null,"bytes":0}],["/TEST/test-file-01",{"revision":45549,"root":"dropbox","size":"0 bytes","modified":"Fri, 22 May 2015 06:15:33 +0000","rev":"b1ed026cf6f4","thumb_exists":false,"path":"/test/test-file-01","is_dir":true,"icon":"folder","read_only":false,"modifier":null,"bytes":0}]]
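For context on the awk step: with RS set to the empty string, awk reads in paragraph mode, where blank lines separate records, so END {print} emits the last record, which here is the JSON body that follows the headers. A minimal sketch with invented headers and an invented body (not the real response), leaving jq out so that it runs with awk alone:

```shell
# Hypothetical stand-in for response.txt: two header lines, a blank line, then a JSON body.
sample=$(printf 'pragma: no-cache\ncache-control: no-cache\n\n{"entries": [["/a"]]}\n')

# RS="" selects paragraph mode, so the headers form one record and the body another.
# In END, $0 still holds the last record read, which is the body.
body=$(printf '%s\n' "$sample" | awk -v RS="" 'END {print}')
printf '%s\n' "$body"
```

This should print only the JSON line, which is what gets piped into jq in the command above.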

Or with Ruby:

ruby -rjson -e '
    data = (File.readlines(ARGV.shift))[-1]
    json = JSON.parse(data)
    puts JSON.generate(json["entries"])
' response.txt

[["/test",{"rev":"b1e9026cf6f4","thumb_exists":false,"path":"/TEST","is_dir":true,"icon":"folder","read_only":false,"modifier":null,"bytes":0,"modified":"Fri, 22 May 2015 05:53:27 +0000","size":"0 bytes","root":"dropbox","revision":45545}],["/TEST/test-file-01",{"rev":"b1ed026cf6f4","thumb_exists":false,"path":"/test/test-file-01","is_dir":true,"icon":"folder","read_only":false,"modifier":null,"bytes":0,"modified":"Fri, 22 May 2015 06:15:33 +0000","size":"0 bytes","root":"dropbox","revision":45549}]]

Or any language of your choice that implements a JSON parser.

Answer 2

This might work for you (GNU sed):

sed '/\n/!{s/\[\[/\n&/g;s/\]\]/&\n/g};/^\[\[/P;D' file

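If the pattern space does not contain a \n, prepend a \n to every [[ string and append a \n to every ]] string. If the pattern space begins with [[, print up to the following \n (or to the end of the pattern space). Delete up to the next \n (or to the end of the pattern space) and repeat until the pattern space is empty.

N.B. This will only print strings between newlines that begin and end with the required strings ([[ and ]]).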
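To see the mechanics on something shorter than the Dropbox response, here is a small sketch on an invented one-line input. It assumes GNU sed, as the answer does (the \n in the replacement and the one-line { ... } grouping are not portable to every sed).

```shell
# Invented input with two [[...]] groups on one line.
input='a [[x]] b [[y]] c'

# Same script as above: put a newline before each [[ and after each ]], then
# print (P) every newline-delimited chunk that starts with [[ and delete (D)
# up to the next newline, restarting the script on what is left.
out=$(printf '%s\n' "$input" | sed '/\n/!{s/\[\[/\n&/g;s/\]\]/&\n/g};/^\[\[/P;D')
printf '%s\n' "$out"
```

Each bracketed group should come out on its own line ([[x]] and then [[y]]), while the text outside the brackets is deleted without being printed.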

Answer 3

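sed recognizes POSIX regular expressions, which do not include lookaround assertions such as (?!.

Fortunately, it is easy enough to write a regular expression for this simple case (although, as usual, it is not so easy to read):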

sed -n 's/.*"entries": *\(\[\[\(]\?[^]]\)*]]\)/\1/p' /tmp/response.txt

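However, it was not greedy matching that caused the problem with your initial attempt. The problem is that you did not discard the contents of the line after the match. What you want is: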

sed -n 's/.*"entries": *\(\[\[\(]\?[^]]\)*]]\).*/\1/p' /tmp/response.txt
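The effect of the trailing .* is easiest to see side by side. Below is a small sketch on an invented, shortened line (not the real response). Like the commands above, it relies on \?, which is a GNU sed extension in basic regular expressions; the variable names are only for illustration.

```shell
# Invented, shortened stand-in for the JSON line in response.txt.
line='{"cursor": "abc", "entries": [["/a", {"n": 1}], ["/b", {"n": 2}]], "reset": true}'

# Without a trailing .* the text after the match is left in place.
without_tail=$(printf '%s\n' "$line" | sed -n 's/.*"entries": *\(\[\[\(]\?[^]]\)*]]\)/\1/p')

# With a trailing .* the rest of the line is part of the match and is replaced too.
with_tail=$(printf '%s\n' "$line" | sed -n 's/.*"entries": *\(\[\[\(]\?[^]]\)*]]\).*/\1/p')

printf '%s\n' "$without_tail" "$with_tail"
```

The first line of output should still end in , "reset": true} while the second should stop at the closing ]].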

sed uses "basic" POSIX regexes (BREs), which means you end up with a lot of backslashes. I tried to remove at least some of them, using the fact that ] is not special in a regex unless it is closing a character class. On the whole, though, I think your needs would be better met by grep, which has the POSIX-standard option to use "extended" (normal) regular expressions (EREs), as well as an option to print only the matched string:

grep -oE '"entries": \[\[(]?[^]])*]]' /tmp/response.txt | cut -d ' ' -f2-

(The cut at the end is there to remove "entries":.)

Explanation of the regex

The regex (in ERE form) consists of:

\[\[     match [[
(
  ]?     possibly a single ]
  [^]]   anything but a ]
)*       repeated as many times as necessary
]]       match ]]

The repeated group matches either a ] followed by a character other than ], or any character other than ]. In effect, it is (almost) a negation of ]].

(It is not quite a negation because it will not match a single ] at the very end of the string, but that does not matter because here we insist on the closing ]], so the case where it reaches the end of the string does not arise.)
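To check the claim that the repeated group cannot run past a ]], the sketch below compares it with a plain greedy .* on an invented line containing two bracketed groups. It assumes a grep that supports -o (GNU and BSD grep do; the option is not part of POSIX).

```shell
# Invented line with two [[...]] groups; the first has a single ] inside it.
line='x [[a]b]] y [[c]] z'

# The group (]?[^]])* can step over a lone ] but never over ]], so each match
# ends at the first ]] after its [[.
tempered=$(printf '%s\n' "$line" | grep -oE '\[\[(]?[^]])*]]')

# A greedy .* instead runs on to the last ]] on the line.
greedy=$(printf '%s\n' "$line" | grep -oE '\[\[.*]]')

printf '%s\n' "$tempered" "$greedy"
```

The first pattern should report [[a]b]] and [[c]] as two separate matches, while the greedy one should return a single match spanning both groups.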

Answer 4

Try:

sed -n 's/.*"entries": *\(\[\[.*\]\]\).*/\1/p'

(Note the .* at the end of the pattern.)
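As a quick illustration of how this behaves on a whole response, here is a sketch with invented header lines and a shortened JSON body. Because of -n and the p flag, lines where the substitution does not apply (the headers and the blank line) are not printed. Note that the .* inside the brackets is greedy, so this relies on the line containing only one ]] after "entries".

```shell
# Invented stand-in for response.txt: headers, a blank line, then a JSON body.
response=$(printf 'pragma: no-cache\nx-frame-options: SAMEORIGIN\n\n{"entries": [["/a", {"n": 1}]], "reset": true}\n')

# -n suppresses automatic printing; the p flag prints only lines where the
# substitution succeeded, so only the extracted array comes out.
entries=$(printf '%s\n' "$response" | sed -n 's/.*"entries": *\(\[\[.*\]\]\).*/\1/p')
printf '%s\n' "$entries"
```

Only the bracketed array should be printed; the header lines produce no output at all.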
