I am trying to extract a small amount of information from a large string like this:
[[["좋은","good","joh-eun",""]],[["adjective",[["좋은",["good","nice","pretty","admirable","canny","tenacious"],,0.38553435]],"good",4],["adverb",["훌륭하게",["wonderfully","good","nicely","beautifully","fine","finely"],,0.00029145498],"good",4]]]
I want to extract strings like this:
좋은 - good
좋은 - good,nice,pretty,admirable,canny,tenacious (basically adjectives)
훌륭하게 - wonderfully,good,nicely,beautifully,fine,finely (adverbs)
Please help. I tried using sed and a pipe with cut, like this:
cut --delimiter='"' -f 1-2 and then use sed 's/\[\[\[\"//'
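That only gave me the first Korean word, 좋은, and I could not extend it to get the desired result. If there is a better way to achieve this, please suggest it. Thanks in advance.
Answer 0 (score: 2)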
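A bit late, but a pure regex substitution in the style of sed also works (the lazy `.*?` quantifiers need a PCRE-capable tool such as perl, because sed's POSIX regex syntax does not support them):
Regex: \[\[\["(.*?)","(.*?)"\]\],\[\["(.*?)",\[\["(.*?)",\["(.*?)"\],.*?\]\],.*?\],\["(.*?)",\["(.*?)",\["(.*)"\],.*\]\]\]
Replacement: \1 - \2\n\4 - \5 (\3)\n\7 - \8 (\6)
This assumes the original line always contains both the adjective and the adverb brackets (even if they are empty).
See the replacement in the demo to understand how the captured groups are rearranged.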
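As a rough sketch of how the pattern and replacement above could be applied, the following Ruby snippet runs them over the sample line from the question. Ruby is used here only because its regex engine accepts the lazy `.*?` quantifiers. The names `SAMPLE_LINE`, `ANSWER_PATTERN`, `ANSWER_REPLACEMENT` and `format_line` are made up for illustration, and the trailing `gsub` is an addition, not part of the answer: the captured word lists still carry their `","` separators, so it collapses them into plain commas.

```ruby
# Sample line copied from the question.
SAMPLE_LINE = '[[["좋은","good","joh-eun",""]],[["adjective",[["좋은",["good","nice","pretty","admirable","canny","tenacious"],,0.38553435]],"good",4],["adverb",["훌륭하게",["wonderfully","good","nicely","beautifully","fine","finely"],,0.00029145498],"good",4]]]'

# Pattern and replacement taken unchanged from the answer above.
ANSWER_PATTERN = /\[\[\["(.*?)","(.*?)"\]\],\[\["(.*?)",\[\["(.*?)",\["(.*?)"\],.*?\]\],.*?\],\["(.*?)",\["(.*?)",\["(.*)"\],.*\]\]\]/
ANSWER_REPLACEMENT = "\\1 - \\2\n\\4 - \\5 (\\3)\n\\7 - \\8 (\\6)"

# Hypothetical helper: apply the substitution once, then tidy the word lists.
def format_line(line)
  substituted = line.sub(ANSWER_PATTERN, ANSWER_REPLACEMENT)
  # Extra step (an assumption about the desired output): the captures
  # still contain `","` between words, so turn those into bare commas.
  substituted.gsub('","', ',')
end

puts format_line(SAMPLE_LINE)
```

Note that the second capture group runs up to the closing `]]` of the first bracket, so the first output line comes out as `좋은 - good,joh-eun,` rather than just `좋은 - good`.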
Answer 1 (score: 1)
Here is a bit of Ruby, but probably any tool equipped with PCRE can do something similar:
ruby -ne '
$_.gsub(/"/,"")
.scan(/ (\p{Hangul}+) ,\[? (.+?) \] /x) {|m| puts m[0] + " - " + m[1]}
' <<END
[[["좋은","good","joh-eun",""]],[["adjective",[["좋은",["good","nice","pretty","admirable","canny","tenacious"],,0.38553435]],"good",4],["adverb",["훌륭하게",["wonderfully","good","nicely","beautifully","fine","finely"],,0.00029145498],"good",4]]]
END
좋은 - good,joh-eun,
좋은 - good,nice,pretty,admirable,canny,tenacious
훌륭하게 - wonderfully,good,nicely,beautifully,fine,finely
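Too bad the original text cannot easily be handled as JSON (the empty array elements, `,,`, make it invalid).
Thanks to this question for showing how to match Korean characters.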
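On the JSON remark above: if the empty array slots really are the only thing keeping the text from being valid JSON, one possible workaround is to fill them with `null` before parsing. The sketch below makes that assumption (and also assumes no quoted string ever contains `,,`); `RAW_TEXT` and `extract_translations` are hypothetical names used only for illustration.

```ruby
require 'json'

# Sample line copied from the question.
RAW_TEXT = '[[["좋은","good","joh-eun",""]],[["adjective",[["좋은",["good","nice","pretty","admirable","canny","tenacious"],,0.38553435]],"good",4],["adverb",["훌륭하게",["wonderfully","good","nicely","beautifully","fine","finely"],,0.00029145498],"good",4]]]'

# Hypothetical helper: returns one "word - translations" string per entry.
def extract_translations(raw)
  # Assumption: empty slots (",,") are the only invalid JSON in the text.
  # Each comma that is directly followed by another comma gets a null.
  data = JSON.parse(raw.gsub(/,(?=,)/, ',null'))

  summary, groups = data
  word, translation = summary.first
  lines = ["#{word} - #{translation}"]

  groups.each do |_part_of_speech, entries, *_rest|
    # In the sample, the adjective group wraps its entries in an extra
    # array and the adverb group does not, so normalise to a list of entries.
    entries = [entries] unless entries.first.is_a?(Array)
    entries.each do |entry_word, meanings, *_|
      lines << "#{entry_word} - #{meanings.join(',')}"
    end
  end
  lines
end

puts extract_translations(RAW_TEXT)
```

Walking the parsed structure avoids the stray `joh-eun,` on the first line, at the cost of depending on the exact nesting seen in this one sample.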