Extract quoted strings from a text file, even when the string wraps onto the next line

Time: 2017-06-02 00:17:34

Tags: shell unix sed

Hi, I have a huge file:

Hi This is a file from and the filename= "file1.txt"
Hello find the filename... filename = "the name of this file is too huge and
goes to the next line but enclosed with double quotes.txt"
There is another file with the filename="file2.txt" size 
is "333kb";

My expected output is the filename strings, with the line breaks removed, joined into one pipe-delimited string like this:

file1.txt | the name of this file is too huge and goes to the next line but enclosed with double quotes.txt | file2.txt

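I used the sed command below, but the result is not what I expected. It only outputs filenames whose quoted value sits entirely on one line.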

sed -n 's/^.*filename="\(.*\)".*/\1/p'

Please help me solve this. Thanks in advance.

2 answers:

Answer 0: (score: 0)

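You can start with this pipeline, which turns every newline into a space so that each quoted value ends up on a single line: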

tr '\n' ' ' < input | grep -o 'filename *= *"[^"]*"'

This gives:

filename= "file1.txt"
filename = "the name of this file is too huge and goes to the next line but enclosed with double quotes.txt"
filename="file2.txt"

Then clean it up by keeping only the text between the quotes:

tr '\n' ' ' < input | grep -o 'filename *= *"[^"]*"' | sed 's/.*"\([^"]*\)"/\1/'

This gives:

file1.txt
the name of this file is too huge and goes to the next line but enclosed with double quotes.txt
file2.txt
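
The pipeline above prints one name per line, while the question asked for a single line joined with " | ". One possible way to get there is to append a small awk step that prints the separator before every line except the first. This is only a sketch: the temporary file and the `result` variable are assumptions made so the example is self-contained, and `grep -o` is a widely supported extension rather than part of POSIX.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
Hi This is a file from and the filename= "file1.txt"
Hello find the filename... filename = "the name of this file is too huge and
goes to the next line but enclosed with double quotes.txt"
There is another file with the filename="file2.txt" size
is "333kb";
EOF

# 1. tr   : replace newlines with spaces so wrapped values become one line
# 2. grep : print each filename = "..." match on its own line
# 3. sed  : keep only the text between the double quotes
# 4. awk  : join the lines with " | " and end with a single newline
result=$(tr '\n' ' ' < "$input" |
  grep -o 'filename *= *"[^"]*"' |
  sed 's/.*"\([^"]*\)"/\1/' |
  awk '{ printf "%s%s", (NR > 1 ? " | " : ""), $0 } END { print "" }')

echo "$result"
rm -f "$input"
```

Because the separator is added by awk rather than by substituting a character afterwards, a name that itself contains "|" is left untouched.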

Answer 1: (score: 0)

With GNU awk, for multi-character RS and gensub():

$ awk -v RS='\\<filename\\s*=\\s*"[^"]+"' -F'"' -v OFS=' | ' '
    RT {$0=gensub(/\s+/," ","g",RT); printf "%s%s", (NR>1?OFS:""), $2}
    END {print ""}
' file
file1.txt | the name of this file is too huge and goes to the next line but enclosed with double quotes.txt | file2.txt
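
RT and gensub() are GNU awk extensions, so the command above will not run under other awk implementations. Where only a POSIX awk is available, a rough equivalent might look like the sketch below. It is not the answerer's method: it joins the lines with a space and then pulls out matches one at a time with match(). The temporary file and the variable names (`text`, `m`, `out`, `n`, `result`) are assumptions for illustration, and unlike the gawk version it has no word-boundary check before `filename` and does not collapse runs of whitespace.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
Hi This is a file from and the filename= "file1.txt"
Hello find the filename... filename = "the name of this file is too huge and
goes to the next line but enclosed with double quotes.txt"
There is another file with the filename="file2.txt" size
is "333kb";
EOF

result=$(awk '
  # Accumulate the whole file in one string, joining lines with a space.
  { text = text (NR > 1 ? " " : "") $0 }
  END {
    out = ""
    n = 0
    # Repeatedly find the next filename = "..." occurrence.
    while (match(text, /filename *= *"[^"]*"/)) {
      m = substr(text, RSTART, RLENGTH)       # the matched text
      text = substr(text, RSTART + RLENGTH)   # continue after the match
      sub(/^[^"]*"/, "", m)                   # drop everything through the opening quote
      sub(/"$/, "", m)                        # drop the closing quote
      out = out (n++ ? " | " : "") m          # separator before all but the first name
    }
    print out
  }
' "$input")

echo "$result"
rm -f "$input"
```

Since this holds the entire file in memory, it may be a poor fit for very large inputs, where the streaming tr-based pipeline or the gawk version is likely the better choice.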