Sed/Awk solution for paragraph formatting

Time: 2015-12-07 19:58:46

Tags: text awk sed

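I need to create paragraphs from run-together text in which, in most cases, the carriage returns and/or line feeds have been removed. Dialogue is intermingled with the narrative. What I want is to insert a blank line before each quotation and after its closing quote, so that the quotation marks drive the rebuilt paragraphs. I added forward slashes (not in the text) because I don't know the convention for quoting code on this site. Here is an example: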

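Go from this:

Bacon ipsum dolor amet pastrami chuck venison swine, salami prosciutto shank pork belly. Filet mignon beef ribs ham hock, bacon ground round porchetta alcatra. Beef bacon biltong bresaola short loin filet mignon "I want bacon." chuck brisket landjaeger jerky prosciutto ham leberkas pork loin doner. Shoulder tongue meatball tail jerky pork loin filet "I want bacon." mignon shank chuck shankle flank pig. Short loin pork loin hamburger corned beef ribeye tri-tip doner ham hock landjaeger t-bone swine. Swine pork belly frankfurter, t-bone ham hock bacon pastrami. Biltong beef chuck ham hock pork loin shoulder strip "I want bacon." steak short loin tail cupim rump alcatra.Shoulder beef cupim rump ground round. Beef sirloin cupim meatball ham ribeye. "I want bacon." Venison tail ribeye, pastrami tongue pig beef ribs kielbasa bresaola doner. Shankle filet mignon pig, shoulder ball tip pork belly jowl sausage fatback boudin. Prosciutto venison capicola bacon, short loin andouille salami shank tongue corned beef. Sirloin biltong boudin tenderloin brisket tri-tip pancetta kielbasa strip steak leberkas short ribs flank filet mignon ham hock pork. Tri-tip cupim "I want bacon." "I want bacon."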

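To this:

Bacon ipsum dolor amet pastrami chuck venison swine, salami prosciutto shank pork belly. Filet mignon beef ribs ham hock, bacon ground round porchetta alcatra. Beef bacon biltong bresaola short loin filet mignon

"I want bacon."

chuck brisket landjaeger jerky prosciutto ham leberkas pork loin doner. Shoulder tongue meatball tail jerky pork loin filet

"I want bacon."

mignon shank chuck shankle flank pig. Short loin pork loin hamburger corned beef ribeye tri-tip doner ham hock landjaeger t-bone swine. Swine pork belly frankfurter, t-bone ham hock bacon pastrami. Biltong beef chuck ham hock pork loin shoulder strip

"I want bacon."

steak short loin tail cupim rump alcatra.Shoulder beef cupim rump ground round. Beef sirloin cupim meatball ham ribeye.

"I want bacon."

Venison tail ribeye, pastrami tongue pig beef ribs kielbasa bresaola doner. Shankle filet mignon pig, shoulder ball tip pork belly jowl sausage fatback boudin. Prosciutto venison capicola bacon, short loin andouille salami shank tongue corned beef. Sirloin biltong boudin tenderloin brisket tri-tip pancetta kielbasa strip steak leberkas short ribs flank filet mignon ham hock pork. Tri-tip cupim

"I want bacon."

"I want bacon."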

4 answers:

Answer 0 (score: 1)

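awk -v RS='"' '{
# RS is the double quote, so odd records are narrative and even records are quotations
if (NR % 2 == 1) {
    # skip whitespace-only narrative; print the rest followed by a blank line
    if (/[^[:space:]]/) printf "%s%s\n\n", (NR==1? "" : "\n"), $0
} else {
    # restore the quotes that RS removed
    printf "\"%s\"\n", $0
}}' file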

Output:

Bacon ipsum dolor amet pastrami chuck venison swine, salami prosciutto shank pork belly. Filet mignon beef ribs ham hock, bacon ground round porchetta alcatra. Beef bacon biltong bresaola short loin filet mignon 

"I want bacon."

 chuck brisket landjaeger jerky prosciutto ham leberkas pork loin doner. Shoulder tongue meatball tail jerky pork loin filet 

"I want bacon."

 mignon shank chuck shankle flank pig. Short loin pork loin hamburger corned beef ribeye tri-tip doner ham hock landjaeger t-bone swine. Swine pork belly frankfurter, t-bone ham hock bacon pastrami. Biltong beef chuck ham hock pork loin shoulder strip 

"I want bacon."

steak short loin tail cupim rump alcatra.Shoulder beef cupim rump ground round. Beef sirloin cupim meatball ham ribeye. 

"I want bacon."

 Venison tail ribeye, pastrami tongue pig beef ribs kielbasa bresaola doner. Shankle filet mignon pig, shoulder ball tip pork belly jowl sausage fatback boudin. Prosciutto venison capicola bacon, short loin andouille salami shank tongue corned beef. Sirloin biltong boudin tenderloin brisket tri-tip pancetta kielbasa strip steak leberkas short ribs flank filet mignon ham hock pork. Tri-tip cupim 

"I want bacon."
"I want bacon."

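Note that in the output above the last two quotations are not separated by a blank line, and the narrative paragraphs keep their leading and trailing spaces. A minimal sketch of one possible variation follows, assuming the double quotes in the input are balanced; `split_quotes` is a hypothetical helper name, not part of the answer. It trims each narrative record and puts exactly one blank line between all paragraphs:

```shell
# Hypothetical helper: reads run-together text on stdin, writes paragraphs on stdout.
split_quotes() {
    awk -v RS='"' '
    {
        text = $0
        if (NR % 2 == 1) {
            # Odd records are narrative: trim surrounding whitespace.
            gsub(/^[ \t\r\n]+|[ \t\r\n]+$/, "", text)
            if (text == "") next        # nothing between two quotations
        } else {
            # Even records are quotations: restore the quotes that RS consumed.
            text = "\"" text "\""
        }
        # Separate paragraphs with exactly one blank line.
        printf "%s%s\n", (printed++ ? "\n" : ""), text
    }'
}
```

It would be called as `split_quotes < file`. An unbalanced quote shifts the odd/even parity, so this sketch, like the answer, only suits input where every quotation is closed.
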
Answer 1 (score: 0)

Try this:

awk 'BEGIN{RS=" ?\" ?"; ORS="\n\n"}
     NR%2==0{print "\""$0"\"";next;}
     1' inputFile

This inserts a paragraph break before and after each quotation ("..."). However, it leaves the last few paragraphs looking like this:

"I want bacon."



"I want bacon."

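The empty paragraph comes from the empty record that sits between two adjacent quotations. A small sketch to make the records visible, assuming an awk that treats a multi-character RS as a regular expression (gawk and mawk do); `show_records` is a hypothetical helper name:

```shell
show_records() {
    # Print each record with its number, using the same kind of separator
    # as the command above: a double quote with an optional space on either side.
    awk 'BEGIN { RS = " ?\" ?" } { printf "%d:[%s]\n", NR, $0 }'
}

printf 'x "a" "b"\n' | show_records
```

Records 1 to 4 come out as `[x]`, `[a]`, `[]` and `[b]`. Record 3, the text between the two quotations, is empty, and printing it with ORS="\n\n" is what adds the extra blank lines.
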
To remove the empty paragraph between the two adjacent "I want bacon." quotations:

awk 'BEGIN{RS=" ?\" ?"; ORS="\n\n"}
     NR%2==0{print "\""$0"\"";next;}
     ($0!=""){print $0}' inputFile

Answer 2 (score: 0)

sed may be easier:

$ sed 's/"[^"]*" /\n\n&\n\n/g' bacon

Example:

$ echo "bla bla bla \"This is bacon.\" Starts a new paragraph" | sed 's/"[^"]*" /\n\n&\n\n/g'
bla bla bla

"This is bacon."

Starts a new paragraph
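
The command above only matches a quotation that is followed by a space, so a quotation at the very end of a line is left in place, and the spaces around each quotation are kept. A hedged sketch of one way to handle both, assuming GNU sed (for `\n` in the replacement, as in the answer); `quote_paragraphs` is a hypothetical name. The second sed squeezes the runs of blank lines that adjacent quotations would otherwise produce:

```shell
quote_paragraphs() {
    # 1) Put every "..." on its own line with blank lines around it,
    #    swallowing the spaces on either side (GNU sed: \n in the replacement).
    # 2) Keep only one blank line in each run of blank lines.
    sed 's/ *\("[^"]*"\) */\n\n\1\n\n/g' | sed '/./,/^$/!d'
}

printf 'bla bla bla "This is bacon." Starts a new paragraph "The end."\n' | quote_paragraphs
```

This prints four paragraphs separated by single blank lines, with one blank line left at the end of the output.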

Answer 3 (score: 0)

With GNU awk for multi-char RS and gensub():

$ awk -v RS='^$' -v ORS= '{$0=gensub(/\s*("[^"]+")\s*/,"\n\n\\1\n\n","g"); gsub(/\n+/,"\n\n")}1' file
Bacon ipsum dolor amet pastrami chuck venison swine, salami prosciutto shank pork belly. Filet mignon beef ribs ham hock, bacon ground round porchetta alcatra. Beef bacon biltong bresaola short loin filet mignon

"I want bacon."

chuck brisket landjaeger jerky prosciutto ham leberkas pork loin doner. Shoulder tongue meatball tail jerky pork loin filet

"I want bacon."

mignon shank chuck shankle flank pig. Short loin pork loin hamburger corned beef ribeye tri-tip doner ham hock landjaeger t-bone swine. Swine pork belly frankfurter, t-bone ham hock bacon pastrami. Biltong beef chuck ham hock pork loin shoulder strip

"I want bacon."

steak short loin tail cupim rump alcatra.Shoulder beef cupim rump ground round. Beef sirloin cupim meatball ham ribeye.

"I want bacon."

Venison tail ribeye, pastrami tongue pig beef ribs kielbasa bresaola doner. Shankle filet mignon pig, shoulder ball tip pork belly jowl sausage fatback boudin. Prosciutto venison capicola bacon, short loin andouille salami shank tongue corned beef. Sirloin biltong boudin tenderloin brisket tri-tip pancetta kielbasa strip steak leberkas short ribs flank filet mignon ham hock pork. Tri-tip cupim

"I want bacon."

"I want bacon."