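Split a single line containing many words into multiple lines with x words each

Asked: 2017-01-13 17:52:32

Tags: bash sed split xargs

I have a large text file that contains only one line. It looks like this: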

blaalibababla.ru text text text text what's the weather like tooday? blaazzabla.zu some_text blabewdwefla.au it is important not to be afraid of sed blabkrlqbla.ru wjenfkn lkwnef lkwnefl blarthrthbla.net 1234 e12edq 42wsdfg blablabla.com this should finally end

I need a way to make it look like this:

blaalibababla.ru text text text text what's the weather like tooday?
blaazzabla.zu some_text
blabewdwefla.au it is important not to be afraid of sed
blabkrlqbla.ru wjenfkn lkwnef lkwnefl
blarthrthbla.net 1234 e12edq 42wsdfg 
blablabla.com this should finally end

I know how to do this for a single domain name with sed:

sed -i 's/blablabla.ru/\n&/g' file.txt

"But with no additional text afterwards" is not what I meant.

If sed is not the best tool for this, please tell me.

UPD: Here is my actual text file:

wsd.qwd.qwd.kjqnwk.ru PUPPETD CRITICAL 2017-01-13 00:09:52   lor notify-by-sms FILE_AGE CRITICAL:   /var/lib/puppet/state/state.yaml is 2438046 seconds old and 19459 bytes   zm-goas-04.asdg.net LOAD CRITICAL 2017-01-13 00:10:32   tech-lor notify-by-telegram CRITICAL - load average: 42.91,   49.91, 53.88   glas07.kvm.ext.asdg.ru PUPPETD CRITICAL 2017-01-13 00:28:02   lor notify-by-sms FILE_AGE CRITICAL:   /var/lib/puppet/state/state.yaml is 19821 seconds old and 26337 bytes    

I need it to look like this:

wsd.qwd.qwd.kjqnwk.ru PUPPETD CRITICAL 2017-01-13 00:09:52   lor notify-by-sms FILE_AGE CRITICAL:   /var/lib/puppet/state/state.yaml is 2438046 seconds old and 19459 bytes   
zm-goas-04.asdg.net LOAD CRITICAL 2017-01-13 00:10:32   tech-lor notify-by-telegram CRITICAL - load average: 42.91,   49.91, 53.88   
glas07.kvm.ext.asdg.ru PUPPETD CRITICAL 2017-01-13 00:28:02   lor notify-by-sms FILE_AGE CRITICAL:   /var/lib/puppet/state/state.yaml is 19821 seconds old and 26337 bytes    

3 answers:

Answer 0 (score: 5)

A simple way is to use xargs to process n words at a time, which in your case is just 2:

xargs -n2 <file
blablabla.ru some_text
blablabla.zu some_text
blablabla.au some_text
blablabla.ru some_text
blablabla.net some_text
blablabla.com some_text

According to the man xargs page, the -n flag is:

-n max-args, --max-args=max-args
      Use at most max-args arguments per command line.  Fewer than max-args arguments 
      will be used if the size (see the -s option) is exceeded, unless the
      -x option is given, in which case xargs will exit.
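
To make the effect of -n concrete, here is a small sketch on made-up input (the seven words are placeholders, not data from the question). It relies on xargs running echo when no command is given:

```shell
# xargs runs echo by default, so -n3 prints three words per line;
# the final line holds whatever is left over.
printf 'one two three four five six seven\n' | xargs -n3
```

This should print "one two three", "four five six" and "seven" on separate lines.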

To replace the original file, do:

xargs -n2 <file >tmpfile && mv tmpfile file
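
One caveat worth checking before relying on this: by default xargs treats quotes and backslashes in its input specially, so a word with an unpaired apostrophe, such as "what's" in the question's sample, makes it stop with an error. A possible alternative that does no quote processing is tr plus paste. This is only a sketch, and the function name two_per_line is made up for illustration:

```shell
# Hypothetical helper: regroup space-separated words from stdin,
# two per output line, without any quote processing.
two_per_line() {
    # tr puts each word on its own line (-s squeezes repeated newlines);
    # paste then joins every two consecutive lines with a single space.
    tr -s ' ' '\n' | paste -d' ' - -
}

printf "a.ru what's b.zu up\n" | two_per_line
```

Like xargs -n2, this only fits input where every record is exactly two words long.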

Answer 1 (score: 2)

In awk:

$ awk 'gsub(/([^ ]+ ){2}/,"&\n")' file
blablabla.ru some_text 
blablabla.zu some_text 
blablabla.au some_text 
blablabla.ru some_text 
blablabla.net some_text 
blablabla.com some_text

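Explanation:

Every two repetitions of [^ ]+ followed by a space (that is, a string of non-space characters and the space after it) are replaced with themselves (&) plus a newline \n. Because gsub(...) is used as the pattern, a line on which nothing matches is not printed at all, unless you wrap the call as {gsub(...)}1. Leftover words after the last match stay at the end of the output, as in the last line above.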
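
The same grouping can also be written as an explicit loop over the fields, which makes the group size a parameter. This is a sketch of a different technique from the gsub call above, not a rewrite of it; the function name words_per_line and the variable n are made up for illustration:

```shell
# Hypothetical helper: print the words from stdin n per line (default 2).
words_per_line() {
    awk -v n="${1:-2}" '{
        # end the output line after every n-th word and after the last word
        for (i = 1; i <= NF; i++)
            printf "%s%s", $i, (i % n == 0 || i == NF) ? "\n" : " "
    }'
}

printf 'a.ru one b.zu two c.au\n' | words_per_line 2
```

Note that the count restarts on every input line and that runs of spaces are collapsed, since awk splits fields on whitespace.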

Answer 2 (score: 0)

Try splitting on this pattern for the domain names: ([-a-z0-9]+\.[a-z]+){1,}

Using GNU sed:

sed -r 's/ +(([-a-z0-9]+\.[a-z]){1,}) */\n\1/g' file

Note that any string consisting of a space followed by [-a-z0-9] characters and then .[a-z] characters will be treated as a domain name.
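
If that is too loose for your data, one option is to test whole words instead of substrings. The sketch below uses awk rather than sed and assumes a domain is a word made only of lowercase dot-separated labels ending in letters; the function name split_on_domains is made up, and the sample line is abridged from the question's data:

```shell
# Hypothetical helper: start a new line before every domain-like word.
split_on_domains() {
    awk '{
        for (i = 1; i <= NF; i++) {
            # assumed domain shape: dot-separated labels ending in letters
            is_domain = ($i ~ /^([-a-z0-9]+\.)+[a-z]+$/)
            if (i > 1) printf "%s", (is_domain ? "\n" : " ")
            printf "%s", $i
        }
        if (NF) printf "\n"
    }'
}

printf 'zm-goas-04.asdg.net LOAD CRITICAL load average: 42.91, 49.91, 53.88 glas07.kvm.ext.asdg.ru PUPPETD CRITICAL /var/lib/puppet/state/state.yaml is 19821 seconds old\n' | split_on_domains
```

Because the whole word has to match, values such as 42.91, and paths such as /var/lib/puppet/state/state.yaml are left alone. As a side effect, runs of spaces are collapsed to a single space.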