Here is a sample of the data:
2013-06-22 00:00:49.307121|147374 |PHONE HOME|SDRKRKS|REAS|something|KRISTCOS 11:13 AM 6/22/2013
NUM: 90834098
data: 0394884
cX: 90h010f03040f
mR: 034050t0ds0
cNUM: 034050t0ds0
2013-06-22 00:00:49.307121|0950704421406 |PHONE HOME|SDRKRKS|REAS|something|MRS
2013-06-22 00:00:50.379487|0441813679603 |PHONE HOME|SDRKRKS|REAS|something|TN 90210
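I need a script that removes the newline characters from lines that do not start with a timestamp. In the example above, lines 2-6 would be appended, as a kind of text blob, to the last field of the first line. I know how to detect the good lines,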
grep '^[0-9][0-9][0-9][0-9].*' testfile
and the bad lines,
grep '^[^0-9][^0-9][^0-9][^0-9].*' testfile
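The question now is how to apply this (using sed?) so that the lines following a 'good' line are folded back into that line's last field. Any help here would be much appreciated.
Here is an example of the desired output: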
2013-06-22 00:00:49.307121|147374 |PHONE HOME|SDRKRKS|REAS|something|KRISTCOS 11:13 AM 6/22/2013 NUM: 90834098 data: 0394884 cX: 90h010f03040f mR: 034050t0ds0 cNUM: 034050t0ds0
2013-06-22 00:00:49.307121|0950704421406 |PHONE HOME|SDRKRKS|REAS|something|MRS
2013-06-22 00:00:50.379487|0441813679603 |PHONE HOME|SDRKRKS|REAS|something|TN 90210
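Edit:
There is some disagreement about which tool is the most appropriate. For now I am leaning toward Notepad++. The following is close to what I want, but it doesn't quite work; maybe someone can help me adapt it to my use case: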
(?! [0-9]{4}\-[0-9]{2}-[0-9]{2}).*
(?! [0-9]{4}\-[0-9]{2}-[0-9]{2}) - searches for a line not like a timestamp
.* - followed by anything else
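The problem is that .* captures the timestamp I am trying to negate. Any ideas?
Edit 2: Thanks, everyone, for the helpful suggestions; they have definitely pointed me in the right direction! The following regex finds the offending \n character in Notepad++, but nothing happens when I try to perform the replacement: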
Find: (.*)(\n)(?![0-9]{4}\-[0-9]{2}\-[0-9]{2})
Replace: \1
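Does anyone have an idea how to force Notepad++ to remove the offending \n?
Edit 3: Here is additional sample data that does not seem to be compatible with the suggested solutions: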
2013-06-22 00:00:02.540298|0238704723874 |SMELL TEST|HAKEKJ |REAS|No cooking|tcna / ncc
2013-06-22 00:00:04.302887|3289749873342 |SMELL TEST|ICNIDF |REAS|No cooking|JINUJ/CVGIND/NASR
6:13 AM 6/22/2013
VERIFIED CURLING
TN :- 834974978398
XX and YY updated
THIS IS A SENTENCE
2013-06-22 00:00:06.937545|30874987392838 |SMELL TEST|KCIDKD |REAS|No cooking|SrutiD/cvgind/nasr
tn 4887839847
Answer 0 (score: 2)
Using all of the posted sample input, concatenated into one file:
$ cat file
2013-06-22 00:00:49.307121|147374 |PHONE HOME|SDRKRKS|REAS|something|KRISTCOS 11:13 AM 6/22/2013
NUM: 90834098
data: 0394884
cX: 90h010f03040f
mR: 034050t0ds0
cNUM: 034050t0ds0
2013-06-22 00:00:49.307121|0950704421406 |PHONE HOME|SDRKRKS|REAS|something|MRS
2013-06-22 00:00:50.379487|0441813679603 |PHONE HOME|SDRKRKS|REAS|something|TN 90210
2013-06-22 00:00:02.540298|0238704723874 |SMELL TEST|HAKEKJ |REAS|No cooking|tcna / ncc
2013-06-22 00:00:04.302887|3289749873342 |SMELL TEST|ICNIDF |REAS|No cooking|JINUJ/CVGIND/NASR
6:13 AM 6/22/2013
VERIFIED CURLING
TN :- 834974978398
XX and YY updated
THIS IS A SENTENCE
2013-06-22 00:00:06.937545|30874987392838 |SMELL TEST|KCIDKD |REAS|No cooking|SrutiD/cvgind/nasr
tn 4887839847
$ awk 'NR>1{pre = (/^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}/ ? ORS : OFS)} {printf "%s%s",pre,$0} END{print ""}' file
2013-06-22 00:00:49.307121|147374 |PHONE HOME|SDRKRKS|REAS|something|KRISTCOS 11:13 AM 6/22/2013 NUM: 90834098 data: 0394884 cX: 90h010f03040f mR: 034050t0ds0 cNUM: 034050t0ds0
2013-06-22 00:00:49.307121|0950704421406 |PHONE HOME|SDRKRKS|REAS|something|MRS
2013-06-22 00:00:50.379487|0441813679603 |PHONE HOME|SDRKRKS|REAS|something|TN 90210
2013-06-22 00:00:02.540298|0238704723874 |SMELL TEST|HAKEKJ |REAS|No cooking|tcna / ncc
2013-06-22 00:00:04.302887|3289749873342 |SMELL TEST|ICNIDF |REAS|No cooking|JINUJ/CVGIND/NASR 6:13 AM 6/22/2013 VERIFIED CURLING TN :- 834974978398 XX and YY updated THIS IS A SENTENCE
2013-06-22 00:00:06.937545|30874987392838 |SMELL TEST|KCIDKD |REAS|No cooking|SrutiD/cvgind/nasr tn 4887839847
If that is not your expected output, please update your question to show what it is.
Answer 1 (score: 2)
The simplest solution:
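For anyone who wants the same idea as a reusable helper, here is a minimal sketch. The function name join_records is made up for illustration, and it assumes that every record starts with a YYYY-MM-DD date at the very beginning of the line. The date pattern is written out digit by digit so that it does not rely on the awk implementation supporting {n} intervals.

```shell
# join_records: hypothetical helper name; reads log text on stdin.
# Assumption: a new record always starts with a YYYY-MM-DD date.
join_records() {
  awk '
    # A line that starts with a date begins a new record.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/ {
      if (NR > 1) printf "\n"   # terminate the previous record
      printf "%s", $0
      next
    }
    # Anything else is a continuation: append it after a single space.
    { printf "%s%s", (NR > 1 ? " " : ""), $0 }
    # Terminate the last record, unless the input was empty.
    END { if (NR > 0) printf "\n" }
  '
}
```

It would be called as join_records < file, writing the joined records to standard output.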
echo $(cat file) | sed -re 's/(2013-06)/@@@\1/g' | sed -re 's/@@@/\n/g'
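This works because an unquoted echo puts everything on one line; we then insert @@@ in front of each timestamp and replace @@@ with a newline character. Note that the pattern is hard-coded to 2013-06, so it only recognizes timestamps from that month.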
tiago@dell:~$ echo $(cat file) | sed -re 's/(2013-06)/@@@\1/g' | sed -re 's/@@@/\n/g'
2013-06-22 00:00:49.307121|147374 |PHONE HOME|SDRKRKS|REAS|something|KRISTCOS 11:13 AM 6/22/2013 NUM: 90834098 data: 0394884 cX: 90h010f03040f mR: 034050t0ds0 cNUM: 034050t0ds0
2013-06-22 00:00:49.307121|0950704421406 |PHONE HOME|SDRKRKS|REAS|something|MRS
2013-06-22 00:00:50.379487|0441813679603 |PHONE HOME|SDRKRKS|REAS|something|TN 90210
2013-06-22 00:00:02.540298|0238704723874 |SMELL TEST|HAKEKJ |REAS|No cooking|tcna / ncc
2013-06-22 00:00:04.302887|3289749873342 |SMELL TEST|ICNIDF |REAS|No cooking|JINUJ/CVGIND/NASR 6:13 AM 6/22/2013 VERIFIED CURLING TN :- 834974978398 XX and YY updated THIS IS A SENTENCE
2013-06-22 00:00:06.937545|30874987392838 |SMELL TEST|KCIDKD |REAS|No cooking|SrutiD/cvgind/nasr tn 4887839847
tiago@dell:~$ cat file
2013-06-22 00:00:49.307121|147374 |PHONE HOME|SDRKRKS|REAS|something|KRISTCOS 11:13 AM 6/22/2013
NUM: 90834098
data: 0394884
cX: 90h010f03040f
mR: 034050t0ds0
cNUM: 034050t0ds0
2013-06-22 00:00:49.307121|0950704421406 |PHONE HOME|SDRKRKS|REAS|something|MRS
2013-06-22 00:00:50.379487|0441813679603 |PHONE HOME|SDRKRKS|REAS|something|TN 90210
2013-06-22 00:00:02.540298|0238704723874 |SMELL TEST|HAKEKJ |REAS|No cooking|tcna / ncc
2013-06-22 00:00:04.302887|3289749873342 |SMELL TEST|ICNIDF |REAS|No cooking|JINUJ/CVGIND/NASR
6:13 AM 6/22/2013
VERIFIED CURLING
TN :- 834974978398
XX and YY updated
THIS IS A SENTENCE
2013-06-22 00:00:06.937545|30874987392838 |SMELL TEST|KCIDKD |REAS|No cooking|SrutiD/cvgind/nasr
tn 4887839847
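A variant of the same flatten-then-split idea that is not tied to one particular month could look like the sketch below. The function name join_flat is hypothetical. It assumes GNU sed (for \n in the replacement text) and that the text " YYYY-MM-DD " never occurs in the middle of a continuation line, since any such occurrence would be treated as the start of a new record.

```shell
# join_flat: hypothetical helper name; reads log text on stdin.
# Assumptions: GNU sed, and no " YYYY-MM-DD " inside continuation text.
join_flat() {
  # 1. tr turns every newline into a space, giving one long line.
  # 2. sed drops the trailing space(s), then puts a newline back in
  #    front of every date that is preceded by a space.
  tr '\n' ' ' |
    sed -E 's/ +$//; s/ ([0-9]{4}-[0-9]{2}-[0-9]{2} )/\n\1/g'
  # The flattened text has no final newline, so add one.
  echo
}
```

Unlike the unquoted echo, tr does not expand wildcard characters or squeeze runs of spaces, so the field contents are passed through unchanged apart from the newlines.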
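Answer 2 (score: 1)
I am not sure what you would like to do, since you did not provide an example of the output.
But if you want to join the lines, you can try this awk: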
awk '{printf (!/2013/?" ":RS)"%s",$0} END {print ""}'
2013-06-22 00:00:49.307121|147374 |PHONE HOME|SDRKRKS|REAS|something|KRISTCOS 11:13 AM 6/22/2013 NUM: 90834098 data: 0394884 cX: 90h010f03040f mR: 034050t0ds0 cNUM: 034050t0ds0
2013-06-22 00:00:49.307121|0950704421406 |PHONE HOME|SDRKRKS|REAS|something|MRS
2013-06-22 00:00:50.379487|0441813679603 |PHONE HOME|SDRKRKS|REAS|something|TN 90210
Answer 3 (score: 1)
Here is one way using GNU sed:
sed -nr ':a;N;/\n[0-9]{4}-[0-9]{2}-[0-9]{2}/{P;$!D;s/.*\n//p};s/\n/ /g;$!ba;p' file
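:a
Define a label named a.
N
Append the next input line to the pattern space.
/\n[0-9]{4}-[0-9]{2}-[0-9]{2}/{P;$!D;s/.*\n//p}
Test whether the appended line starts with a date. If it does, print up to the first newline and, if this is not the last line, delete up to the first newline and start over. If it is the last line, delete everything up to the newline and print what is left.
s/\n/ /g;
For all other lines, keep replacing the newlines with spaces.
$!ba
If this is not the last line, branch back to the label and repeat.
p
Print the joined text once the last line has been read.
Answer 4 (score: 1)
This might work for you (GNU sed):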
sed ':a;$!N;/^[^|]*$/Ms/\n/ /;ta' file
If the most recently appended line does not contain a |, replace the newline with a space and repeat.
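A possible variant, offered only as a sketch: it uses P and D so that the line that ends one record stays in the pattern space and can go on to collect its own continuation lines, including a continuation line at the very end of the file such as the final tn 4887839847 in the sample. The function name join_by_pipe is hypothetical, and the sketch assumes that continuation lines never contain a | while record-starting lines always do.

```shell
# join_by_pipe: hypothetical helper name; reads log text on stdin.
# Assumption: only record-starting lines contain a | character.
join_by_pipe() {
  sed -e ':a' \
      -e '$!N' \
      -e '/\n[^|]*$/{' \
      -e 's/\n/ /' \
      -e 'ta' \
      -e '}' \
      -e 'P' \
      -e 'D'
}
# :a          label to loop back to
# $!N         append the next line, unless this is the last one
# /\n[^|]*$/  the appended line has no |, so it is a continuation:
#             replace the newline with a space and loop
# P;D         otherwise print the finished record and restart with
#             the appended line as the beginning of the next record
```

It would be called as join_by_pipe < file.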