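I have a corrupted text file in which I need to replace \x20*[\n\r]+ with \xa0 whenever the next line (if there is one) does not start with the specific pattern DATA\t. If such lines begin with spaces (\x20+), those spaces should be removed as well.
Can I use sed for this? The text file is about 1 MB.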
Sample data:
DATA 132942, "I love you", 2398, "Hi how are you"
DATA 78793, "It is
me", 4322, "My name is Frank"
DATA 24121, "Where
are
you", 52432, "I am
here"
DATA 43242, "End of story", 432432, "The end"
=>
DATA 132942, "I love you", 2398, "Hi how are you"
DATA 78793, "It is me", 4322, "My name is Frank"
DATA 24121, "Where are you", 52432, "I am here"
DATA 43242, "End of story", 432432, "The end"
Answer 0 (score: 1)
One way to do this in Ruby:
ruby -e 'puts File.read(ARGV.shift).gsub(/ *\r?\n *(?!DATA[[:space:]])/, " ").gsub(/ +$/m, "")' file
Output:
DATA 132942, "I love you", 2398, "Hi how are you"
DATA 78793, "It is me", 4322, "My name is Frank"
DATA 24121, "Where are you", 52432, "I am here"
DATA 43242, "End of story", 432432, "The end"
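For readers who want to see what the two substitutions in the Ruby one-liner above are doing, here is a minimal sketch of the same idea in Python. The function name join_continuations and the marker parameter are made up for illustration, and \s stands in for Ruby's [[:space:]]; treat it as an approximation of the one-liner, not an exact port.

```python
import re


def join_continuations(text: str, marker: str = "DATA") -> str:
    """Join broken lines unless the following line starts with `marker`.

    `marker` is a hypothetical parameter; the thread uses the literal DATA.
    """
    # A line break (with any spaces around it) becomes a single space,
    # except when the next line begins with the marker plus whitespace.
    # The negative lookahead (?!...) performs that check without consuming text.
    pattern = r" *\r?\n *(?!" + re.escape(marker) + r"\s)"
    joined = re.sub(pattern, " ", text)
    # The final line break is also turned into a space, so strip
    # trailing spaces at the end of every line.
    return re.sub(r" +$", "", joined, flags=re.MULTILINE)


if __name__ == "__main__":
    sample = 'DATA 78793, "It is\nme", 4322, "My name is Frank"\n'
    print(join_continuations(sample))
```

Like the Ruby version, this reads the whole file into memory, which is not a problem for a file of about 1 MB.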
Answer 1 (score: 1)
cat input.txt | sed '{:q;N;s/\x20*[\n\r]\+/\xa0/g;t q}' | sed 's/\xa0DATA/\nDATA/g'
Answer 2 (score: 1)
This might work for you (GNU sed):
sed ':a;$!N;/\nDATA/!s/\s*\n\s*/ /;ta;P;D' file
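The sed loop above keeps appending the next line to the pattern space until it sees a line starting with DATA, then prints the completed record. A rough sketch of that accumulate-then-flush logic in Python is shown below. The name merge_records and the marker parameter are hypothetical, and skipping blank lines is an assumption of this sketch, not something the sed command does.

```python
from typing import Iterable, Iterator, Optional


def merge_records(lines: Iterable[str], marker: str = "DATA") -> Iterator[str]:
    """Yield one joined record per group of physical lines.

    A new record starts whenever a line begins with `marker`
    (a hypothetical parameter; the thread uses the literal DATA).
    """
    record: Optional[str] = None
    for raw in lines:
        # Drop the line terminator and surrounding whitespace,
        # similar to the \s*\n\s* match in the sed substitution.
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no data
        if record is None or line.startswith(marker):
            # A marker line closes the previous record, if any.
            if record is not None:
                yield record
            record = line
        else:
            # Continuation line: glue it on with a single space.
            record += " " + line
    if record is not None:
        yield record  # flush the last record at end of input


if __name__ == "__main__":
    broken = ['DATA 24121, "Where\n', "are\n", 'you", 52432, "I am\n', 'here"\n']
    for rec in merge_records(broken):
        print(rec)
```

Because it processes one line at a time, this version can be fed an open file object directly and does not need the whole file in memory.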