Removing whitespace between fields in a CSV file on UNIX

Time: 2014-07-11 03:11:09

Tags: linux bash csv awk sed

CSV input file:

"18","Agent","To identify^M
","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M
"1078","Repeat","Identify
it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M
"621","Com Dot Com","Identify

","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M

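The input file above contains 3 different kinds of records.

1) Record No. 18 (the first 2 lines): it is split across 2 lines even though it should be a single line. A ^M was mistakenly placed at the end of the first line.

Expected output (the ^M removed from the first line and the record joined into one line):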

"18","Agent","To identify","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M

2) Record No. 1078 (lines 3 and 4): here there is no ^M at the end of line 3. I want to join lines 3 and 4 into a single line.

Expected output:

"1078","Repeat","Identify it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M

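3) Record No. 621 (lines 5, 6 and 7): it has a ^M only at the end of the last line, but there is a blank line in the middle of the record. I want to remove the blank line and make the record a single line.

Expected output: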

"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M

3 answers:

Answer 0: (score: 1)

Using Ruby:

ruby -e 'require "csv"; CSV.parse(File.read(ARGV.shift)).each{ |e| e.map!{ |f| f.strip.gsub(/[[:space:]]+/, " ") }; puts CSV.generate_line(e, force_quotes: true); }' csv_file

Output:

"18","Agent","To identify","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"
"1078","Repeat","Identify it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"
"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"

In a more readable form:

ruby -e 'require "csv"
    CSV.parse(File.read(ARGV.shift)).each{ |e|
        e.map!{ |f|
            f.strip.gsub(/[[:space:]]+/, " ")
        }
        puts CSV.generate_line(e, force_quotes: true)
    }' csv_file
  • Bash's history expansion may interfere with the command, so you can disable it if necessary: shopt -u -o histexpand

Script version:

#!/usr/bin/env ruby
require 'csv'
CSV.parse(File.read(ARGV.shift)).each{ |e|
  e.map!{ |f|
    f.strip.gsub(/[[:space:]]+/, " ")
  }
  puts CSV.generate_line(e, force_quotes: true)
}

Run it with:

ruby script.rb csv_file

See Ruby-Doc.org for documentation on everything used here.
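
The expected output in the question keeps the ^M (carriage return) at the end of each record, while the output shown in this answer ends each line with a plain newline. As a minimal sketch, assuming the input really uses CRLF row endings, the same idea can be written so that CRLF is preserved on output. The helper name normalize_csv is hypothetical, and the to_s call is only a precaution against empty unquoted fields, which the CSV library returns as nil.

```ruby
require "csv"

# Hypothetical helper: collapse whitespace runs (including embedded CR/LF)
# inside every field, then re-emit each record fully quoted with a CRLF ending.
# Assumption: rows in the input are terminated by "\r\n".
def normalize_csv(text)
  CSV.parse(text, row_sep: "\r\n", skip_blanks: true).map { |row|
    cleaned = row.map { |f| f.to_s.gsub(/[[:space:]]+/, " ").strip }
    CSV.generate_line(cleaned, force_quotes: true, row_sep: "\r\n")
  }.join
end
```

It could be called as print normalize_csv(File.read("csv_file")), which writes the records back out with their CRLF endings intact.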

Answer 1: (score: 0)

This might work:

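awk -F \",\" '
  # Skip blank lines.
  /^[[:space:]]*$/ { next }
  {
    # Append each physical line to the pending record.
    line = line $0
    # The sample records have 5 fields, so splitting on "," yields 5 pieces
    # once a record is complete. Adjust this number to your field count.
    if (split(line, a) == 5) {
      print line
      line = ""
    }
  }
' file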

I suspect there will still be some problems (such as missing spaces where lines are joined).
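
The missing-space problem can be handled by inserting a space whenever two physical lines are joined. The following is a rough sketch of that idea in Ruby rather than awk. It uses a different completeness test from the answer above: a record is treated as complete once it contains an even number of double quotes. The helper name join_records is hypothetical, and the rule "no space if the continuation starts with a quote" is a heuristic that assumes fields do not contain escaped "" quotes at the start of a line.

```ruby
# Hypothetical helper: join physical lines into logical CSV records.
# A record is considered complete when its double quotes are balanced.
def join_records(text)
  records = []
  buffer = nil
  text.each_line do |line|
    piece = line.strip                 # also drops a trailing CR and LF
    next if piece.empty?               # skip blank lines entirely
    if buffer.nil?
      buffer = piece
    else
      # No space is needed when the continuation starts with the closing quote.
      buffer << (piece.start_with?('"') ? "" : " ") << piece
    end
    if buffer.count('"').even?         # no quoted field is still open
      records << buffer
      buffer = nil
    end
  end
  records << buffer if buffer          # keep an unterminated trailing record
  records
end
```

The result is an array of records without line endings, so the caller decides whether to write them back with LF or CRLF.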

Answer 2: (score: 0)

Using GNU awk for multi-char RS:

$ awk -v RS='^$' -v ORS= 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) gsub(/\n/,"",$i) }1' file
"18","Agent","To identify^M","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M
"1078","Repeat","Identifyit has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M
"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M

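Since it isn't clear yet whether you really have control-Ms in your file, I have left them as the literal string "^M" for now. If you do have them, just gsub() them away.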
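
For comparison, here is a sketch of the same quote-splitting idea in Ruby, under the assumption that the file does contain real carriage returns. Splitting on the double quote means that segments with an odd index lie inside a quoted field, which mirrors the even-numbered awk fields above (awk counts from 1, Ruby from 0). Unlike the awk command, this version replaces line breaks with a single space and trims the field, so "Identify" and "it has" are not run together. The helper name clean_quoted is hypothetical, and the trimming assumes fields contain no escaped "" quotes.

```ruby
# Hypothetical helper: clean only the text between double quotes.
# Line endings outside quoted fields (the real record terminators) are kept.
def clean_quoted(text)
  text.split('"', -1).each_with_index.map { |seg, i|
    # Odd-indexed segments are inside quotes: turn each run of CR/LF
    # (plus surrounding whitespace) into one space, then trim the ends.
    i.odd? ? seg.gsub(/\s*[\r\n]+\s*/, " ").strip : seg
  }.join('"')
end
```

Because segments outside the quotes are left untouched, the CRLF at the end of each record survives, which matches the expected output in the question.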