CSV input file:
"18","Agent","To identify^M
","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M
"1078","Repeat","Identify
it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M
"621","Com Dot Com","Identify
","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M
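The input file above contains 3 different kinds of broken records.
1) Record No 18 (the first 2 lines): it is split across 2 lines even though it should be a single line. A ^M is wrongly placed at the end of the first line.
Expected output (the ^M removed from the first line, making it one line):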
"18","Agent","To identify","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M
2) Record No 1078 (lines 3 and 4): here there is no ^M at the end of line 3. I want to combine lines 3 and 4 into one line.
Expected output:
"1078","Repeat","Identify it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M
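3) Record No 621 (lines 4, 5 and 6): it has a ^M only at the end of the record, but there is a blank line in the middle. I want to remove the blank line and make it one line.
Expected output:
"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M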
Answer 0 (score: 1)
Using Ruby:
ruby -e 'require "csv"; CSV.parse(File.read(ARGV.shift)).each{ |e| e.map!{ |f| f.strip.gsub(/[[:space:]]+/, " ") }; puts CSV.generate_line(e, force_quotes: true); }' csv_file
Output:
"18","Agent","To identify","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"
"1078","Repeat","Identify it has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"
"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"
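A more readable form:
ruby -e 'require "csv"
CSV.parse(File.read(ARGV.shift)).each{ |e|
  # Trim each field and collapse any run of whitespace (including CR/LF) into one space.
  e.map!{ |f|
    f.strip.gsub(/[[:space:]]+/, " ")
  }
  # Keyword form of the option, which also works on Ruby 3.
  puts CSV.generate_line(e, force_quotes: true)
}' csv_file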
shopt -u -o histexpand
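Script version:
#!/usr/bin/env ruby
require 'csv'

# Parse the whole file; line breaks inside quoted fields are kept as part of the field.
CSV.parse(File.read(ARGV.shift)).each{ |e|
  # Trim each field and collapse any run of whitespace (including CR/LF) into one space.
  e.map!{ |f|
    f.strip.gsub(/[[:space:]]+/, " ")
  }
  # Write the record back on a single line with every field quoted.
  puts CSV.generate_line(e, force_quotes: true)
}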
Run it with:
ruby script.rb csv_file
See Ruby-Doc.org for documentation on everything used here.
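To try the approach without a file on disk, the same per-field cleanup can be wrapped in a small method and run against the sample from the question. This is only a sketch: normalize_csv is a hypothetical helper name, the ^M characters are written as "\r", and the blank line in record 621 is assumed to be a bare "\n".

```ruby
require "csv"

# Hypothetical helper (not part of the answer above): the same per-field
# cleanup, applied to a string instead of a file.
def normalize_csv(text)
  CSV.parse(text).map { |row|
    # Trim each field and collapse whitespace runs (CR, LF, spaces) into one space.
    cleaned = row.map { |field| field.strip.gsub(/[[:space:]]+/, " ") }
    CSV.generate_line(cleaned, force_quotes: true)
  }.join
end

# Sample data from the question, with ^M written as "\r" (assumption).
sample = %("18","Agent","To identify\r\n","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"\r\n) +
         %("1078","Repeat","Identify\nit has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"\r\n) +
         %("621","Com Dot Com","Identify\n\n","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"\r\n)

puts normalize_csv(sample)
```

Because the CSV parser treats a line break inside quotes as part of the field, the broken records come back as single rows and only the field contents need cleaning.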
Answer 1 (score: 0)
This might work:
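awk -F \",\" '
    /^[[:space:]]*$/ { next }           # skip blank lines
    {
        line = line $0                  # append this physical line to the pending record
        if (split(line, a) == 5) {      # the sample records have 5 fields
            print line
            line = ""
        }
    }
' file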
I suspect there will still be some problems (such as missing spaces where lines are joined).
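The missing-space problem can be handled by choosing the separator while joining. Below is a rough Ruby sketch of the same field-counting idea, not a port of the awk program: join_broken_records is a hypothetical name, the field count of 5 is an assumption read off the sample data, and the rule "add a space unless the continuation starts with a quote" is a heuristic.

```ruby
EXPECTED_FIELDS = 5 # assumption: taken from the sample data in the question

# Hypothetical helper: glue physical lines together until the pending text
# splits into the expected number of fields on the "," delimiter.
def join_broken_records(text, expected_fields = EXPECTED_FIELDS)
  records = []
  buffer = +""
  text.each_line do |raw|
    piece = raw.strip          # drops "\r", "\n" and surrounding blanks
    next if piece.empty?       # skip blank physical lines
    # Heuristic: separate continued text with a space, but not a closing quote.
    buffer << " " unless buffer.empty? || piece.start_with?('"')
    buffer << piece
    if buffer.split('","', -1).size == expected_fields
      records << buffer
      buffer = +""
    end
  end
  records
end

sample = %("18","Agent","To identify\r\n","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"\r\n) +
         %("1078","Repeat","Identify\nit has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"\r\n) +
         %("621","Com Dot Com","Identify\n\n","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"\r\n)

puts join_broken_records(sample)
```

Counting fields this way breaks down if a line break falls inside the last field or if a field itself contains the "," sequence, so a real CSV parser is the safer choice for anything beyond data shaped like the sample.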
Answer 2 (score: 0)
Using GNU awk for multi-char RS:
$ awk -v RS='^$' -v ORS= 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) gsub(/\n/,"",$i) }1' file
"18","Agent","To identify^M","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"^M
"1078","Repeat","Identifyit has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"^M
"621","Com Dot Com","Identify","7fc9e73e-3470-4b31-8524-fcb97a4dadee","1"^M
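Since it's not clear whether you really have control-Ms in your input, I've left them as the literal text "^M" for now. If you do have them, just gsub() them away as well.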
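For comparison, here is a minimal Ruby sketch of the same quote-splitting idea that also drops real carriage returns, assuming the ^M characters are actual "\r" bytes. strip_breaks_inside_quotes is a hypothetical name, and like the awk command it joins the pieces without inserting a space.

```ruby
# Hypothetical helper: split the whole text on double quotes; the pieces at
# odd (0-based) positions are the ones inside quotes, so remove CR and LF there.
def strip_breaks_inside_quotes(text)
  parts = text.split('"', -1) # -1 keeps trailing empty pieces
  parts.each_with_index.map { |part, i|
    i.odd? ? part.delete("\r\n") : part
  }.join('"')
end

sample = %("18","Agent","To identify\r\n","b5b553d2-81ab-4ec3-83e0-71ae3cf4afab","1"\r\n) +
         %("1078","Repeat","Identify\nit has","0164f3eb-beeb-47dd-b9b9-9b762f430e14","1"\r\n)

print strip_breaks_inside_quotes(sample)
```

The record terminators sit outside the quotes, so they are left untouched and each record still ends with its original line ending.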