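I want to read the data line by line and, whenever I find a double quote, replace newline characters with a space until the second (closing) double quote is encountered. For example: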
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing
Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
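Take the second record of the data above: the double quote opens on line 2 and closes on line 3, so those lines need to be joined with a single space, as shown below: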
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
Answer 0 (score: 8)
You can use this gnu-awk one-liner:
awk -v RS='"[^"]*"' -v ORS= '{gsub(/\n/, " ", RT); print $0 RT}' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
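RS='"[^"]*"' - sets the input record separator to the regex "[^"]*", i.e. a double-quoted string
-v ORS= - sets the output record separator to the empty string
gsub(/\n/, " ", RT) - replaces every newline with a space in RT, the text that matched the input record separator
print $0 RT - prints the record followed by the separator text, which no longer contains newlines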
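RT is a gawk extension, so the one-liner above will not behave the same on other awk implementations. As a rough portable sketch of a related idea (not the method used in this answer), one can count double quotes line by line and keep joining lines while the running count is odd. The function name join_quoted is made up for illustration, and the sketch assumes the quoting in the input is well formed.

```sh
# join_quoted: hypothetical helper name, not from the original answers.
# Assumes well-formed input in which every opening quote is eventually closed.
join_quoted() {
    awk '
    {
        buf = buf $0                  # append the current physical line
        n += gsub(/"/, "\"")          # gsub returns the number of quotes it replaced
        if (n % 2 == 0) {             # even count: every quote seen so far is closed
            print buf
            buf = ""; n = 0
        } else {
            buf = buf " "             # odd count: still inside quotes, join with a space
        }
    }
    END { if (buf != "") print buf }  # flush an unterminated record, if any
    ' "$@"
}
```

It can be called as join_quoted file, or placed at the end of a pipeline.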
Here is a perl one-liner:
perl -0pe 's/"[^\n"]*"(*SKIP)(*F)|("[^"\n]*)\n([^"]*")/$1 $2/g' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
Answer 1 (score: 2)
This one-liner will do it:
perl -F'' -0 -ane ' foreach $char(@F){ $char eq q(") && {$seen= $seen ? 0 : 1}; $seen && $char eq "\n" && { $char=" "}; print $char}'
Or:
perl -F'' -0 -ane 'map {$_ eq q(") && {$seen=$seen?0:1}; $seen && $_ eq "\n" &&{$_=" "}; print} @F'
In action:
$ perl -F'' -0 -ane ' foreach $char(@F){ $char eq q(") && {$seen= $seen ? 0 : 1}; $seen && $char eq "\n" && { $char=" "}; print $char}' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
Answer 2 (score: 2)
This works for the simple case in your example:
$ perl -00pe 's/(\n[^"]*"[^"]+)\n(.+?")/$1 $2/gm' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
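* -00: paragraph mode. The sample contains no blank lines, so the whole file is read in and treated as a single string.
* -pe: print each input "line" (here a single "line", because of -00) after applying the script given by -e.
* (\n[^"]*"[^"]+)\n(.+?"): match a newline (used to mark the start of a line), then as many non-" characters as possible ([^"]*), then a ", then only non-" characters up to a newline ([^"]+\n), then everything up to and including the next quote (.+?"). The parentheses are there so that the matched strings can be captured.
* $1 $2: this is the replacement. It prints the two captured groups, so the matched pattern is replaced by the first group, a space, and then the second group.
* gm: g makes the substitution global; m enables multi-line mode. Strictly speaking, m only changes how ^ and $ match, and this pattern uses neither, so it has no effect here.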
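Since the substitution targets only the simple case, it can be worth checking its output. A minimal sketch, assuming quotes are balanced within every complete record: count the lines that still contain an odd number of double quotes. The helper name count_unbalanced is hypothetical.

```sh
# count_unbalanced: hypothetical helper name, not part of the original answer.
# Prints how many lines contain an odd number of double quotes.
count_unbalanced() {
    awk '
    { if (gsub(/"/, "\"") % 2 == 1) bad++ }   # gsub returns the number of quotes on the line
    END { print bad + 0 }                      # + 0 forces a numeric 0 when nothing was counted
    ' "$@"
}
```

Piping the result of the one-liner into count_unbalanced should print 0; any other value suggests that a quoted field is still split across lines.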
Answer 3 (score: 0)
Perl to the rescue:
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;
my $csv = 'Text::CSV'->new({ binary => 1,
                             eol    => "\n",
                           })
    or die "Cannot use CSV: " . 'Text::CSV'->error_diag;
open my $CSV, '<:utf8', shift or die $!;
while (my $row = $csv->getline($CSV)) {
    s/\n/ /g for @$row;
    $csv->print(*STDOUT, $row);
}
It gives the expected output when run as
remove-newlines.pl input.csv > output.csv
Answer 4 (score: 0)
A solution that uses (I think) bashisms (NOT POSIX, so it shouldn't work in shells other than bash):
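function fixmylines {
    local line fullline
    # -r keeps backslashes in the data literal
    while read -r line ; do
        if [[ "$line" =~ ^[0-9a-f]{16}, ]] ; then
            # a new record starts: flush the previous one, if any
            [ "$fullline" ] && echo "$fullline"
            fullline="$line"
        else
            # continuation line: append it, separated by a space
            fullline+=" $line"
        fi
    done
    # flush the last record
    echo "$fullline"
}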
You can then pipe your data into this function ("| fixmylines").
Note: it uses the regexp "^[0-9a-f]{16}," to detect the start of a record.
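Because the function above depends on bash, here is a rough sketch of the same record-start idea using only POSIX sh and awk. The name join_by_id is hypothetical. The 16-character check is spelled out with index and substr instead of {16}, on the assumption that some awk implementations do not support interval expressions.

```sh
# join_by_id: hypothetical name; a sketch, not a drop-in replacement for fixmylines.
join_by_id() {
    awk '
    # True when s begins with 16 lowercase hex characters followed by a comma.
    function starts_record(s,   i, id) {
        i = index(s, ",")
        if (i != 17) return 0          # first comma must directly follow 16 characters
        id = substr(s, 1, 16)
        return id ~ /^[0-9a-f]+$/
    }
    {
        if (starts_record($0)) {
            if (NR > 1) print buf      # flush the previous record
            buf = $0
        } else {
            buf = (NR == 1) ? $0 : buf " " $0   # continuation line: join with a space
        }
    }
    END { if (NR > 0) print buf }      # flush the last record
    ' "$@"
}
```

Like the bash version, this relies entirely on the shape of the ID column rather than on the quotes, so it only suits data whose records all start with such an ID.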