Question

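I want to read the data line by line, and whenever I find a double quote I want to replace newline characters with spaces until the second (closing) double quote is encountered. For example: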

090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing
Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

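In the second record above, the double quote opens on one line and is closed on the next (the third line of the data), so these lines need to be merged into one, joined by a single space, as shown below: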

090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

Answer 1

You can use this gnu-awk one-liner:

awk -v RS='"[^"]*"' -v ORS= '{gsub(/\n/, " ", RT); print $0  RT}' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

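-v RS='"[^"]*"' - sets the input record separator to the regex "[^"]*", so every double-quoted field acts as a record separator
-v ORS= - sets the output record separator to the empty string
gsub(/\n/, " ", RT) - replaces every newline with a space in RT, the text that matched the input record separator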
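
The same idea (find each double-quoted span, then replace the newlines inside it) can be sketched outside awk. The following is a minimal Python illustration, not part of the original answer; the function name join_quoted_newlines is made up for this example, and it assumes quotes are balanced.

```python
import re

def join_quoted_newlines(text: str) -> str:
    """Replace newlines inside double-quoted spans with single spaces."""
    # '"[^"]*"' matches one quoted span, which may contain newlines,
    # mirroring the regex used for RS in the awk command.
    return re.sub(r'"[^"]*"', lambda m: m.group(0).replace("\n", " "), text)

sample = (
    '090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology\n'
    '090033ec82ae0c07,Initiated,NA,"To   local testing\n'
    'Rohit  3 to 4.",Julienne B Orr,Oncology\n'
    '090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology\n'
)
print(join_quoted_newlines(sample), end="")
```

Text outside the quoted spans is left untouched, so the newline that ends each record survives.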

Here is a perl one-liner:

perl -0pe 's/"[^\n"]*"(*SKIP)(*F)|("[^"\n]*)\n([^"]*")/$1 $2/g' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

Answer 2

This one-liner will do it:

perl -F'' -0 -ane ' foreach $char(@F){  $char eq q(") && {$seen= $seen ? 0 : 1}; $seen  && $char eq "\n" && { $char=" "}; print $char}'

Or:

perl -F'' -0 -ane 'map {$_ eq q(") && {$seen=$seen?0:1}; $seen && $_ eq "\n" &&{$_=" "}; print} @F'

In action:

$ perl -F'' -0 -ane ' foreach $char(@F){  $char eq q(") && {$seen= $seen ? 0 : 1}; $seen  && $char eq "\n" && { $char=" "}; print $char}' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
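
The one-liners above walk the input one character at a time and flip a flag on every double quote. A rough Python sketch of that logic follows; it is an illustration only, with a hypothetical function name, and like the Perl version it assumes quotes are balanced.

```python
def join_quoted_newlines_by_char(text: str) -> str:
    """Replace newlines with spaces while inside a double-quoted span."""
    in_quotes = False  # plays the role of $seen in the Perl one-liner
    out = []
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes  # toggle on every double quote
        elif ch == "\n" and in_quotes:
            ch = " "                   # newline inside quotes becomes a space
        out.append(ch)
    return "".join(out)

print(join_quoted_newlines_by_char('id1,"To   local testing\nRohit  3 to 4.",x\nid2,y,z\n'), end="")
```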

Answer 3

This works for the simple case in your example:

$ perl -00pe 's/(\n[^"]*"[^"]+)\n(.+?")/$1 $2/gm' file 
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To   local testing Rohit  3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology

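Caveats

This loads the whole file into memory, which may be a problem depending on the size of the file.
It does not handle a quoted field that spans more than two lines: only one newline in such a field is replaced.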

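Explanation

-00: paragraph mode. Records are separated by blank lines, and since the sample contains none, the whole file is read as a single string (-0777 would slurp the file unconditionally).
-pe: print each input "line" (here a single one, because of -00) after applying the script given by -e.
(\n[^"]*"[^"]+)\n(.+?"): match a newline (to mark the start of a line), then as many non-" characters as possible ([^"]*), then a ", then only non-" characters up to a newline ([^"]+\n), and then everything up to the first following quote. The parentheses are there so that the matched strings can be captured.
$1 $2: this is the replacement. It prints the two captured groups, so the matched text is replaced by the first group, a space, and then the second group.
gm: g makes the substitution global; m makes ^ and $ match at line boundaries (the pattern uses neither anchor, so it has no effect here).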
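
To see where this pattern stops working, the same regex can be tried on a two-line and a three-line quoted field. The snippet below is a small Python sketch with made-up sample strings; Python's re module treats this particular pattern the same way Perl does.

```python
import re

# Same pattern as in the Perl one-liner above.
PATTERN = re.compile(r'(\n[^"]*"[^"]+)\n(.+?")')

def join_once(text: str) -> str:
    """Apply the substitution globally, as s/.../$1 $2/g does."""
    return PATTERN.sub(r"\1 \2", text)

two_lines = 'id1,x\nid2,"a\nb",y\n'
three_lines = 'id1,x\nid2,"a\nb\nc",y\n'

print(repr(join_once(two_lines)))    # 'id1,x\nid2,"a b",y\n'
print(repr(join_once(three_lines)))  # 'id1,x\nid2,"a\nb c",y\n'
```

In the three-line case only the last newline inside the quotes is replaced, which is the limitation mentioned in the caveats.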

Answer 4

Perl to the rescue:

#!/usr/bin/perl
use warnings;
use strict;

use Text::CSV;
my $csv = 'Text::CSV'->new({ binary => 1,
                             eol => "\n",
                           })
    or die "Cannot use CSV: " . 'Text::CSV'->error_diag;

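# Read the file named by the first command-line argument as UTF-8.
open my $CSV, '<:utf8', shift or die $!;
while (my $row = $csv->getline($CSV)) {
    s/\n/ /g for @$row;          # replace embedded newlines in every field
    $csv->print(*STDOUT, $row);  # write the record back out as CSV
}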

It gives the expected output when run as:

remove-newlines.pl input.csv > output.csv
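
The parser-based approach carries over to other languages that ship a CSV parser. As a hedged sketch, here is roughly the same thing with Python's standard csv module; the function name and the tiny in-memory sample are made up for illustration.

```python
import csv
import io

def flatten_csv_newlines(src, dst) -> None:
    """Copy CSV records from src to dst, replacing newlines inside fields."""
    reader = csv.reader(src)  # understands quoted fields that span lines
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow([field.replace("\n", " ") for field in row])

# newline="" leaves embedded newlines untouched for the csv module.
src = io.StringIO('1,"a\nb",c\n2,d,e\n', newline="")
dst = io.StringIO()
flatten_csv_newlines(src, dst)
print(dst.getvalue(), end="")
```

Note that csv.writer quotes only the fields that need it by default, so the output here is 1,a b,c without the original quotes; pass quoting=csv.QUOTE_ALL if every field should be quoted.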

Answer 5

A solution using what is (I think) a bashism (NOT POSIX, so it should not work in shells other than bash):

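function fixmylines {
  local line fullline=""
  # IFS= and -r keep leading whitespace and backslashes intact
  while IFS= read -r line ; do
    if [[ "$line" =~ ^[0-9a-f]{16}, ]] ; then
      # a new record starts: flush the previous one, if any
      [ -n "$fullline" ] && printf '%s\n' "$fullline"
      fullline="$line"
    else
      # continuation line: append it after a single space
      fullline+=" $line"
    fi
  done
  # flush the last record
  [ -n "$fullline" ] && printf '%s\n' "$fullline"
}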

You can then pipe the data into this function ("| fixmylines").

Note: it uses the regexp "^[0-9a-f]{16}," to detect the start of a record.
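
The record-start heuristic does not depend on bash. A minimal Python sketch of the same idea follows; the function name is hypothetical, and it assumes, as the shell function does, that every real record begins with a 16-digit lowercase hex id followed by a comma.

```python
import re

# Same test as in the shell function: 16 hex digits, then a comma.
RECORD_START = re.compile(r"^[0-9a-f]{16},")

def merge_continuation_lines(lines):
    """Join every line that does not start a new record onto the previous one."""
    merged = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not merged:
            merged.append(line)          # start of a new record
        else:
            merged[-1] += " " + line     # continuation of the previous record
    return merged

data = [
    '090033ec82ae0c07,Initiated,NA,"To   local testing\n',
    'Rohit  3 to 4.",Julienne B Orr,Oncology\n',
    '090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology\n',
]
print("\n".join(merge_continuation_lines(data)))
```

This approach ignores the quotes entirely, so it would wrongly split a record if a continuation line inside a quoted field happened to begin with something that looks like an id.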
