How to merge a multi-line string within a pattern (but excluding the pattern)?

Time: 2018-12-11 09:52:30

Tags: bash awk sed pattern-matching

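I have a comma-separated CSV file (exported from a database), but unfortunately the last field is a multi-line string enclosed in double quotes, like this: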

138749,CJIKMN,"d4IFtjCCBbIGCSqGSIb3DQEHAqCCBaMwggWfAgEDMQ0wCwYJYIZIAWUDBAIBMG4GBmeBCAEBAaBk
BGIwYAIBADALBglghkgBZQMEAgEwTjAlAgEBBCAeyMDmgdZS30d5JSraWWoUX50J1vKONjxUYxK9
iPZWWjAlAgECBCCzqs7CzH7+3j3trSz+/dcCmud3/Jo9ZYFmN4VTvTjB56CCBBowggQWMIIDnaAD
Lp69+Z3QgAIgHOYjzAQlDRHnDJ/zDtlkWN5pq7T7h3ef9Mnv4ocSuAA="
136065,CIJEPY,"d4IF4jCCBd4GCSqGSIb3DQEHAqCCBc8wggXLAgEDMQ0wCwYJYIZIAWUDBAIBMIGYBgZngQgBAQGg
gY0EgYowgYcCAQAwCwYJYIZIAWUDBAIBMHUwJQIBAQQgNQdsXvKebYUdH0JybzpY2evf+v9Xg86b
hkjOPQQDAjBBMQswCQYDVQQGEwJHQjEOMAwGA1UEChMFVUtLUEExIjAgBgNVBAMTGUNvdW50cnkg
LUxRjUXbTgfGwUKOFwemsc4KXbsLZ13MkbNfAQ=="

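How can I merge those lines into one (dropping the quotes) and leave everything else as it is? The best I could come up with is sed '/\"/{n;:l N;/\"/b; s/\n//; bl}' sampleOut.txt, but that doesn't give me what I want. I'm looking for this: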

138749,CJIKMN,d4IFtjCCBbIGCSqGSIb3DQEHAqCCBaMwggWfAgEDMQ0wCwYJYIZIAWUDBAIBMG4GBmeBCAEBAaBkBGIwYAIBADALBglghkgBZQMEAgEwTjAlAgEBBCAeyMDmgdZS30d5JSraWWoUX50J1vKONjxUYxK9iPZWWjAlAgECBCCzqs7CzH7+3j3trSz+/dcCmud3/Jo9ZYFmN4VTvTjB56CCBBowggQWMIIDnaADLp69+Z3QgAIgHOYjzAQlDRHnDJ/zDtlkWN5pq7T7h3ef9Mnv4ocSuAA=

Any ideas how I can do this? I'm fine with awk too.

-San
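For comparison, here is a minimal sketch of how the sed idea in the question could be made to work. It loops while the pattern space holds exactly one double quote (the field is still open), appending the next line with N and deleting the newline that N inserted, and strips all quotes once the field is closed. It assumes that only the last field is quoted and that values never contain escaped ("") quotes; the function name merge_quoted_sed is hypothetical.

```shell
# merge_quoted_sed: hypothetical helper name; reads the CSV on stdin.
# Assumes only the last field is quoted and values contain no escaped "" quotes.
#   :a              loop label
#   /^[^"]*"[^"]*$/ pattern space holds exactly one quote, so the field is still open:
#     N             append the next input line,
#     s/\n//        delete the newline that N inserted,
#     ba            and test again
#   s/"//g          once the field is closed, strip every double quote
merge_quoted_sed() {
  sed -e ':a' \
      -e '/^[^"]*"[^"]*$/{N;s/\n//;ba' \
      -e '}' \
      -e 's/"//g'
}
```

Usage: merge_quoted_sed < sampleOut.txt. In the original attempt, n prints the opening line before the loop starts, so that line is never joined to the rest, and the quotes are never removed.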

3 answers:

Answer 0 (score: 2)

Could you please try the following (I'll add an explanation shortly).

awk '
/,\"/{
  val=$0
  gsub(/\"/,"",val)
  next
}
/\"$/{
  gsub(/\"/,"")
  print val $0
  val=""
  next
}
{
  gsub(/\"/,"")
  val=val?val $0:$0
}
END{
  if(val){
    print val
  }
}'  Input_file

Explanation: here is the code above with explanatory comments.

awk '
/,\"/{                 ##If the line contains a comma followed by " (start of the quoted field), do the following.
  val=$0               ##Store the current line in variable val.
  gsub(/\"/,"",val)    ##Remove the double quotes from val.
  next                 ##Skip all remaining rules for this line.
}
/\"$/{                 ##If the line ends with " (end of the quoted field), do the following.
  gsub(/\"/,"")        ##Remove the double quotes from the current line.
  print val $0         ##Print the accumulated val followed by the current line.
  val=""               ##Reset val for the next record.
  next                 ##Skip all remaining rules for this line.
}                      ##End of this block.
{
  gsub(/\"/,"")        ##Remove any double quotes from the current line.
  val=val?val $0:$0    ##Append the current line to val (or start val with it if val is empty).
}
END{                   ##The END block runs after all input has been read.
  if(val){             ##If val is not empty (a record was left unprinted), do the following.
    print val          ##Print the value of val.
  }                    ##End of the if block.
}' Input_file          ##Replace Input_file with the name of your file.
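The script above assumes that every quoted value spans at least two lines. A record whose quoted field opens and closes on the same line (or that has no quotes at all) is stored in val but not printed on its own: it is either overwritten when the next record starts or glued to the line that follows. A minimal sketch of a more general variant counts double quotes instead and emits the pending record whenever the running count is even. It assumes quotes only delimit fields (no escaped "" inside values); the function name merge_quoted is hypothetical.

```shell
# merge_quoted: hypothetical helper name; reads the CSV on stdin.
# Assumes double quotes only delimit fields (no escaped "" inside values).
merge_quoted() {
  awk '
  {
    n += gsub(/"/, "")   # strip quotes from this line and add how many were removed
    rec = rec $0         # append the stripped line to the pending record
    if (n % 2 == 0) {    # even total: every opened quote has been closed
      print rec
      rec = ""
      n = 0
    }
  }
  END { if (n % 2) print rec }   # flush a record still open at end of input
  '
}
```

Usage: merge_quoted < Input_file. Because the decision depends only on quote parity, single-line quoted fields and unquoted records pass through unchanged, while multi-line fields are joined.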

Answer 1 (score: 1)

sed is for doing s/old/new on individual lines. With GNU awk, for multi-char RS and RT:

$ awk -v RS='"[^"]+"' -v ORS= '{gsub(/[\n"]/,"",RT); print $0 RT}' file
138749,CJIKMN,d4IFtjCCBbIGCSqGSIb3DQEHAqCCBaMwggWfAgEDMQ0wCwYJYIZIAWUDBAIBMG4GBmeBCAEBAaBkBGIwYAIBADALBglghkgBZQMEAgEwTjAlAgEBBCAeyMDmgdZS30d5JSraWWoUX50J1vKONjxUYxK9iPZWWjAlAgECBCCzqs7CzH7+3j3trSz+/dcCmud3/Jo9ZYFmN4VTvTjB56CCBBowggQWMIIDnaADLp69+Z3QgAIgHOYjzAQlDRHnDJ/zDtlkWN5pq7T7h3ef9Mnv4ocSuAA=
136065,CIJEPY,d4IF4jCCBd4GCSqGSIb3DQEHAqCCBc8wggXLAgEDMQ0wCwYJYIZIAWUDBAIBMIGYBgZngQgBAQGggY0EgYowgYcCAQAwCwYJYIZIAWUDBAIBMHUwJQIBAQQgNQdsXvKebYUdH0JybzpY2evf+v9Xg86bhkjOPQQDAjBBMQswCQYDVQQGEwJHQjEOMAwGA1UEChMFVUtLUEExIjAgBgNVBAMTGUNvdW50cnkgLUxRjUXbTgfGwUKOFwemsc4KXbsLZ13MkbNfAQ==

Answer 2 (score: 0)

Try the following Perl solution:

$ cat mac.txt
138749,CJIKMN,"d4IFtjCCBbIGCSqGSIb3DQEHAqCCBaMwggWfAgEDMQ0wCwYJYIZIAWUDBAIBMG4GBmeBCAEBAaBk
BGIwYAIBADALBglghkgBZQMEAgEwTjAlAgEBBCAeyMDmgdZS30d5JSraWWoUX50J1vKONjxUYxK9
iPZWWjAlAgECBCCzqs7CzH7+3j3trSz+/dcCmud3/Jo9ZYFmN4VTvTjB56CCBBowggQWMIIDnaAD
Lp69+Z3QgAIgHOYjzAQlDRHnDJ/zDtlkWN5pq7T7h3ef9Mnv4ocSuAA="
136065,CIJEPY,"d4IF4jCCBd4GCSqGSIb3DQEHAqCCBc8wggXLAgEDMQ0wCwYJYIZIAWUDBAIBMIGYBgZngQgBAQGg
gY0EgYowgYcCAQAwCwYJYIZIAWUDBAIBMHUwJQIBAQQgNQdsXvKebYUdH0JybzpY2evf+v9Xg86b
hkjOPQQDAjBBMQswCQYDVQQGEwJHQjEOMAwGA1UEChMFVUtLUEExIjAgBgNVBAMTGUNvdW50cnkg
LUxRjUXbTgfGwUKOFwemsc4KXbsLZ13MkbNfAQ=="
$ perl -ne ' chomp; if( /"$/) { s/\"//g;print $_,"\n" } else { s/\"//g; print } ' mac.txt  | nl
     1  138749,CJIKMN,d4IFtjCCBbIGCSqGSIb3DQEHAqCCBaMwggWfAgEDMQ0wCwYJYIZIAWUDBAIBMG4GBmeBCAEBAaBkBGIwYAIBADALBglghkgBZQMEAgEwTjAlAgEBBCAeyMDmgdZS30d5JSraWWoUX50J1vKONjxUYxK9iPZWWjAlAgECBCCzqs7CzH7+3j3trSz+/dcCmud3/Jo9ZYFmN4VTvTjB56CCBBowggQWMIIDnaADLp69+Z3QgAIgHOYjzAQlDRHnDJ/zDtlkWN5pq7T7h3ef9Mnv4ocSuAA=
     2  136065,CIJEPY,d4IF4jCCBd4GCSqGSIb3DQEHAqCCBc8wggXLAgEDMQ0wCwYJYIZIAWUDBAIBMIGYBgZngQgBAQGggY0EgYowgYcCAQAwCwYJYIZIAWUDBAIBMHUwJQIBAQQgNQdsXvKebYUdH0JybzpY2evf+v9Xg86bhkjOPQQDAjBBMQswCQYDVQQGEwJHQjEOMAwGA1UEChMFVUtLUEExIjAgBgNVBAMTGUNvdW50cnkgLUxRjUXbTgfGwUKOFwemsc4KXbsLZ13MkbNfAQ==
$
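The Perl one-liner prints every line with its quotes removed and adds a newline only after lines that end in a double quote. The same logic written as an awk sketch (the function name join_until_closing_quote is hypothetical) makes its assumption visible: a record whose last field is not quoted gets glued to the line that follows it.

```shell
# join_until_closing_quote: hypothetical helper name; awk port of the Perl one-liner.
# Assumes every record ends with a quoted field, as in the sample data.
join_until_closing_quote() {
  awk '{
    closes = /"$/                               # does this line end with a quote?
    gsub(/"/, "")                               # drop every double quote
    printf "%s%s", $0, (closes ? "\n" : "")     # newline only after a closing line
  }'
}
```

On the sample file this gives the same two lines as the Perl version; on input such as 3,C,x followed by 4,D,"y" it prints 3,C,x4,D,y, which is why the quote-counting approach is safer when some records have no quoted field.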