How to copy a complex CSV column into another CSV file

Time: 2019-07-02 16:34:34

Tags: bash csv awk

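I am trying to copy one column from a CSV file into another .csv file. However, that single column is quite complex: it contains double quotes and commas. For example: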

fileA.csv
A,B,C,D,E,F,G,H
I,J,K,L,M,N,O,P
...

fileB.csv
1,2,3,4,5,"has "commas," and \"quotes\"",7,8
10,11,12,13,14,"another "commas," and \"quotes\"",15,16

I want to replace the sixth column (F and N) of fileA.csv with the same-numbered column from fileB.csv.

So the result would be:

A,B,C,D,E,"has "commas," and \"quotes\"",G,H
I,J,K,L,M,"another "commas," and \"quotes\"",O,P

I tried using

paste -d' ' 123.csv  <(awk '{print $6}' realfinalfile.csv) > finalwoot.csv

But I only got the contents of 123.csv; the column from realfinalfile.csv was not added.

Here is an example of one of the rows in the actual fileB.csv:

"R111_Bellca_LiveContent_SHP","bell.ca","BCACXB-6912","No_Request_Validation","20","*No_Request_Validation* issue exists @ *Views/Search/Web.config*



 Request validation is explicitly disabled by version=&quot;1.0&quot;?&gt; in file Views\Search\Web.config at line 1.



 *Application:* R111_Bellca_LiveContent_SHP

 *Cx-Project:* R111_Bellca_LiveContent_SHP

 *Cx-Team:* CxServer\Bell\DCX\Bell.ca

 *Severity:* Medium

 *CWE:* 20



 *Addition Info*

 ----

 [Checkmarx|https://cwypwa-368.bell.corp.bce.ca/CxWebClient/ViewerMain.aspx?scanid=1000353&projectid=136&pathid=184]

 [Mitre Details|https://cwe.mitre.org/data/definitions/20.html]

 [Training|https://cxa.codebashing.com/courses/]

 [Guidance|https://custodela.atlassian.net/wiki/spaces/AS/pages/79462432/Remediation+Guidance]

 Lines: 41 



 ----

 Line #41

 {code}

 validateRequest=""false""

 {code}

 ----

 ","3-Medium","https://cwe.mitre.org/data/definitions/20.html"

So I want to take the contents of the cell that looks like this:

*No_Request_Validation* issue exists @ *Views/Search/Web.config*



 Request validation is explicitly disabled by version...

and put it into the sixth column of fileA.csv.

1 answer:

Answer 0: (score: 0)

Here is how to do what you asked for:

$ cat tst.awk
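BEGIN { FS=OFS="," }
NR==FNR {                       # first file named on the command line (fileB.csv)
    # strip the first five and the last two fields, leaving only field 6
    gsub(/^([^,]*,){5}|(,[^,]*){2}$/,"")
    val[FNR] = $0               # save it under its line number
    next
}
{                               # second file (fileA.csv)
    $6 = val[FNR]               # swap in the field saved from the same line number
    print
}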

$ awk -f tst.awk fileB.csv fileA.csv
A,B,C,D,E,"has "commas," and \"quotes\"",G,H
I,J,K,L,M,"another "commas," and \"quotes\"",O,P
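
The `{5}` and `{2}` repetition counts in that regex are interval expressions, which some older awk builds (certain mawk versions, for example) may not honor. If yours doesn't, here is a sketch of the same stripping step done with plain sub() calls in a loop. The loops are my substitution for the single gsub(), not part of the script above, and the sample files are the ones from the question, written to a scratch directory:

```shell
dir=$(mktemp -d)   # scratch directory so no real files get overwritten

# Sample files copied from the question.
cat > "$dir/fileA.csv" <<'EOF'
A,B,C,D,E,F,G,H
I,J,K,L,M,N,O,P
EOF
cat > "$dir/fileB.csv" <<'EOF'
1,2,3,4,5,"has "commas," and \"quotes\"",7,8
10,11,12,13,14,"another "commas," and \"quotes\"",15,16
EOF

awk '
BEGIN { FS = OFS = "," }
NR == FNR {                                       # fileB.csv
    for (i = 1; i <= 5; i++) sub(/^[^,]*,/, "")   # drop the five leading fields
    for (i = 1; i <= 2; i++) sub(/,[^,]*$/, "")   # drop the two trailing fields
    val[FNR] = $0                                 # what is left is field 6
    next
}
{ $6 = val[FNR]; print }                          # fileA.csv
' "$dir/fileB.csv" "$dir/fileA.csv" > "$dir/out.csv"

cat "$dir/out.csv"
```

This should print the same two lines as the run above. It relies on the same assumption as the regex version: the five leading and two trailing fields contain no commas of their own.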

However, just like your input, that output still isn't valid CSV. If you want the output to be valid CSV, change the script to:

$ cat tst.awk
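BEGIN { FS=OFS=","; escQ="\\\"" }
NR==FNR {
    gsub(/^([^,]*,){5}|(,[^,]*){2}$/,"")
    gsub(/^"|"$/,"")            # drop the enclosing quotes
    gsub(/\\?"/,escQ)           # escape every remaining quote, already escaped or not
    val[FNR] = "\"" $0 "\""     # put the enclosing quotes back
    next
}
{
    $6 = val[FNR]
    print
}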

$ awk -f tst.awk fileB.csv fileA.csv
A,B,C,D,E,"has \"commas,\" and \"quotes\"",G,H
I,J,K,L,M,"another \"commas,\" and \"quotes\"",O,P

Or (just change escQ="\\\"" to escQ="\"\""):

$ cat tst.awk
BEGIN { FS=OFS=","; escQ="\"\"" }
NR==FNR {
    gsub(/^([^,]*,){5}|(,[^,]*){2}$/,"")
    gsub(/^"|"$/,"")
    gsub(/\\?"/,escQ)
    val[FNR] = "\"" $0 "\""
    next
}
{
    $6 = val[FNR]
    print
}

$ awk -f tst.awk fileB.csv fileA.csv
A,B,C,D,E,"has ""commas,"" and ""quotes""",G,H
I,J,K,L,M,"another ""commas,"" and ""quotes""",O,P

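Which of the two you use depends on whether the CSV "standard" you are following uses \" or "" to represent a double quote inside a field.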
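
Since the two scripts differ only in the value of escQ, the quoting step can be isolated and the convention chosen at run time. The following is only a sketch: `requote` and its `style` argument are hypothetical names of mine, and it reads one raw field per line from standard input rather than a whole CSV file:

```shell
# requote: hypothetical helper; $1 is "backslash" or "doubled".
requote() {
    awk -v style="$1" '
    # Same two escQ values as in the scripts above, picked by name.
    BEGIN { escQ = (style == "backslash" ? "\\\"" : "\"\"") }
    {
        gsub(/^"|"$/, "")        # drop the enclosing quotes
        gsub(/\\?"/, escQ)       # escape every inner quote, already escaped or not
        print "\"" $0 "\""       # put the enclosing quotes back
    }'
}

# The sixth field of the first sample row, quoted both ways.
printf '%s\n' '"has "commas," and \"quotes\""' | requote doubled
printf '%s\n' '"has "commas," and \"quotes\""' | requote backslash
```

The two calls should reproduce the sixth field of the first output line from each of the two runs above.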

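Note: the above only works if every record has a known number of "fields", every record is on a single line, and only one of the "fields" contains quotes and commas (as in your example).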
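
The real row shown in the question breaks the one-record-per-line condition: its sixth field spans many lines and doubles its embedded quotes (""). For input of that kind, one possible approach is to glue physical lines together until the number of double quotes is even, and only then pick out field 6. This is a rough sketch under stated assumptions, not a full CSV parser: it assumes fileB uses "" (not \") for embedded quotes, that fileA has plain unquoted fields on single lines, and the names `nquotes`, `field`, `inq` and `nrec` as well as the sample data are made up for illustration:

```shell
dir=$(mktemp -d)   # scratch directory for the made-up sample files

cat > "$dir/fileA.csv" <<'EOF'
A,B,C,D,E,F,G,H
I,J,K,L,M,N,O,P
EOF

# Invented fileB: field 6 of the first record spans two lines.
cat > "$dir/fileB.csv" <<'EOF'
1,2,3,4,5,"line one
says ""hi"", ok",7,8
10,11,12,13,14,"plain, with comma",15,16
EOF

awk '
# Count the double quotes in s (s is a local copy, so $0 is untouched).
function nquotes(s) { return gsub(/"/, "&", s) }

# Return field n of a complete record, enclosing quotes included.
function field(rec, n,    i, f) {
    for (i = 1; i <= n; i++) {
        if (!match(rec, /^"([^"]|"")*"/))   # quoted field, "" allowed inside
            match(rec, /^[^,]*/)            # otherwise an unquoted field
        f = substr(rec, 1, RLENGTH)
        rec = substr(rec, RLENGTH + 2)      # skip the field and its comma
    }
    return f
}

BEGIN { FS = OFS = "," }

NR == FNR {                                 # first file: fileB.csv
    rec = (inq ? rec "\n" : "") $0          # glue continuation lines together
    inq = nquotes(rec) % 2                  # odd count: still inside a quoted field
    if (!inq) val[++nrec] = field(rec, 6)   # record complete: save its field 6
    next
}
{ $6 = val[FNR]; print }                    # second file: fileA.csv
' "$dir/fileB.csv" "$dir/fileA.csv" > "$dir/out.csv"

cat "$dir/out.csv"
```

Because the saved field keeps its enclosing quotes and embedded newline, the first output record also spans two lines. Records are matched by position: the first complete record of fileB goes to line 1 of fileA, and so on.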