Question

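I'm trying to copy one column from a csv file into another .csv file. However, that single column is complicated: it contains double quotes and commas. For example: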

fileA.csv
A,B,C,D,E,F,G,H
I,J,K,L,M,N,O,P
...

and

fileB.csv
1,2,3,4,5,"has "commas," and \"quotes\"",7,8
10,11,12,13,14,"another "commas," and \"quotes\"",15,16

I want to replace the sixth column (F & N) with the column at the same position in fileB.csv.

So the result would be:

A,B,C,D,E,"has "commas," and \"quotes\"",G,H
I,J,K,L,M,"another "commas," and \"quotes\"",O,P

I tried using

paste -d' ' 123.csv  <(awk '{print $6}' realfinalfile.csv) > finalwoot.csv

But I only got the contents of 123.csv, without the column from realfinalfile.csv.
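
One likely reason the column comes out empty, sketched here on the first sample row of fileB.csv (the variable names are only for illustration): awk splits on whitespace by default, so `$6` is the sixth whitespace-separated word, not the sixth comma-separated field. Switching to `-F,` does not fix it either, because the comma inside the quoted field is treated as a separator.

```shell
# First sample row from fileB.csv, kept verbatim inside single quotes.
line='1,2,3,4,5,"has "commas," and \"quotes\"",7,8'

# Default FS (whitespace): the row has only 4 words, so $6 is empty.
default_fs=$(printf '%s\n' "$line" | awk '{print $6}')

# FS set to a comma: the quoted field is cut at its embedded comma.
comma_fs=$(printf '%s\n' "$line" | awk -F, '{print $6}')

printf 'default FS: [%s]\n' "$default_fs"   # default FS: []
printf 'comma FS:   [%s]\n' "$comma_fs"     # comma FS:   ["has "commas]
```

So the field has to be isolated by something other than plain field splitting.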

Here is an example of one of the rows in the actual fileB.csv:

"R111_Bellca_LiveContent_SHP","bell.ca","BCACXB-6912","No_Request_Validation","20","*No_Request_Validation* issue exists @ *Views/Search/Web.config*



 Request validation is explicitly disabled by version=&quot;1.0&quot;?&gt; in file Views\Search\Web.config at line 1.



 *Application:* R111_Bellca_LiveContent_SHP

 *Cx-Project:* R111_Bellca_LiveContent_SHP

 *Cx-Team:* CxServer\Bell\DCX\Bell.ca

 *Severity:* Medium

 *CWE:* 20



 *Addition Info*

 ----

 [Checkmarx|https://cwypwa-368.bell.corp.bce.ca/CxWebClient/ViewerMain.aspx?scanid=1000353&projectid=136&pathid=184]

 [Mitre Details|https://cwe.mitre.org/data/definitions/20.html]

 [Training|https://cxa.codebashing.com/courses/]

 [Guidance|https://custodela.atlassian.net/wiki/spaces/AS/pages/79462432/Remediation+Guidance]

 Lines: 41 



 ----

 Line #41

 {code}

 validateRequest=""false""

 {code}

 ----

 ","3-Medium","https://cwe.mitre.org/data/definitions/20.html"

So I want to take the contents of the cell that looks like this:

*No_Request_Validation* issue exists @ *Views/Search/Web.config*



 Request validation is explicitly disabled by version...

and put it into the sixth column of FileA.csv.

Answer 1

Here's how to do what you asked for:

$ cat tst.awk
BEGIN { FS=OFS="," }
NR==FNR {
    gsub(/^([^,]*,){5}|(,[^,]*){2}$/,"")
    val[FNR] = $0
    next
}
{
    $6 = val[FNR]
    print
}

$ awk -f tst.awk fileB.csv fileA.csv
A,B,C,D,E,"has "commas," and \"quotes\"",G,H
I,J,K,L,M,"another "commas," and \"quotes\"",O,P

However, like your input, that output is still not valid CSV. If you want the output to be valid CSV, change it to:

$ cat tst.awk
BEGIN { FS=OFS=","; escQ="\\\"" }
NR==FNR {
    gsub(/^([^,]*,){5}|(,[^,]*){2}$/,"")
    gsub(/^"|"$/,"")
    gsub(/\\?"/,escQ)
    val[FNR] = "\"" $0 "\""
    next
}
{
    $6 = val[FNR]
    print
}

$ awk -f tst.awk fileB.csv fileA.csv
A,B,C,D,E,"has \"commas,\" and \"quotes\"",G,H
I,J,K,L,M,"another \"commas,\" and \"quotes\"",O,P

Or (just change escQ="\\\"" to escQ="\"\""):

$ cat tst.awk
BEGIN { FS=OFS=","; escQ="\"\"" }
NR==FNR {
    gsub(/^([^,]*,){5}|(,[^,]*){2}$/,"")
    gsub(/^"|"$/,"")
    gsub(/\\?"/,escQ)
    val[FNR] = "\"" $0 "\""
    next
}
{
    $6 = val[FNR]
    print
}

$ awk -f tst.awk fileB.csv fileA.csv
A,B,C,D,E,"has ""commas,"" and ""quotes""",G,H
I,J,K,L,M,"another ""commas,"" and ""quotes""",O,P

Which one you use depends on whether the CSV "standard" you follow uses \" or "" to represent a double quote inside a field.
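
As a small illustration of the escaping step on its own, here is a sketch that applies the same `gsub(/\\?"/, ...)` idea to the inner text of the sample field, using the doubled-quote form (the one RFC 4180 describes). The variable names are assumptions made for this example.

```shell
# Inner text of the sample field, after the outer quotes were removed.
raw='has "commas," and \"quotes\"'

# Replace every quote, with or without a leading backslash, by "".
doubled=$(printf '%s\n' "$raw" | awk '{ gsub(/\\?"/, "\"\""); print }')

# Wrap the result in quotes to form one well-formed CSV field.
printf '"%s"\n' "$doubled"   # "has ""commas,"" and ""quotes"""
```

The optional backslash in the pattern is what lets both the bare `"` and the already escaped `\"` end up in the same form.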

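Note: the above only works if every record has a known number of "fields", every record is on a single line, and only one of the "fields" contains quotes and commas (as in your example).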
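
That restriction follows from how the gsub() pattern works: it deletes everything up to the fifth comma plus the last two comma-led chunks, without looking at quotes at all. A minimal sketch, assuming the sample row from the question; `strip_outer` is a hypothetical helper name, the second row is made up to show the failure mode, and the pattern is written out without the `{5}`/`{2}` interval syntax because some awk builds (older mawk, for instance) do not support intervals.

```shell
# strip_outer (hypothetical helper): drop the first five comma-terminated
# chunks and the last two comma-led chunks, keeping whatever lies between.
strip_outer() {
    awk '{ gsub(/^[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,|,[^,]*,[^,]*$/, ""); print }'
}

# Sample row from the question: only the sixth field has extra commas.
good='1,2,3,4,5,"has "commas," and \"quotes\"",7,8'
field6=$(printf '%s\n' "$good" | strip_outer)
printf '%s\n' "$field6"   # "has "commas," and \"quotes\""

# Made-up row with a comma inside the second field: counting goes wrong.
bad='1,"2,x",3,4,5,"six",7,8'
broken=$(printf '%s\n' "$bad" | strip_outer)
printf '%s\n' "$broken"   # 5,"six"
```

When more than one field can contain commas, or a record spans several lines as in the real fileB.csv row shown in the question, counting commas is no longer enough and a real CSV parser is the safer choice.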
