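Awk: merge the results from three files into one file, delimited by an arbitrary character

Time: 2018-12-01 18:23:27

Tags: awk

I am using awk to replace the whitespace between the fields of each line so that the line becomes a single field. I want to merge the output from processing any number of files into one result file, with the per-file values separated by a space.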

awk -v OFS='^' '{for(i=1; i<=NF; i++)printf("%s%s", $i,(i==NF)?ORS:OFS)}' filename > outputFile

File1 after the awk command:

777^Brockton^Avenue,^Abington^MA^2351
30^Memorial^Drive,^Avon^MA^2322
250^Hartford^Avenue,^Bellingham^MA^2019
....

File 2 is not affected by the awk command, because each of its lines has only one field:

madanm@comcast.net
skajan@verizon.net
barnett@hotmail.com
sbmrjbr@sbcglobal.net
mastinfo@sbcglobal.net
....

I tried to merge the three files and apply the awk command in one pipeline:

paste listOf* |  awk -v OFS='^' '{for(i=1; i<=NF; i++)printf("%s%s", $i,(i==NF)?ORS:OFS)}' > outputFile

But my result looks like this:

777^Brockton^Avenue,^Abington^MA^2351^madanm@comcast.net^Manual^Ordway
30^Memorial^Drive,^Avon^MA^2322^skajan@verizon.net^Yuonne^Cajigas
250^Hartford^Avenue,^Bellingham^MA^2019^barnett@hotmail.com^Pattie^Darsey
700^Oak^Street,^Brockton^MA^2301^sbmrjbr@sbcglobal.net^Cammie^Knoles
66-4^Parkhurst^Rd,^Chelmsford^MA^1824^mastinfo@sbcglobal.net^Evia^Fallen
591^Memorial^Dr,^Chicopee^MA^1020^carcus@aol.com^Soo^Sanfilippo

I would like it to look like this:

Home Address[delimiter]Email[delimiter]Name[delimiter]

777^Brockton^Avenue,^Abington^MA^2351 madanm@comcast.net Manual^Ordway
30^Memorial^Drive,^Avon^MA^2322 skajan@verizon.net Yuonne^Cajigas
250^Hartford^Avenue,^Bellingham^MA^2019 barnett@hotmail.com Pattie^Darsey
700^Oak^Street,^Brockton^MA^2301 sbmrjbr@sbcglobal.net Cammie^Knoles
66-4^Parkhurst^Rd,^Chelmsford^MA^1824 mastinfo@sbcglobal.net Evia^Fallen
591^Memorial^Dr,^Chicopee^MA^1020 carcus@aol.com Soo^Sanfilippo

2 Answers:

Answer 0: (score: 0)

A delimiter is not always whitespace. So your first command does not remove the delimiter; it changes it to ^.

This command does the same job:

awk '{$1=$1}1' OFS='^' file > newfile

sed is simpler for this, as long as the fields are separated by exactly one space:

sed 's/ /^/g' file > newfile
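The two commands above are not fully interchangeable, and a small comparison may make the difference clearer. The sample line below is made up for illustration (it is not from the question's files): it has two blanks after the first word and a tab before the last one.

```shell
# Made-up sample line: two blanks after "777" and a tab before "MA".
line=$(printf '777  Brockton\tMA')

# awk splits on any run of blanks/tabs, then rebuilds the record with OFS
# because a field was assigned ($1=$1).
awk_out=$(printf '%s\n' "$line" | awk -v OFS='^' '{$1=$1} 1')

# sed swaps each single space one for one and leaves tabs alone.
sed_out=$(printf '%s\n' "$line" | sed 's/ /^/g')

printf 'awk: %s\nsed: %s\n' "$awk_out" "$sed_out"
```

With clean single-space input both give the same result, so which one to prefer depends on how regular the input is.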

As for the second command, paste uses a tab as its delimiter by default, but you can change that. You want a space, so pass a space to the -d option:

paste -d" " file* > newfile

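Keep in mind that you can pick any delimiter you like when reading or writing CSV (despite what the name suggests). If your input files already use spaces as separators, you could use a comma as the delimiter for the paste command instead.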
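Putting the two steps of this answer together might look like the sketch below. It converts each file on its own first, then joins the converted files with paste, so the ^ never ends up between columns that came from different files. The file names and the two-line sample data are assumptions made up for the example; only the awk and paste invocations come from the answer.

```shell
#!/bin/sh
# Work in a scratch directory with small made-up sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '777 Brockton Avenue, Abington MA 2351' \
              '30 Memorial Drive, Avon MA 2322' > listOfAddresses
printf '%s\n' 'madanm@comcast.net' 'skajan@verizon.net' > listOfEmails
printf '%s\n' 'Manual Ordway' 'Yuonne Cajigas' > listOfNames

# Step 1: rewrite each file separately, turning whitespace into ^.
for f in listOfAddresses listOfEmails listOfNames; do
    awk -v OFS='^' '{$1=$1} 1' "$f" > "$f.conv"
done

# Step 2: paste joins line N of every file; -d' ' makes the joiner a space.
paste -d' ' listOfAddresses.conv listOfEmails.conv listOfNames.conv > outputFile
cat outputFile
```

The order of the arguments to paste decides the column order, so listing the files explicitly is a little safer than relying on how a glob such as listOf* happens to sort.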

Answer 1: (score: 0)

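Don't do what you are trying to do; you are really just making a mess of your files. In particular, ^ is a terrible choice of character to introduce, because it is a regexp metacharacter and so will make any further processing much harder. Why not leave the blanks in your input alone (converting any tabs to blanks, if the input can contain tabs) and then either use tabs as the separators or convert the whole thing to CSV?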
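To illustrate the metacharacter point, here is a small sketch using grep on one made-up line in the ^-joined layout. The variable names are hypothetical and the line is only sample data.

```shell
# One made-up line in the ^-joined layout from the question.
line='777^Brockton^Avenue,^Abington^MA^2351 madanm@comcast.net'

# In an extended regex, ^ is a start-of-line anchor wherever it appears,
# so this pattern cannot match the literal text "MA^2351".
ere_hits=$(printf '%s\n' "$line" | grep -c -E 'MA^2351' || true)

# Escaping the caret, or asking for a fixed string with -F, matches it.
escaped_hits=$(printf '%s\n' "$line" | grep -c -E 'MA\^2351' || true)
fixed_hits=$(printf '%s\n' "$line" | grep -c -F 'MA^2351' || true)

printf 'unescaped=%s escaped=%s fixed=%s\n' "$ere_hits" "$escaped_hits" "$fixed_hits"
```

Every later tool that treats its pattern as a regexp needs the same care, which is the extra work this answer is warning about.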

For example, given this input:

$ cat file1
777 Brockton Avenue, Abington MA 2351
30 Memorial Drive, Avon MA 2322
250 Hartford Avenue, Bellingham MA 2019
700 Oak Street, Brockton MA 2301
66-4 Parkhurst Rd, Chelmsford MA 1824
591 Memorial Dr, Chicopee MA 1020

$ cat file2
madanm@comcast.net
skajan@verizon.net
barnett@hotmail.com
sbmrjbr@sbcglobal.net
mastinfo@sbcglobal.net
carcus@aol.com

$ cat file3
Manual Ordway
Yuonne Cajigas
Pattie Darsey
Cammie Knoles
Evia Fallen
Soo Sanfilippo

you can produce a TSV:

$ cat tst.awk
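BEGIN {
    OFS = "\t"                 # separator between the per-file columns
    ofmt = "%s%s"              # a value, then OFS or ORS
    numFiles = ARGC - 1        # assumes every argument is an input file
}
FNR == 1 {
    fileNr++                   # count input files as each one starts
}
{
    gsub(/[[:space:]]+/," ")   # squeeze each run of whitespace to one blank
    gsub(/^ | $/,"")           # trim a leading or trailing blank
    val[FNR,fileNr] = $0       # fileNr works in any awk; ARGIND is gawk-only
}
fileNr == numFiles {           # last file: all columns for this line are known
    for (i=1; i<=numFiles; i++) {
        printf ofmt, val[FNR,i], (i<numFiles ? OFS : ORS)
    }
}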

$ awk -f tst.awk file1 file2 file3
777 Brockton Avenue, Abington MA 2351   madanm@comcast.net      Manual Ordway
30 Memorial Drive, Avon MA 2322 skajan@verizon.net      Yuonne Cajigas
250 Hartford Avenue, Bellingham MA 2019 barnett@hotmail.com     Pattie Darsey
700 Oak Street, Brockton MA 2301        sbmrjbr@sbcglobal.net   Cammie Knoles
66-4 Parkhurst Rd, Chelmsford MA 1824   mastinfo@sbcglobal.net  Evia Fallen
591 Memorial Dr, Chicopee MA 1020       carcus@aol.com  Soo Sanfilippo

or a CSV (only the values of OFS and ofmt change):

$ cat tst.awk
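BEGIN {
    OFS = ","                  # separator between the per-file columns
    ofmt = "\"%s\"%s"          # a double-quoted value, then OFS or ORS
    numFiles = ARGC - 1        # assumes every argument is an input file
}
FNR == 1 {
    fileNr++                   # count input files as each one starts
}
{
    gsub(/[[:space:]]+/," ")   # squeeze each run of whitespace to one blank
    gsub(/^ | $/,"")           # trim a leading or trailing blank
    val[FNR,fileNr] = $0       # fileNr works in any awk; ARGIND is gawk-only
}
fileNr == numFiles {           # last file: all columns for this line are known
    for (i=1; i<=numFiles; i++) {
        printf ofmt, val[FNR,i], (i<numFiles ? OFS : ORS)
    }
}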

$ awk -f tst.awk file1 file2 file3
"777 Brockton Avenue, Abington MA 2351","madanm@comcast.net","Manual Ordway"
"30 Memorial Drive, Avon MA 2322","skajan@verizon.net","Yuonne Cajigas"
"250 Hartford Avenue, Bellingham MA 2019","barnett@hotmail.com","Pattie Darsey"
"700 Oak Street, Brockton MA 2301","sbmrjbr@sbcglobal.net","Cammie Knoles"
"66-4 Parkhurst Rd, Chelmsford MA 1824","mastinfo@sbcglobal.net","Evia Fallen"
"591 Memorial Dr, Chicopee MA 1020","carcus@aol.com","Soo Sanfilippo"

or any other common file format. MS-Excel, for example, can read either of the above.
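One practical benefit of the TSV form is that later tools can split on the tab alone and leave the blanks inside each column untouched. The sketch below assumes a hypothetical two-row file shaped like the TSV output above; the temporary file and variable names are made up for the example.

```shell
# Hypothetical two-row TSV, shaped like the TSV output shown earlier.
tsv=$(mktemp)
printf '%s\t%s\t%s\n' \
    '777 Brockton Avenue, Abington MA 2351' 'madanm@comcast.net' 'Manual Ordway' \
    '30 Memorial Drive, Avon MA 2322' 'skajan@verizon.net' 'Yuonne Cajigas' > "$tsv"

# -F'\t' splits on tabs only, so "Manual Ordway" stays one field.
emails=$(awk -F'\t' '{print $2}' "$tsv")
names=$(awk -F'\t' '{print $3}' "$tsv")

printf '%s\n' "$emails"
printf '%s\n' "$names"
```

No escaping is needed anywhere, in contrast with a format that puts ^ inside the columns.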

Just to show the minimal changes needed to get what you actually asked for (again, don't do this!):

$ cat tst.awk
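BEGIN {
    OFS  = " "                     # separator between the per-file columns
    ofmt = "%s%s"                  # a value, then OFS or ORS
    numFiles = ARGC - 1            # assumes every argument is an input file
}
FNR == 1 {
    fileNr++                       # count input files as each one starts
}
{
    gsub(/[[:space:]^]+/,"^")      # each run of whitespace or ^ becomes one ^
    gsub(/^\^|\^$/,"")             # trim a leading or trailing ^
    val[FNR,fileNr] = $0           # fileNr works in any awk; ARGIND is gawk-only
}
fileNr == numFiles {               # last file: all columns for this line are known
    for (i=1; i<=numFiles; i++) {
        printf ofmt, val[FNR,i], (i<numFiles ? OFS : ORS)
    }
}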

$ awk -f tst.awk file1 file2 file3
777^Brockton^Avenue,^Abington^MA^2351 madanm@comcast.net Manual^Ordway
30^Memorial^Drive,^Avon^MA^2322 skajan@verizon.net Yuonne^Cajigas
250^Hartford^Avenue,^Bellingham^MA^2019 barnett@hotmail.com Pattie^Darsey
700^Oak^Street,^Brockton^MA^2301 sbmrjbr@sbcglobal.net Cammie^Knoles
66-4^Parkhurst^Rd,^Chelmsford^MA^1824 mastinfo@sbcglobal.net Evia^Fallen
591^Memorial^Dr,^Chicopee^MA^1020 carcus@aol.com Soo^Sanfilippo