I have two files:
adjective,adverb,participle,verb
0,2,3,5,
1,2,5,6
and
adjective,adjunct,adverbial,participle,verb
0,2,3,5,4
1,2,5,6,5
1,2,5,6,5
I would like to get output like this:
adjective,adjunct,adverb,adverbial,participle,verb
0,2,0,3,5,4
1,2,0,5,6,5
1,2,0,5,6,5
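That is, the columns should be merged by header and sorted alphabetically. I do not care about keeping numbers in the columns that get added to the second file; they can be filled with 0. What matters is that the missing columns are added and that all columns are sorted alphabetically. join did not help, because it joins on only one column. Any ideas?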
Answer 0 (score: 5)
I don't see why join is not an option:
join -t, -a 1 -o 0,2.2,1.2,2.3,1.3,2.5 file1 file2
adjective,adjunct,adverb,adverbial,participle,verb
0,2,2,3,3,4
1,2,2,5,5,5
1,2,2,5,5,5
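-a 1 additionally prints the lines of file1 that have no match in file2, and -o specifies the output format (which field is taken from which file).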
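To make the effect of these options easier to see, here is a minimal sketch on two tiny files. The file names left and right and their contents are made up for illustration, and the -e 0 option (fill missing fields with 0) is an addition that the command above does not use. join expects both inputs to be sorted on the join field, which is the case here.

```shell
#!/bin/sh
# Hypothetical sample data, not taken from the question.
tmp=$(mktemp -d)
printf '%s\n' 'a,1' 'b,2' > "$tmp/left"
printf '%s\n' 'a,x' 'c,y' > "$tmp/right"

# -t,   use a comma as the field separator
# -a 1  also print lines of the first file that have no match in the second
# -e 0  write 0 where a field requested by -o is missing
# -o    output the join field, then field 2 of file 1, then field 2 of file 2
joined=$(join -t, -a 1 -e 0 -o 0,1.2,2.2 "$tmp/left" "$tmp/right")
printf '%s\n' "$joined"

rm -rf "$tmp"
```

Line b has no partner in right, so it is printed because of -a 1 and its missing field is filled by -e 0. Line c exists only in right and is dropped, since -a 2 was not given.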
I may come back to this later. In the meantime, you can extract the merged column headers like this:
paste -d , file1 file2 | sed 1q | tr , '\n' | sed 's/ *$//' | sort -u | paste -d, -s
adjective,adjunct,adverb,adverbial,participle,verb
OK, a GNU awk-only answer. It uses the PROCINFO["sorted_in"] feature to traverse an associative array in lexically sorted order of its indices:
gawk -F, '
    NR == 1 {
        n = split($0, f1cols, /,/)
        for (i=1; i<=n; i++)
            allcols[f1cols[i]] = 1
    }
    NR == FNR {next} # because you do not care about the values
    FNR == 1 {
        n = split($0, f2cols, /,/)
        for (i=1; i<=n; i++) {
            allcols[f2cols[i]] = 1
            f2colidx[f2cols[i]] = i
        }
        PROCINFO["sorted_in"] = "@ind_str_asc"
        sep = ""
        for (head in allcols) {
            printf "%s%s", sep, head
            sep = FS
        }
        print ""
        next
    }
    {
        sep = ""
        for (col in allcols) {
            val = (col in f2colidx) ? $(f2colidx[col]) : 0
            printf "%s%s", sep, val
            sep = FS
        }
        print ""
    }
' file1 file2
adjective,adjunct,adverb,adverbial,participle,verb
0,2,0,3,5,4
1,2,0,5,6,5
1,2,0,5,6,5
Answer 1 (score: 0)
I used a solution similar to this one. Somehow I managed to apply awk to it, and it seems to do what I want.
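head -q -n 1 annotation1.csv annotation2.csv | tr , '\n' | sort -u > header.txt
header="header.txt"
awk -F, -v colsFile="$header" 'BEGIN {
    # read the merged, sorted column names (one per line)
    n = 0
    while ((getline line < colsFile) > 0) col[++n] = line
    close(colsFile)
    for (i = 1; i <= n; i++) {
        s[col[i]] = i    # column name -> output position
        printf "%s%s", (i > 1 ? "," : ""), col[i]
    }
    print ""
}
NR == 1 {
    # output position -> field number in the input file
    for (f = 1; f <= NF; f++) c[s[$f]] = f
    next
}
{
    # columns that the input file lacks are filled with 0
    for (f = 1; f <= n; f++)
        printf "%s%s", (f > 1 ? "," : ""), ((f in c) ? $(c[f]) : 0)
    print ""
}' "$1"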
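For reference, the same idea can be put into one self-contained sketch and checked against the sample data from the question. The function name merge_columns is hypothetical, the temporary files simply recreate the two sample files, and the sketch assumes plain comma-separated input without quoting or stray whitespace. It uses only POSIX tools, so it does not need gawk.

```shell
#!/bin/sh
# merge_columns FILE1 FILE2 (hypothetical helper)
# Prints the rows of FILE2 under the sorted union of both headers;
# columns that FILE2 lacks are filled with 0.
merge_columns() {
    # Sorted, de-duplicated union of both header lines, comma-joined.
    cols=$( { head -n 1 "$1"; head -n 1 "$2"; } |
            tr ',' '\n' | LC_ALL=C sort -u | paste -s -d, - )
    awk -F, -v cols="$cols" '
        BEGIN   { n = split(cols, name, ","); print cols }
        NR == 1 { for (f = 1; f <= NF; f++) pos[$f] = f; next }
        {
            for (i = 1; i <= n; i++) {
                val = (name[i] in pos) ? $(pos[name[i]]) : 0
                printf "%s%s", (i > 1 ? "," : ""), val
            }
            print ""
        }
    ' "$2"
}

# Recreate the sample files from the question in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'adjective,adverb,participle,verb' '0,2,3,5' '1,2,5,6' > "$tmp/file1"
printf '%s\n' 'adjective,adjunct,adverbial,participle,verb' \
    '0,2,3,5,4' '1,2,5,6,5' '1,2,5,6,5' > "$tmp/file2"

merged=$(merge_columns "$tmp/file1" "$tmp/file2")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

Passing the merged header to awk as a variable avoids the intermediate header.txt file, and LC_ALL=C keeps the sort order independent of the current locale.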