Shell script to merge 3 files

Time: 2018-07-11 16:47:05

Tags: bash shell awk

I have 3 files with the following data:

$ cat File1.txt
Apple,May
Orange,June
Mango,July

$ cat File2.txt
Apple,Jan
Grapes,June

$ cat File3.txt
Apple,March
Mango,Feb
Banana,Dec

I need the following output file:

$ cat Output_file.txt
Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec

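The requirement is to take the first column as a key and look for the same key in every file. Where a key appears in more than one file, its second-column values must be joined with "|". Where a key appears in only one file, its line should be written to the output file unchanged.

I tried doing this in a while loop, but it gets slow as the files grow. I would like a simple solution using a shell script.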

3 answers:

Answer 0: (score: 1)

This should work:

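#!/bin/bash
# For each distinct key in column 1, rescan all files and join its column-2 values with "|".
# Keys come out in sorted order; every file is reread once per key.
cut -d "," -f 1 "$@" | sort -u | while IFS= read -r FRUIT
do
    awk -F "," -v key="$FRUIT" '
        $1 == key { out = out sep $2; sep = "|" }
        END       { print key "," out }' "$@"
done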

Run it with:
$ ./script.sh File1.txt File2.txt File3.txt

Answer 1: (score: 1)

A pure native bash solution (it calls no external tools, so its speed is limited only by bash itself) might look like this:

#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4 or newer required" >&2; exit 1;; esac

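declare -A items=( )   # key -> "|value|value..."
for file in "$@"; do
  # "|| [[ -n $key ]]" also picks up a final line that lacks a trailing newline
  while IFS=, read -r key value || [[ -n $key ]]; do
    items[$key]+="|$value"
  done <"$file"
done

# Note: associative arrays are iterated in an unspecified order
for key in "${!items[@]}"; do
  value=${items[$key]}
  printf '%s,%s\n' "$key" "${value#'|'}"   # strip the leading "|"
done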

...invoked as ./yourscript File1.txt File2.txt File3.txt
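
Because bash iterates associative arrays in an unspecified order, the loop above will not necessarily print keys in the order shown in the question. The following is a minimal sketch of one way to keep first-seen order, assuming bash 4 or newer; the function name merge_in_order and the order array are hypothetical names introduced here for illustration, not part of the answer above.

```shell
#!/usr/bin/env bash
# Sketch: same associative-array idea, plus an indexed array that records
# the order in which keys are first seen. Assumes bash 4 or newer.

merge_in_order() {            # hypothetical helper name
  local -A items=()           # key -> "value|value|..."
  local -a order=()           # keys, in first-seen order
  local file key value

  for file in "$@"; do
    while IFS=, read -r key value || [[ -n $key ]]; do
      [[ -n $key ]] || continue              # skip blank lines
      if [[ -n ${items[$key]+set} ]]; then   # key already seen
        items[$key]+="|$value"
      else
        items[$key]=$value
        order+=("$key")
      fi
    done <"$file"
  done

  for key in "${order[@]}"; do
    printf '%s,%s\n' "$key" "${items[$key]}"
  done
}

merge_in_order "$@"
```

Saved as a script, this would be invoked the same way, with the three file names as arguments.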

Answer 2: (score: 1)

This is easy to do with a single awk command:

awk 'BEGIN{FS=OFS=","} {a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
END {for (i in a) print i, a[i]}' File{1,2,3}.txt

Orange,June
Banana,Dec
Apple,May|Jan|March
Grapes,June
Mango,July|Feb

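The order of "for (i in a)" is unspecified in awk. If you want the output in the same order in which the keys first appear in the original files, use this awk instead: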

awk 'BEGIN{FS=OFS=","} !($1 in a) {b[++n] = $1}
{a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
END {for (i=1; i<=n; i++) print b[i], a[b[i]]}' File{1,2,3}.txt

Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec
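
Both awk commands above hold every key in memory. If the inputs are too large for that, one possible alternative is to sort by key first and let awk merge adjacent lines, so that only one key's values are held at a time. This is a sketch under stated assumptions: merge_sorted is a hypothetical name, the -s (stable) option is assumed to be available (it is in GNU and BSD sort but is not required by POSIX), and the output comes out in sorted key order, not first-seen order.

```shell
#!/bin/sh
# Sketch: stable-sort by key, then merge runs of equal keys in one pass.

merge_sorted() {   # hypothetical helper name
  # -s keeps lines with equal keys in their input (file) order;
  # LC_ALL=C makes the key ordering byte-wise and locale-independent.
  LC_ALL=C sort -s -t, -k1,1 "$@" |
  awk 'BEGIN { FS = OFS = "," }
       # key changed: flush the previous group and start a new one
       NR > 1 && $1 != prev { print prev, vals; vals = ""; n = 0 }
       # append the value, with "|" before every value except the first
       { vals = (n++ ? vals "|" : "") $2; prev = $1 }
       END { if (NR) print prev, vals }'
}

merge_sorted "$@"
```

With the three sample files this would print Apple, Banana, Grapes, Mango, Orange in that order, with the values for each key still in file order.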