I have 3 files with the following data:
$ cat File1.txt
Apple,May
Orange,June
Mango,July
$ cat File2.txt
Apple,Jan
Grapes,June
$ cat File3.txt
Apple,March
Mango,Feb
Banana,Dec
I need the following output file:
$ cat Output_file.txt
Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec
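The requirement is to take the first column as a key, find the lines in every file that share that key, and join their second-column values with "|". If a key appears in only one file, its line should be written to the output file unchanged.
I tried doing this with a while loop, but it becomes slow as the files grow. I am looking for a simple solution using a shell script.
Answer 0: (score: 1)
This should work: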
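#!/bin/bash
# Collect the distinct keys (first column), then print each key with its values
for FRUIT in $( cat "$@" | cut -d "," -f 1 | sort | uniq )
do
    echo -ne "${FRUIT},"
    # Print every value for this key followed by "|", then turn the last "|" into a newline
    awk -F "," "\$1 == \"$FRUIT\" {printf(\"%s|\",\$2)}" "$@" | sed 's/.$/\'$'\n/'
done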
Run it with:
$ ./script.sh File1.txt File2.txt File3.txt
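Note that the loop above starts one awk process per distinct key, and each of those processes rereads every input file, so the work grows with the number of keys times the total input size. A minimal sketch of an alternative, swapping in a different technique (one stable sort followed by a single awk pass over adjacent lines), could look like the following. The function name merge_sorted is made up for illustration, and sort -s is a widely supported extension (GNU and BSD sort) rather than a POSIX option. Like the script above, it prints keys in sorted order.

```shell
#!/bin/sh
# Sketch: one stable sort plus one awk pass, instead of one awk run per key.
# merge_sorted is an illustrative name, not part of the original answer.
merge_sorted() {
  # -s keeps lines with equal keys in their original (file, line) order;
  # LC_ALL=C makes sort compare keys byte by byte.
  cat "$@" | LC_ALL=C sort -s -t, -k1,1 | awk -F, '
    # Appending "" forces a string comparison even for numeric-looking keys
    NR == 1 || ($1 "") != (prev "") {   # first line of a new key
      if (NR > 1) printf "\n"
      printf "%s,%s", $1, $2
      prev = $1
      next
    }
    { printf "|%s", $2 }                # same key as the previous line
    END { if (NR > 0) printf "\n" }'
}

# Only run when file arguments are given, so cat never waits on stdin
if [ "$#" -gt 0 ]; then
  merge_sorted "$@"
fi
```

Saved as a script, it would be run the same way: ./script.sh File1.txt File2.txt File3.txt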
Answer 1: (score: 1)
A pure native bash solution (it calls no external tools, so it is limited only by the performance of bash itself) might look like this:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4 or newer required" >&2; exit 1;; esac

# Associative array: key -> "|value1|value2..."
declare -A items=( )
for file in "$@"; do
  while IFS=, read -r key value; do
    items[$key]+="|$value"
  done <"$file"
done

for key in "${!items[@]}"; do
  value=${items[$key]}
  # Strip the leading "|" before printing
  printf '%s,%s\n' "$key" "${value#'|'}"
done
...invoked as ./yourscript File1.txt File2.txt File3.txt
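Iterating over "${!items[@]}" returns the keys of an associative array in no particular order, so the output lines may not match the order of the input. A minimal sketch of one way to keep first-seen order, assuming that is what is wanted, tracks the keys in a second, indexed array. The names merge_in_order and order are invented for this illustration, and bash 4 or newer is still required.

```shell
#!/usr/bin/env bash
# Sketch: same merge as above, but keys come out in first-seen order.
# merge_in_order and order are illustrative names, not from the original answer.
merge_in_order() {
  local -A items=()   # key -> "|value1|value2..."
  local -a order=()   # keys in the order they first appeared
  local file key value
  for file in "$@"; do
    # "|| [[ -n $key ]]" also processes a last line that lacks a trailing newline
    while IFS=, read -r key value || [[ -n $key ]]; do
      [[ -n $key ]] || continue          # bash rejects an empty array subscript
      # ${items[$key]+x} expands to "x" only if the key is already set
      [[ -n ${items[$key]+x} ]] || order+=("$key")
      items[$key]+="|$value"
    done <"$file"
  done
  for key in "${order[@]}"; do
    value=${items[$key]}
    printf '%s,%s\n' "$key" "${value#'|'}"
  done
}

merge_in_order "$@"
```

It is called the same way as the script above, with the input files as arguments.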
Answer 2: (score: 1)
This is easy to do with a single awk command:
awk 'BEGIN{FS=OFS=","} {a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
END {for (i in a) print i, a[i]}' File{1,2,3}.txt
Orange,June
Banana,Dec
Apple,May|Jan|March
Grapes,June
Mango,July|Feb
If you want the output in the same order in which the keys first appear in the input files, use this awk command:
awk 'BEGIN{FS=OFS=","} !($1 in a) {b[++n] = $1}
{a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
END {for (i=1; i<=n; i++) print b[i], a[b[i]]}' File{1,2,3}.txt
Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec
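Since the question asks for the result in a file, here is a small end-to-end sketch that recreates the sample files and redirects the order-preserving awk command above into Output_file.txt. The scratch directory created with mktemp is an assumption made only for this demo; in practice the files would already exist.

```shell
#!/bin/sh
# Sketch: recreate the question's sample files in a scratch directory and
# write the merged result to Output_file.txt. The scratch directory is an
# assumption made for this demo only.
dir=$(mktemp -d)
printf '%s\n' 'Apple,May' 'Orange,June' 'Mango,July' >"$dir/File1.txt"
printf '%s\n' 'Apple,Jan' 'Grapes,June'              >"$dir/File2.txt"
printf '%s\n' 'Apple,March' 'Mango,Feb' 'Banana,Dec' >"$dir/File3.txt"

# Same order-preserving awk program as above, with stdout redirected to a file
awk 'BEGIN{FS=OFS=","} !($1 in a) {b[++n] = $1}
  {a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
  END {for (i=1; i<=n; i++) print b[i], a[b[i]]}' \
  "$dir/File1.txt" "$dir/File2.txt" "$dir/File3.txt" >"$dir/Output_file.txt"

cat "$dir/Output_file.txt"
```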