I have two input files:
file1
1 982444
1 46658343
3 15498261
2 238295146
21 47423507
X 110961739
17 7490379
13 31850803
13 31850989
file2
1 982400 982480
1 46658345 46658350
2 14 109
2 5000 9000
2 238295000 238295560
X 110961739 120000000
17 7490200 8900005
Desired output:
1 982444
2 238295146
X 110961739
17 7490379
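What I want: look up the column 1 value of each file1 line in column 1 of file2. If it matches, take the number in column 2 of file1 and check whether it falls within the range given by columns 2 and 3 of that file2 line. If it does, print the file1 line in the output.
This may be a bit confusing to follow, but I'm doing my best. I have tried a few things, but I'm far from a solution, and any help would be much appreciated. Please use bash, awk or perl.
Thanks in advance,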
Answer 0 (score: 3)
Just use awk. This solution does not loop over file1 repeatedly.
#!/usr/bin/awk -f
NR == FNR {
    # I'm processing file2 since NR still matches FNR
    # I'd store the ranges from it on a[] and b[]
    # x[] acts as a counter to the number of range pairs stored that's specific to $1
    i = ++x[$1]
    a[$1, i] = $2
    b[$1, i] = $3
    # Skip to next record; Do not allow the next block to process a record from file2.
    next
}
{
    # I'm processing file1 since NR is already greater than FNR
    # Let's get the index for the last range first then go down until we reach 0.
    # Nothing would happen as well if i evaluates to nothing i.e. $1 doesn't have a range for it.
    for (i = x[$1]; i; --i) {
        if ($2 >= a[$1, i] && $2 <= b[$1, i]) {
            # I find that $2 is within range. Now print it.
            print
            # We're done so let's skip to the next record.
            next
        }
    }
}
Usage:
awk -f script.awk file2 file1
Output:
1 982444
2 238295146
X 110961739
17 7490379
A similar approach using Bash (version 4.0 or later):
#!/bin/bash
FILE1=$1 FILE2=$2
declare -A A B X
# Store each range from file2 under "key|index"; X counts the ranges per key.
while read -r F1 F2 F3; do
    (( I = ++X[$F1] ))
    A["$F1|$I"]=$F2
    B["$F1|$I"]=$F3
done < "$FILE2"
while read -r LINE; do
    read -r F1 F2 <<< "$LINE"
    for (( I = X[$F1]; I; --I )); do
        if (( F2 >= A["$F1|$I"] && F2 <= B["$F1|$I"] )); then
            echo "$LINE"
            # Stop at the first matching range so the line is printed only once.
            break
        fi
    done
done < "$FILE1"
Usage:
bash script.sh file1 file2
Answer 1 (score: 2)
Let's mix bash and awk (f1 and f2 below stand for file1 and file2):
while read -r col min max
do
    awk -v col="$col" -v min="$min" -v max="$max" '$1==col && min<=$2 && $2<=max' f1
done < f2
$ while read -r col min max; do awk -v col="$col" -v min="$min" -v max="$max" '$1==col && min<=$2 && $2<=max' f1; done < f2
1 982444
2 238295146
X 110961739
17 7490379
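The loop above starts one awk process per line of f2. As a rough sketch (not part of the original answer), the same matching rule can run in a single awk call by loading the first file into memory and streaming the second. The function name match_ranges and the arrays key and val are made up for illustration, and the sketch assumes file1 is not empty.

```bash
#!/bin/bash
# Hypothetical helper (not from the answer): same rule as the loop above,
# but file1 is read once and file2 is streamed in a single awk run.
# Usage: match_ranges file1 file2
match_ranges() {
    awk '
        NR == FNR {                 # first file (file1): remember every line
            n++
            key[n] = $1
            val[n] = $2
            next
        }
        {                           # second file (file2): one range per line
            for (i = 1; i <= n; i++)
                if (key[i] == $1 && $2 <= val[i] && val[i] <= $3)
                    print key[i], val[i]
        }
    ' "$1" "$2"
}
```

Like the loop, this prints matches in file2 order and prints a value once per range that contains it. It still compares every stored line with every range, so it only saves process start-up cost, not comparisons.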
Answer 2 (score: 0)
Pure bash, based on fedorqui's solution:
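#!/bin/bash
while read -r col_2 min max
do
    while read -r col_1 val
    do
        # Compare the keys as strings: inside (( )) a name such as X would be evaluated as a variable.
        [[ $col_1 == "$col_2" ]] && (( min <= val && val <= max )) && echo "$col_1 $val"
    done < file1
done < file2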
Answer 3 (score: 0)
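cut -d' ' -f1 input2 | sed 's/^/^/;s/$/\\s/' | \
    grep -f - <(cat input2 input1) | sort -n -k1 -k3 | \
    awk '# After sorting, the 2-field lines (input1) of a key precede its 3-field lines (input2).
         NF==2 { a[$1] = a[$1] "," $2 }   # collect the values seen for this key
         NF==3 {
             n = split(a[$1], b, ",")
             for (i = 2; i <= n; i++)     # b[1] is empty because of the leading comma
                 if ($2 <= b[i] && $3 >= b[i])
                     print $1, b[i]
         }'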
Produces:
X 110961739
1 982444
2 238295146
17 7490379
Answer 4 (score: 0)
Here is a Perl solution. It would probably be faster, though less concise, if I built a hash from file2, but this should be fine.
use strict;
use warnings;
use autodie;
my @bounds = do {
    open my $fh, '<', 'file2';
    map [ split ], <$fh>;
};
open my $fh, '<', 'file1';
while (my $line = <$fh>) {
    my ($key, $val) = split ' ', $line;
    for my $bound (@bounds) {
        next unless $key eq $bound->[0] and $val >= $bound->[1] and $val <= $bound->[2];
        print $line;
        last;
    }
}
Output:
1 982444
2 238295146
X 110961739
17 7490379
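To compare any of the answers above with the desired output from the question, a small harness along these lines may help. It is only a sketch: the function name check_solution is made up, and it assumes mktemp, sort and diff are available. It recreates the question's file1 and file2 in a temporary directory, runs the given command there, and compares the sorted output with the sorted desired output, since one of the answers prints the same lines in a different order.

```bash
#!/bin/bash
# Hypothetical harness (names are illustrative, not from any answer).
# Usage: check_solution COMMAND [ARGS...]
# Runs COMMAND inside a temporary directory that holds the question's file1
# and file2, and succeeds if its output, ignoring line order, equals the
# desired output from the question.
check_solution() {
    local dir status
    dir=$(mktemp -d) || return 1
    printf '%s\n' '1 982444' '1 46658343' '3 15498261' '2 238295146' \
        '21 47423507' 'X 110961739' '17 7490379' '13 31850803' \
        '13 31850989' > "$dir/file1"
    printf '%s\n' '1 982400 982480' '1 46658345 46658350' '2 14 109' \
        '2 5000 9000' '2 238295000 238295560' 'X 110961739 120000000' \
        '17 7490200 8900005' > "$dir/file2"
    printf '%s\n' '1 982444' '2 238295146' 'X 110961739' '17 7490379' |
        sort > "$dir/expected"
    # The exit status of the pipeline is that of diff (0 means no difference).
    ( cd "$dir" && "$@" ) | sort | diff - "$dir/expected"
    status=$?
    rm -rf "$dir"
    return "$status"
}
```

Because the command runs inside the temporary directory, a script has to be given by absolute path, for example check_solution awk -f "$PWD/script.awk" file2 file1. Any differences are printed by diff and the function returns a non-zero status.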