id position
a1 21
a1 39
a1 77
b1 88
b1 122
c1 22
File 2
id class position1 position2
a1 Xfact 1 40
a1 Xred 41 66
a1 xbreak 69 89
b1 Xbreak 77 133
b1 Xred 140 199
c1 Xfact 1 15
c1 Xbreak 19 35
I want output like this:
id position class
a1 21 Xfact
a1 39 Xfact
a1 77 Xbreak
b1 88 Xbreak
b1 122 Xbreak
c1 22 Xbreak
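I need a simple awk script that prints the id and position from file1, then takes each position from file1 and compares it with the positions in file 2. If the position from file 1 falls within the range between position1 and position2 in file 2 for the same id, print the corresponding class.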
Answer 0 (score: 0)
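One way using awk. It is not a simple script. A brief explanation of the process: the key point is the variable 'all_ranges'. While it is unset, the script reads from the ranges file and saves the data; once it is set, that reading stops and the script works on the line from the 'id-position' file, checks its position against the data in the array, and prints the line if it falls within a range. I tried to avoid processing the ranges file more than once and to process it in blocks instead, which makes the script more complex.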
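The range lookup described above relies on awk's composite array keys: writing `ranges[start, end]` joins the two bounds with `SUBSEP`, and `split( key, pos, SUBSEP )` recovers them. A minimal sketch of just that lookup, with two ranges hard-coded for illustration (the helper name `classify_one` is hypothetical and not part of the script):

```shell
# Hypothetical helper: print the class of the hard-coded range containing $1.
classify_one() {
    awk -v p="$1" 'BEGIN {
        ranges[1, 40]  = "Xfact"     # key is "1" SUBSEP "40"
        ranges[41, 66] = "Xred"      # key is "41" SUBSEP "66"
        for (key in ranges) {
            split(key, pos, SUBSEP)  # pos[1] = start, pos[2] = end
            # Adding 0 forces a numeric comparison.
            if (p + 0 >= pos[1] + 0 && p + 0 <= pos[2] + 0) {
                print ranges[key]
                break
            }
        }
    }'
}

classify_one 21   # prints Xfact
classify_one 50   # prints Xred
```

A position outside both ranges, such as 70, prints nothing.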
EDIT to add that I assume the id field is sorted in both files. Otherwise this script will fail and you will need another approach.
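Since the script depends on that ordering, it may be worth checking it before running. The following is a rough sketch, assuming each file has a one-line header and the id in the first field; the function name `ids_sorted` is made up for this example:

```shell
# Hypothetical helper: succeed only if the ids in file $1 never decrease.
ids_sorted() {
    awk 'FNR > 1 {
        # Appending "" forces a string comparison even for numeric-looking ids.
        if (seen && ($1 "") < (prev "")) exit 1
        prev = $1
        seen = 1
    }' "$1"
}
```

It could be used as `ids_sorted file1 && ids_sorted file2` before calling the script. String order here follows awk's own comparison rules, which may depend on the locale.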
Contents of script.awk:
BEGIN {
    ## Arguments:
    ##   ARGV[0] = awk
    ##   ARGV[1] = <first_input_argument>  (file with ids and positions)
    ##   ARGV[2] = <second_input_argument> (file with ranges)
    ##   ARGC = 3
    ## Remove the ranges file from the argument list so that awk does not
    ## read it as regular input; it is read with getline instead.
    f2 = ARGV[ --ARGC ];
    all_ranges = 0

    ## Read first line from file with ranges to get 'class' header.
    getline line <f2
    split( line, fields )
    class_header = fields[2];
}

## Special case for the header.
FNR == 1 {
    printf "%s\t%s\n", $0, class_header;
    next;
}

## Data.
FNR > 1 {
    while ( 1 ) {
        if ( ! all_ranges ) {
            ## Read line from file with range positions.
            ret = getline line <f2

            ## Check error (ERRNO is a gawk extension).
            if ( ret == -1 ) {
                printf "%s\n", "ERROR: " ERRNO > "/dev/stderr"
                close( f2 );
                exit 1;
            }

            ## Check end of file: no pending range is left, so from now on
            ## only look up positions in the block already saved.
            if ( ret == 0 ) {
                range_id = "";
                all_ranges = 1;
                continue;
            }

            ## Split line in spaces.
            num = split( line, fields )
            if ( num != 4 ) {
                printf "%s\n", "ERROR: Bad format of file " f2 > "/dev/stderr"
                exit 2;
            }

            ## Save ranges while they belong to the id of the current line.
            range_id = fields[1];
            if ( $1 == fields[1] ) {
                ranges[ fields[3], fields[4] ] = fields[2];
                continue;
            }
            else {
                ## First range of another id: keep it pending in 'fields'.
                all_ranges = 1
            }
        }

        ## The current id reached the pending range: start a new block.
        if ( range_id == $1 ) {
            delete ranges;
            ranges[ fields[3], fields[4] ] = fields[2];
            all_ranges = 0;
            continue;
        }

        ## Print the line with the class of a range that contains its position.
        for ( range in ranges ) {
            split( range, pos, SUBSEP )
            if ( $2 >= pos[1] && $2 <= pos[2] ) {
                printf "%s\t%s\n", $0, ranges[ range ];
                break;
            }
        }
        break;
    }
}
Run it like:
awk -f script.awk file1 file2 | column -t
With the following result:
id  position  class
a1  21        Xfact
a1  39        Xfact
a1  77        xbreak
b1  88        Xbreak
b1  122       Xbreak
c1  22        Xbreak
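For input whose id field is not sorted, which the script above does not handle, a different and simpler technique is to load the whole ranges file into memory first and then look up each position. This is only a sketch under the assumption that the ranges file fits in memory; the function name `classify_positions` and the array names are invented for illustration, and the argument order is the reverse of the script above (ranges file first).

```shell
# Hypothetical helper: $1 = ranges file (id class position1 position2),
#                      $2 = id-position file (id position).
classify_positions() {
    awk '
        # First file: remember every range, grouped by id.
        NR == FNR {
            if (FNR == 1) { class_header = $2; next }
            n = ++count[$1]
            class[$1, n] = $2
            lo[$1, n]    = $3
            hi[$1, n]    = $4
            next
        }
        # Second file: header line.
        FNR == 1 { print $0, class_header; next }
        # Second file: print the first stored range of the same id
        # that contains the position; unmatched lines are skipped.
        {
            for (i = 1; i <= count[$1]; i++) {
                if ($2 + 0 >= lo[$1, i] + 0 && $2 + 0 <= hi[$1, i] + 0) {
                    print $0, class[$1, i]
                    break
                }
            }
        }
    ' "$1" "$2"
}
```

It could be called as `classify_positions file2 file1 | column -t`. Unlike the block-wise script, this reads each file exactly once without `getline`, at the cost of holding every range in memory.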