Let me describe a problem that has ruined my weekend. I have biological data in 4 columns:
@ID:::12345/1 ACGACTACGA text !"#$%vwxyz
@ID:::12345/2 TATGACGACTA text :;<=>?VWXYZ
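I want to use awk to edit the first column, replacing the characters : and / with -
I also want to convert the string in the last column into a comma-separated list of small numbers, one for each individual ASCII character (any character in the range ASCII 33-126).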
@ID---12345-1 ACGACTACGA text 33,34,35,36,37,118,119,120,121,122
@ID---12345-2 TATGACGACTA text 58,59,60,61,62,63,86,87,88,89,90
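The first part is easy, but I am stuck on the second. I have tried the awk ordinal functions and sprintf; I could only get the former to work on the first character of the string, and I could only get the latter to convert hexadecimal to decimal, and not with spaces. I also tried the bash command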
$ od -t d1 test3 | awk 'BEGIN{OFS=","}{i = $1; $1 = ""; print $0}'
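but I don't know how to call it from within awk. I would prefer to use awk because I have some downstream manipulations that can also be done in awk.
Many thanks in advance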
Answer 0: (score: 1)
Using the ordinal functions from the awk manual, you can do it like this (the --source option requires gawk):
awk -f ord.awk --source '{
    # replace : and / with - in the first field
    gsub(/[:\/]/,"-",$1)
    # calculate the ordinals by looping over the characters in the fourth field
    res=ord($4)
    for(i=2;i<=length($4);i++) {
        # ord() only looks at the first character of its argument
        res=res","ord(substr($4,i))
    }
    $4=res
}1' file
Output:
@ID---12345-1 ACGACTACGA text 33,34,35,36,37,118,119,120,121,122
@ID---12345-2 TATGACGACTA text 58,59,60,61,62,63,86,87,88,89,90
Here is ord.awk
(taken from: http://www.gnu.org/software/gawk/manual/html_node/Ordinal-Functions.html)
# ord.awk --- do ord and chr

# Global identifiers:
#    _ord_:        numerical values indexed by characters
#    _ord_init:    function to initialize _ord_

BEGIN { _ord_init() }

function _ord_init(    low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") {     # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else {               # ebcdic(!)
        low = 0
        high = 255
    }

    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}

function ord(str,    c)
{
    # only first character is of interest
    c = substr(str, 1, 1)
    return _ord_[c]
}

function chr(c)
{
    # force c to be numeric by adding 0
    return sprintf("%c", c + 0)
}
If you don't want to include the whole of ord.awk, you can do it like this:
awk 'BEGIN { _ord_init() }
function _ord_init(    low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") {     # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else {               # ebcdic(!)
        low = 0
        high = 255
    }
    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}
{
    # replace : and / with - in the first field
    gsub(/[:\/]/,"-",$1)
    # calculate the ordinals by looping over the characters in the fourth field
    res=_ord_[substr($4,1,1)]
    for(i=2;i<=length($4);i++) {
        res=res","_ord_[substr($4,i,1)]
    }
    $4=res
}1' file
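Since the question limits the last column to ASCII 33-126, the lookup table can also be built for just that range, without the parity and EBCDIC checks. The following is only a minimal sketch under that assumption: the function name to_ascii_codes and the array name code are made up for illustration, and it reads the data on standard input.

```shell
# Hypothetical helper (name chosen for illustration); reads the data on stdin.
# Assumes every character in the last column is in the range ASCII 33-126.
to_ascii_codes() {
  awk '
    BEGIN {
      # Map each printable character (ASCII 33-126) to its decimal code.
      for (i = 33; i <= 126; i++)
        code[sprintf("%c", i)] = i
    }
    {
      # First column: replace every ":" and "/" with "-".
      gsub(/[:\/]/, "-", $1)
      # Last column: join the code of each character with commas.
      n = length($NF)
      out = ""
      for (i = 1; i <= n; i++)
        out = out (i > 1 ? "," : "") code[substr($NF, i, 1)]
      $NF = out
      print
    }
  '
}
```

It would be called as to_ascii_codes < file. Characters outside the assumed range have no entry in the table and would come out as empty strings, so the full ord.awk table is the safer choice if the input is not guaranteed to be clean.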
Answer 1: (score: 0)
Perl solution:
perl -lnae '$F[0] =~ s%[:/]%-%g; $F[-1] =~ s/(.)/ord($1) . ","/ge; chop $F[-1]; print "@F";' < input
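The first substitution replaces : and / in the first field with a dash; the second substitution replaces each character in the last field with its ord followed by a comma; chop then removes the trailing comma.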