I want to convert a linear table into matrix format.
My input table, named "linear_table.tab", looks like this:
transcript ortho
Transcript_1 ORTHO_1
Transcript_2 ORTHO_2
Transcript_3 ORTHO_3
Transcript_4 ORTHO_4
Transcript_5 ORTHO_5
Transcript_6 ORTHO_6
Transcript_7 ORTHO_5
Transcript_8 ORTHO_1
Transcript_9 ORTHO_4
Transcript_10 ORTHO_5
Transcript_11 ORTHO_2
Transcript_12 ORTHO_7
Transcript_13 ORTHO_8
Transcript_14 ORTHO_5
Transcript_15 ORTHO_2
Transcript_16 ORTHO_9
I would like the matrix table to look like this:
Transcript_1 Transcript_2 Transcript_3 Transcript_4 Transcript_5 Transcript_6 Transcript_7 Transcript_8 Transcript_9 Transcript_10 Transcript_11 Transcript_12 Transcript_13 Transcript_14 Transcript_15 Transcript_16
Transcript_1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
Transcript_2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0
Transcript_3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Transcript_4 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Transcript_5 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0
Transcript_6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Transcript_7 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
Transcript_8 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Transcript_9 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
Transcript_10 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
Transcript_11 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Transcript_12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Transcript_13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Transcript_14 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
Transcript_15 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Transcript_16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Here is my R code:
linear.table <- read.table("linear_table.tab", header=T, sep="\t")
library(reshape2)
dcast(linear.table, transcript~ortho, fill=0)
I get the following output in R:
transcript ORTHO_1 ORTHO_2 ORTHO_3 ORTHO_4 ORTHO_5 ORTHO_6 ORTHO_7 ORTHO_8 ORTHO_9
Transcript_1 ORTHO_1 0 0 0 0 0 0 0 0
Transcript_10 0 0 0 0 ORTHO_5 0 0 0 0
Transcript_11 0 ORTHO_2 0 0 0 0 0 0 0
Transcript_12 0 0 0 0 0 0 ORTHO_7 0 0
Transcript_13 0 0 0 0 0 0 0 ORTHO_8 0
Transcript_14 0 0 0 0 ORTHO_5 0 0 0 0
Transcript_15 0 ORTHO_2 0 0 0 0 0 0 0
Transcript_16 0 0 0 0 0 0 0 0 ORTHO_9
Transcript_2 0 ORTHO_2 0 0 0 0 0 0 0
Transcript_3 0 0 ORTHO_3 0 0 0 0 0 0
Transcript_4 0 0 0 ORTHO_4 0 0 0 0 0
Transcript_5 0 0 0 0 ORTHO_5 0 0 0 0
Transcript_6 0 0 0 0 0 ORTHO_6 0 0 0
Transcript_7 0 0 0 0 ORTHO_5 0 0 0 0
Transcript_8 ORTHO_1 0 0 0 0 0 0 0 0
Transcript_9 0 0 0 ORTHO_4 0 0 0 0 0
I don't know how to do this in R.

Answer 0 (score: 0)

Using awk:
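$ cat ortho.awk
# Build a transcript-by-transcript 0/1 matrix from a two-column table.
# As in the desired output in the question, the first transcript seen in
# each ortho group is linked to every later transcript of that group.
NR > 1 {
    transcript = $1
    ortho = $2
    name[++n] = transcript              # remember input order for printing
    if (ortho in rep) {
        # Sparse fill: store only the 1s, in both directions.
        mx[rep[ortho], transcript] = 1
        mx[transcript, rep[ortho]] = 1
    } else {
        rep[ortho] = transcript         # first transcript seen in this group
    }
}
END {
    # Header row: empty corner cell, then one column per transcript.
    for (j = 1; j <= n; j++)
        printf "\t%s", name[j]
    print ""
    # One row per transcript; cells never set to 1 are printed as 0.
    for (i = 1; i <= n; i++) {
        printf "%s", name[i]
        for (j = 1; j <= n; j++) {
            cell = ((name[i], name[j]) in mx) ? 1 : 0
            printf "\t%d", cell
        }
        print ""
    }
}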
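The idea is to populate a sparse matrix with only the 1s, fill the missing cells with 0 at the end, and then print the result. Run it with: awk -f ortho.awk linear_table.tab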
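Note that the desired output in the question marks only the first transcript of each ortho group against the later ones: Transcript_7 and Transcript_10 are both ORTHO_5, yet they have a 0 against each other. If every pair of transcripts sharing a group should get a 1 instead, the same "store the 1s, print 0 for everything else" idea could be sketched roughly as below. The function name `ortho_pairs` and the four sample rows are assumptions made up for illustration, and the sketch assumes a header line followed by whitespace-separated transcript/ortho pairs.

```shell
#!/bin/sh
# ortho_pairs (hypothetical helper name): reads a two-column table with a
# header line on stdin and prints a tab-separated 0/1 matrix in which any
# two different transcripts that share an ortho group are marked with 1.
ortho_pairs() {
    awk '
    NR > 1 {
        name[++n] = $1                       # keep input order for printing
        # Store a 1 against every earlier transcript of the same group.
        for (k = 1; k <= cnt[$2]; k++) {
            mx[$1, member[$2, k]] = 1
            mx[member[$2, k], $1] = 1
        }
        member[$2, ++cnt[$2]] = $1           # register this transcript in its group
    }
    END {
        for (j = 1; j <= n; j++)             # header row, empty corner cell first
            printf "\t%s", name[j]
        print ""
        for (i = 1; i <= n; i++) {
            printf "%s", name[i]
            for (j = 1; j <= n; j++) {
                cell = ((name[i], name[j]) in mx) ? 1 : 0   # missing cell becomes 0
                printf "\t%d", cell
            }
            print ""
        }
    }'
}

# Illustrative sample only: three ORTHO_5 members and one unrelated transcript.
printf 'transcript\tortho\nTranscript_5\tORTHO_5\nTranscript_6\tORTHO_6\nTranscript_7\tORTHO_5\nTranscript_10\tORTHO_5\n' |
    ortho_pairs
```

With this sample, Transcript_5, Transcript_7 and Transcript_10 each get a 1 against the other two, and Transcript_6 stays all zeros. For the real input, the call would be `ortho_pairs < linear_table.tab`.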