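I have a tab-delimited file with gene names in one column and the expression values for those genes in another. I want to use grep to remove certain genes from this file. So, this: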
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
"42266" "snoMBII-202" "0"
"42267" "snoMBII-202" "0"
"42268" "snoMe28S-Am2634" "0"
"42269" "snoMe28S-Am2634" "0"
"42270" "snoR26" "0"
"42271" "SNORA1" "0"
"42272" "SNORA1" "0"
becomes this:
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
I used the following command, which I pieced together with my limited knowledge of the terminal:
grep -iv sno* <input.text> | grep -iv rp* | grep -iv U6* | grep -iv 7SK* > <output.txt>
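With this command, my output file is missing the genes that start with sno, U6, and 7SK, but for some reason grep removed every gene containing an "r" rather than only the genes starting with "rp". I'm very confused by this. Any idea why sno* works but rp* doesn't?
Thanks!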
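Answer 0: (score: 0)
While this doesn't directly answer your question, there is one thing in your example command line you should be careful about: whenever you use a special shell metacharacter (such as "*"), you need to escape or quote it. So your command line should look more like: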
grep -iv 'sno*' <input.text> | grep -iv 'rp*' | grep -iv 'U6*' | grep -iv '7SK*' > <output.txt>
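Usually the shell is forgiving: if no file matches the glob, it passes the text through unchanged (so if you type "foo*" and no filename starts with "foo", the string "foo*" is passed to the command). If a matching file does exist, however, the command receives the filename(s) instead of the pattern.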
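The difference between a quoted and an unquoted pattern can be seen with a small sketch. It assumes a POSIX-style shell such as bash or sh with default globbing settings; the scratch directory and the file name `rpfile` are made up for the illustration.

```shell
# Work in a scratch directory so the demo cannot touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical file whose name happens to match the glob rp*.
touch rpfile

# Unquoted: the shell expands rp* to the matching filename
# before the command ever sees the pattern.
unquoted=$(echo rp*)

# Quoted: the pattern reaches the command literally.
quoted=$(echo 'rp*')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

With grep in place of echo, the unquoted form would search for the text "rpfile" rather than for the pattern, which is why quoting is the safer habit.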
Answer 1: (score: 0)
grep -iEv "sno|rp|U6|7SK" yourInput
Test:
kent$ cat b
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
"42266" "snoMBII-202" "0"
"42267" "snoMBII-202" "0"
"42268" "snoMe28S-Am2634" "0"
"42269" "snoMe28S-Am2634" "0"
"42270" "snoR26" "0"
"42271" "SNORA1" "0"
"42272" "SNORA1" "0"
kent$ grep -iEv "sno|rp|U6|7SK" b
"42261" "SNHG7" "20.2678"
"42262" "SNHG8" "25.3981"
"42263" "SNHG9" "0.488534"
"42264" "SNIP1" "7.35454"
"42265" "SNN" "2.05365"
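Answer 2: (score: 0)
The grep command uses regular expressions, not glob patterns. The pattern rp* means "an 'r' followed by zero or more 'p'". What you really want is rp.* or, even better, "rp.* (or even just "rp, since there is no point in trying to match whatever comes after "rp" anyway). Likewise, sno* means "'sn' followed by zero or more 'o'". Again, you want sno.* or "sno.* (or even just "sno).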
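The effect described above can be checked with a short sketch. The rows for BRCA1 and RPL3 are invented for the illustration, and the fields are assumed to be wrapped in double quotes as in the question's sample.

```shell
# Hypothetical rows in the same layout as the sample data.
sample=$(printf '%s\n' \
  '"1" "SNHG7" "20.2678"' \
  '"2" "BRCA1" "3.1"' \
  '"3" "RPL3" "12.5"')

# rp* is "r" followed by zero or more "p", so with -i every line
# containing an r or R is dropped, BRCA1 included.
loose=$(printf '%s\n' "$sample" | grep -iv 'rp*')

# "rp is a literal double quote followed by rp, so only names that
# start with rp are dropped.
strict=$(printf '%s\n' "$sample" | grep -iv '"rp')

echo "loose:";  echo "$loose"
echo "strict:"; echo "$strict"
```

Here the loose pattern leaves only the SNHG7 row, while the strict one keeps both SNHG7 and BRCA1. By the same reasoning, 'sno*' with -i matches any line containing "sn", so it would also drop SNHG7 from the question's sample.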