I have the following string:
x <- "\n\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n\t\t\t\t\t\n\t\t\tGEO Publications\n\t\t\t\t\tHandout\n\t\t\t\t\t\tNAR 2013 (latest)\n\t\t\t\t\t\tNAR 2002 (original)\n\t\t\t\t\t\tAll publications\n\t\t\t\t\t\n\t\t\t\tFAQ\n\t\t\t\tMIAME\n\t\t\t\tEmail GEO\n\t\t\t\n \n \n \n \n \n \n NCBI > GEO > Accession Display\nNot logged in | Login\n\n \n \n \n \n \n \n \n \n\n \n \n\nGEO help: Mouse over screen elements for information.\n\nScope: SelfPlatformSamplesSeriesFamily\n Format: HTMLSOFTMINiML\n Amount: BriefQuick\n GEO accession: \n\n\n\n Sample GSM935277\n\nQuery DataSets for GSM935277\nStatus\nPublic on May 22, 2012\nTitle\nStanford_ChipSeq_GM12878_TBP_IgG-mus\nSample type\nSRA\n \n\nSource name\nGM12878\nOrganism\nHomo sapiens\nCharacteristics\nlab: Stanfordlab description: Snyder - Stanford Universitydatatype: ChipSeqdatatype description: Chromatin IP Sequencingcell: GM12878cell organism: humancell description: B-lymphocyte, lymphoblastoid, International HapMap Project - CEPH/Utah - European Caucasion, Epstein-Barr Viruscell karyotype: normalcell lineage: mesodermcell sex: Ftreatment: Nonetreatment description: No special treatment or protocol appliesantibody: TBPantibody antibodydescription: Mouse monoclonal. Immunogen is synthetic peptide conjugated to KLH derived from within residues 1 - 100 of HumanTATA binding protein TBP. Antibody Target: TBPantibody targetdescription: General transcription factor that functions at the core of the DNA-binding multiprotein factor TFIID. Binding of TFIID to the TATA box is the initial transcriptional step of the pre-initiation complex (PIC), playing a role in the activation of eukaryotic genes transcribed by RNA polymerase II."
What I want to do is detect a pattern of this form:
Antibody Target: TBPantibody
and return the substring TBPantibody.
I tried this regular expression, but it does not work:
sub("Antibody Target: ([A-Zaz]+)\\W+", "\\1", x)
What is the correct way to do this?
Answer 0 (score: 2)
You can do
sub(".*Antibody Target: ([A-Za-z]+).*", "\\1", x)
#[1] "TBPantibody"
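As a minimal sketch of why wrapping the pattern in .* matters: sub only replaces the part of the string that the pattern matches, so an unanchored pattern leaves the surrounding text in place. The short string below is a hypothetical stand-in for the question's x, and the names partial and full are chosen here for illustration only.

```r
# Hypothetical shortened stand-in for the question's string.
x <- "treatment: None\nantibody: TBP Antibody Target: TBPantibody targetdescription: General transcription factor"

# Unanchored pattern: only the matched part is replaced by group 1,
# so the text before and after the match stays in the result.
partial <- sub("Antibody Target: ([A-Za-z]+)", "\\1", x)

# Pattern wrapped in .* on both sides: the whole string is matched
# and replaced by group 1 alone.
full <- sub(".*Antibody Target: ([A-Za-z]+).*", "\\1", x)

print(partial)
print(full)
```

This relies on R's default regex engine, where . also matches newlines. With perl = TRUE that is not the case by default, so the pattern would probably need an inline (?s) flag there.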
Answer 1 (score: 2)
Could you please try the following.
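sub("(.*Antibody Target: )([^ ]*).*", "\\2", variable)
Explanation: following the OP's sample, the string is assumed to be stored in a variable named variable (x in the question). The substitution uses base R's sub function.
Syntax of sub:
sub(regex_to_match, replacement, variable_to_process)
where replacement is either new text or a backreference, such as "\\2", to a captured group of the matched regex.
"(.*Antibody Target: )([^ ]*).*": the first group (...) matches everything from the start of the value through the string Antibody Target: and captures it as group 1. The second group (...) captures everything that follows, up to the first space. The trailing .* consumes the rest of the string so that the whole value is matched; without it, the text after the captured word would remain in the result. \\2 then replaces the whole match with the second captured group, which is the text immediately after Antibody Target: .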
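To see what each parenthesised group captures, one option is to inspect the groups directly with base R's regexec and regmatches instead of substituting. This is only a sketch on a hypothetical shortened sample; the names m, group1 and group2 are illustrative and not part of the answer.

```r
# Hypothetical shortened stand-in for the OP's value.
variable <- "antibody: TBP Antibody Target: TBPantibody targetdescription: General transcription factor"

# regexec returns the positions of the full match and of each capture group;
# regmatches turns them into the matched substrings.
m <- regmatches(variable, regexec("(.*Antibody Target: )([^ ]*)", variable))[[1]]

# m[1] is the full match, m[2] is group 1, m[3] is group 2.
group1 <- m[2]
group2 <- m[3]

print(group1)
print(group2)
```

Extracting the group this way does not need a trailing .* in the pattern, because nothing is substituted.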