Matching the end of each item in an R vector against a specific string

Time: 2017-07-21 20:42:36

Tags: r

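I have an original vector, samples, and want to subset it so that I end up with a smaller vector containing only the items that end in "01A". I have tried grepl, grep, and subset, but they give me incorrect values.

The code looks like this: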

samples <- lihc_data[0,-1]
as.vector(samples)
tum <- subset(lihc_data[0,-1], grepl("01A$", lihc_data[0,-1]) == TRUE)

Here is what part of the sample vector looks like:

 [1] TCGA.BC.A10Q.11A TCGA.BC.A10Q.01A TCGA.DD.A1EB.11A TCGA.DD.A1EB.01A
 [5] TCGA.DD.A1EG.11A TCGA.DD.A1EG.01A TCGA.DD.A1EH.11A TCGA.DD.A1EH.01A
 [9] TCGA.DD.A1EI.11A TCGA.DD.A1EI.01A TCGA.DD.A3A6.11A TCGA.DD.A3A6.01A

3 answers:

Answer 0 (score: 1)

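The 0 in lihc_data[0,-1] is odd. Unlike many other programming languages, R indexes from 1; an index of 0 selects nothing, so your vector is probably empty. If you want the first row of the lihc_data data.frame or matrix without its first element, try as.character(lihc_data[1,-1]).

Based on your code, this appears to work: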
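To illustrate the point about index 0, here is a minimal sketch. The data frame `toy` is a hypothetical stand-in for lihc_data (its real structure is not shown in the question), with an ID column followed by columns holding sample names.

```r
# Hypothetical stand-in for lihc_data; the real object may look different.
toy <- data.frame(id = c("row1", "row2"),
                  s1 = c("TCGA.BC.A10Q.11A", "x"),
                  s2 = c("TCGA.BC.A10Q.01A", "y"),
                  stringsAsFactors = FALSE)

# Row index 0 selects no rows at all, so nothing is left to match against.
empty <- toy[0, -1]
n_empty <- nrow(empty)

# The same holds for plain vectors: x[0] is a zero-length vector.
x <- c("a", "b", "c")
len_zero <- length(x[0])

# Row index 1 gives the first row; -1 drops the first column.
first_row <- as.character(toy[1, -1])
matches <- first_row[grepl("01A$", first_row)]
print(matches)
```

If the sample names are actually stored somewhere else (for example in the column names), the subsetting step would need to change accordingly; the grepl part stays the same.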

samples <- c("TCGA.BC.A10Q.11A", "TCGA.BC.A10Q.01A", "TCGA.DD.A1EB.11A", 
             "TCGA.DD.A1EB.01A", "TCGA.DD.A1EG.11A", "TCGA.DD.A1EG.01A", 
             "TCGA.DD.A1EH.11A", "TCGA.DD.A1EH.01A", "TCGA.DD.A1EI.11A", 
             "TCGA.DD.A1EI.01A", "TCGA.DD.A3A6.11A", "TCGA.DD.A3A6.01A")

subset(samples, grepl("01A$", samples) == TRUE)

These are probably shorter and more idiomatic (same result):

grep("01A$", samples, value = TRUE)
samples[grepl("01A$", samples)]
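For anyone unsure how these forms relate, the following sketch compares them on a shortened copy of the sample vector. grep returns the positions of the matches (or the matching values with value = TRUE), while grepl returns one logical per element; the variable names here are only for illustration.

```r
samples <- c("TCGA.BC.A10Q.11A", "TCGA.BC.A10Q.01A",
             "TCGA.DD.A1EB.11A", "TCGA.DD.A1EB.01A")

idx  <- grep("01A$", samples)    # integer positions of the matching items
keep <- grepl("01A$", samples)   # one TRUE/FALSE per item

by_value   <- grep("01A$", samples, value = TRUE)  # matching items directly
by_index   <- samples[idx]                         # subset by position
by_logical <- samples[keep]                        # subset by logical mask
by_subset  <- subset(samples, keep)                # subset() with the same mask

# All four approaches should select the same items.
stopifnot(identical(by_value, by_index),
          identical(by_value, by_logical),
          identical(by_value, by_subset))
print(by_value)
```

Note that `== TRUE` is redundant with grepl, since it already returns a logical vector.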

Answer 1 (score: 0)

Result:

samples <- c("TCGA.BC.A10Q.11A", "TCGA.BC.A10Q.01A", "TCGA.DD.A1EB.11A",
             "TCGA.DD.A1EB.01A", "TCGA.DD.A1EG.11A", "TCGA.DD.A1EG.01A", 
             "TCGA.DD.A1EH.11A", "TCGA.DD.A1EH.01A", "TCGA.DD.A1EI.11A",
             "TCGA.DD.A1EI.01A", "TCGA.DD.A3A6.11A", "TCGA.DD.A3A6.01A")

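library(stringr)  # str_extract_all(); stringr also re-exports the %>% pipe

# "$" anchors the match at the end of each item
str_extract_all(samples, "[:print:]{12}\\.01A$") %>% unlist()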

Answer 2 (score: 0)

A base R solution:

Data:

samples <- c("TCGA.DD.A1EG.11A", "TCGA.DD.A1EI.11A", "TCGA.DD.A1EG.01A",
             "TCGA.DD.A1EI.01A", "TCGA.DD.A1EH.11A", "TCGA.DD.A3A6.11A",
             "TCGA.DD.A1EH.01A", "TCGA.DD.A3A6.01A")

Then use plain vector subsetting (no need for the subset function):

samples[grepl("01A$", samples)]
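As a possible alternative when the suffix is a fixed string rather than a pattern, base R (3.3.0 and later) also provides endsWith(), which avoids regular expressions altogether. This is a sketch on a shortened vector; the variable names are made up for the example.

```r
samples <- c("TCGA.DD.A1EG.11A", "TCGA.DD.A1EG.01A", "TCGA.DD.A3A6.01A")

# endsWith() compares a literal suffix, so no "$" anchor or escaping is needed.
with_suffix <- samples[endsWith(samples, "01A")]

# The regex version gives the same items for this data.
with_regex <- samples[grepl("01A$", samples)]
stopifnot(identical(with_suffix, with_regex))

# When nothing matches, the result is a zero-length character vector, not an error.
none <- samples[grepl("99Z$", samples)]

print(with_suffix)
print(length(none))
```

The regex form remains the more flexible choice, for example if several suffixes need to be matched at once.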