Remove part of a string from column names

Time: 2018-02-08 09:32:04

Tags: r regex

Here is the data:

structure(list(
    Fasta.headers = c("Person01050.1", "Person01080.1", "Person01090.1",
                      "Person01100.4", "Person01140.1", "Person01220.1"),
    ToRemove.Gr_1 = c(0, 1107200, 17096000, 0, 0, 0),
    ToRemove.Gr_10 = c(0, 37259000, 1104800000, 783870, 0, 1308600),
    ToRemove.Gr_11 = c(1835800, 53909000, 623960000, 0, 0, 0),
    ToRemove.Gr_12 = c(0, 19117000, 808600000, 0, 0, 719400),
    ToRemove.Gr_13 = c(2544200, 2461400, 418770000, 0, 0, 0),
    ToRemove.Gr_14 = c(5120400, 1373700, 117330000, 0, 0, 0),
    ToRemove.Gr_15 = c(6623500, 0, 73336000, 0, 0, 0),
    ToRemove.Gr_16 = c(0, 0, 31761000, 0, 0, 0),
    ToRemove.Gr_17 = c(13475000, 0, 29387000, 0, 0, 0),
    ToRemove.Gr_18 = c(7883300, 0, 27476000, 0, 0, 0),
    ToRemove.Gr_19 = c(82339000, 3254700, 50825000, 0, 0, 0),
    ToRemove.Gr_2 = c(1584100, 84847000, 5219500000, 6860700, 0, 8337700),
    ToRemove.Gr_20 = c(205860000, 0, 67685000, 0, 0, 0),
    ToRemove.Gr_21 = c(867120000, 1984400, 2.26e+08, 0, 0, 10502000)),
  .Names = c("Fasta.headers", "ToRemove.Gr_1", "ToRemove.Gr_10",
             "ToRemove.Gr_11", "ToRemove.Gr_12", "ToRemove.Gr_13",
             "ToRemove.Gr_14", "ToRemove.Gr_15", "ToRemove.Gr_16",
             "ToRemove.Gr_17", "ToRemove.Gr_18", "ToRemove.Gr_19",
             "ToRemove.Gr_2", "ToRemove.Gr_20", "ToRemove.Gr_21"),
  row.names = c(NA, 6L), class = "data.frame")

As the column names suggest, the part "ToRemove" should be removed from the names, so that only Gr_* remains.

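I would appreciate two solutions to this problem. The first should remove part of the column name based on a specified string. The second should work from a specific character, e.g. ., and remove the whole part before or after the dot.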

1 Answer:

Answer 0: (score: 3)

We can use sub (the code below assumes the data frame above is stored as df1)

names(df1)[-1] <- sub(".*\\.", "", names(df1)[-1])

If we also need to keep the ., use . as the replacement

names(df1)[-1] <- sub(".*\\.", ".", names(df1)[-1])
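As a quick check of the two replacements above, here is a minimal sketch on a small hypothetical vector. The name `x` and its values are made up for illustration and are not the question's data.

```r
# Hypothetical column names, for illustration only
x <- c("ToRemove.Gr_1", "ToRemove.Gr_10")

# Drop everything up to and including the dot
no_dot <- sub(".*\\.", "", x)

# Drop everything before the dot, but put a dot back as the replacement
with_dot <- sub(".*\\.", ".", x)

print(no_dot)
print(with_dot)
```

Since sub only returns a new character vector, the result has to be assigned back with names(df1)[-1] <- ... to actually rename the columns.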

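To match the pattern precisely, we can also match zero or more characters that are not a dot ([^.]*) from the start (^) of the string, followed by a dot (\\. - the dot is escaped because it is a metacharacter that matches any character), and replace the match with an empty string ("")

sub("^[^.]*\\.", "", names(df1)[-1])
#[1] "Gr_1"  "Gr_10" "Gr_11" "Gr_12" "Gr_13" "Gr_14" "Gr_15" "Gr_16" 
#[9] "Gr_17" "Gr_18" "Gr_19" "Gr_2"  "Gr_20" "Gr_21"

Also, if we need to remove the specific string 'ToRemove' mentioned above, including the .,

sub("ToRemove.", "", names(df1)[-1], fixed = TRUE)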
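The patterns in this answer give the same result on the question's data, because each name contains exactly one dot. The differences only show up with more than one dot, which the following sketch illustrates. The vector `x` is hypothetical, and the last line (removing the part after the dot, which the question also asks about) is an addition under that same assumption rather than part of the original answer.

```r
# Hypothetical names; the second one contains two dots
x <- c("ToRemove.Gr_1", "ToRemove.Gr.2")

# Greedy .* runs up to the LAST dot
greedy <- sub(".*\\.", "", x)

# Anchored [^.]* stops at the FIRST dot
anchored <- sub("^[^.]*\\.", "", x)

# fixed = TRUE treats "ToRemove." as a literal string, not a regex
literal <- sub("ToRemove.", "", x, fixed = TRUE)

# Remove the first dot and everything after it
before_dot <- sub("\\..*", "", x)

print(greedy)
print(anchored)
print(literal)
print(before_dot)
```

Without fixed = TRUE, the . in "ToRemove." would act as a regex wildcard and match any single character, so the literal form is the safer choice when the prefix is known exactly.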