Question

My data looks like this:

I want to do two things:

1. Automatically rename the columns, however many there are. I know I can set the names manually like this:

colnames(df) <- c("sample_1", "sample_2")

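But I want to give a single name and have a number appended to every column automatically, according to its position.

2. Remove the part of each element that I don't want. I know I have to use grep, but I can't figure out how. If you know how to do this, I would appreciate an explanation.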

Answer 1

You can try:

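library(stringr)
# For each column, extract every accession that follows "sp|", join multiple
# matches with ";", then name the columns sample_1, sample_2, ...
data.frame(setNames(
           lapply(df, function(x)
             vapply(str_extract_all(x, "(?<=sp\\|)[^\\|]*"),
                    paste, FUN.VALUE = character(1), collapse = ";")),
           paste0("sample_", seq_along(df))))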

#        sample_1             sample_2
#1         Q9Y6Y8               Q9NZT1
#2         Q9Y6X4               Q5T749
#3         Q9Y6W5               Q13835
#4         Q9Y6V7               Q08554
#5         Q9Y6U3        P67809;Q9Y2T7
#6         Q9Y6M9 P42356;Q8N8J0;A4QPH2
#7  Q9Y6M4;Q9HCP0               P38117
#8         Q9Y6M1               P35908
#9         Q9Y6I3               P19338
#10 Q9Y6H1;Q5T1J5               P15924
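
To see what the pattern does on its own, here is a minimal base R sketch on a single string. The input value is made up for illustration (the entry names AAA_HUMAN and BBB_HUMAN are hypothetical), and regmatches/gregexpr with perl = TRUE stand in for str_extract_all:

```r
# Hypothetical input in a "sp|ACCESSION|ENTRY_NAME" layout; values are made up.
x <- "sp|P67809|AAA_HUMAN;sp|Q9Y2T7|BBB_HUMAN"

# (?<=sp\\|) is a lookbehind: match only right after "sp|", without consuming it.
# [^|]* then takes everything up to the next bar, i.e. the accession.
m <- gregexpr("(?<=sp\\|)[^|]*", x, perl = TRUE)
acc <- regmatches(x, m)[[1]]

# Join multiple accessions with ";" as in the answer above.
joined <- paste(acc, collapse = ";")
print(joined)
```

Note that this pattern only picks up accessions preceded by "sp|"; entries with a different prefix would need the pattern adjusted.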

Answer 2

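lapply the gsub shown below over the columns. The gsub pattern matches a string of non-semicolons followed by a |, then a string of non-semicolons followed by a |, then a string of non-semicolons, and replaces the whole match with the part matched in parentheses (the capture group). Finally, we convert the resulting list back to a data frame and set the names. No packages are used.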

L <- lapply(df, gsub, pattern = "[^;]+\\|([^;]+)\\|[^;]+", replacement = "\\1")
setNames(replace(df, TRUE, L), paste("sample", 1:ncol(df), sep = "_"))

giving:

        sample_1             sample_2
1         Q9Y6Y8               Q9NZT1
2         Q9Y6X4               Q5T749
3         Q9Y6W5               Q13835
4         Q9Y6V7               Q08554
5         Q9Y6U3        P67809;Q9Y2T7
6         Q9Y6M9 P42356;Q8N8J0;A4QPH2
7  Q9Y6M4;Q9HCP0               P38117
8         Q9Y6M1               P35908
9         Q9Y6I3               P19338
10 Q9Y6H1;Q5T1J5               P15924

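Note: This could alternatively be written as follows. ix is a vector of the column numbers that should be transformed; the other columns are left as they are.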

ix <- seq_along(df)
df2 <- df
df2[ix] <- lapply(df[ix], gsub, pattern = "[^;]+\\|([^;]+)\\|[^;]+", replacement = "\\1")
names(df2)[ix] <- paste("sample", ix, sep = "_") # omit if names need not be set
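
As a quick check of the approach above, here is a small self-contained sketch. The data frame is hypothetical: its column names (a, b) and the entry names (AAA_HUMAN and so on) are made up, and only the "prefix|accession|name" layout is assumed from the pattern.

```r
# Hypothetical two-column data frame; column names and entry names are made up.
df <- data.frame(
  a = c("sp|Q9Y6Y8|AAA_HUMAN", "sp|Q9Y6M4|BBB_HUMAN;sp|Q9HCP0|CCC_HUMAN"),
  b = c("sp|Q9NZT1|DDD_HUMAN", "sp|P38117|EEE_HUMAN"),
  stringsAsFactors = FALSE
)

pat <- "[^;]+\\|([^;]+)\\|[^;]+"

# Each ";"-separated piece "prefix|accession|name" is replaced by its accession.
L <- lapply(df, gsub, pattern = pat, replacement = "\\1")

# replace(df, TRUE, L) swaps every column for its cleaned version while
# keeping the data frame structure; setNames then renumbers the columns.
res <- setNames(replace(df, TRUE, L), paste("sample", seq_along(df), sep = "_"))
print(res)
```

The ";" separators survive because the pattern never matches a semicolon, so each piece is rewritten independently.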

Answer 3

To parse out the accession number:

df$newcolumn <- sub("^[^|]+\\|([^|]+).*$", "\\1", df$Ratio.H.L.normalized)

Update:

# Delete the "sp|" prefix and the "|NAME_SPECIES" suffix in every column
df2 <- apply(df, 2, function(col) {
    gsub("sp\\||\\|[^|_;]+_[^|;]+", "", col, perl = TRUE)
})

This works on all columns and puts multiple accessions in the output, as requested.
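
A minimal sketch of the update on made-up data follows. The column names and entry names are hypothetical. One detail worth knowing is that apply() returns a character matrix, so the sketch converts the result back with as.data.frame in case a data frame is needed.

```r
# Hypothetical input; column and entry names are made up for illustration.
df <- data.frame(
  a = c("sp|Q9Y6U3|AAA_HUMAN", "sp|Q9Y6M4|BBB_HUMAN;sp|Q9HCP0|CCC_HUMAN"),
  b = c("sp|P67809|DDD_HUMAN;sp|Q9Y2T7|EEE_HUMAN", "sp|P38117|FFF_HUMAN"),
  stringsAsFactors = FALSE
)

# Delete either the "sp|" prefix or a "|NAME_SPECIES" suffix, leaving only
# the accessions and the ";" separators.
m <- apply(df, 2, function(col) {
  gsub("sp\\||\\|[^|_;]+_[^|;]+", "", col, perl = TRUE)
})

# apply() returns a character matrix; convert back if a data frame is needed.
df2 <- as.data.frame(m, stringsAsFactors = FALSE)
print(df2)
```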

sub replaces something matched with something else, in a vector (or similar). The regular expression:

^ matches start of string
[^|]+ matches one or more characters that are not a bar (sp or tr)
\\| matches the first bar
([^|]+) matches one or more characters that are not a bar (your accession); the parentheses "save" the matched contents
.* matches the rest of the characters
$ matches the end of the string

\\1 retrieves your "saved" match, i.e. the replacement is the accession

Here, I write the values to a new column, but you could easily overwrite the existing column if you prefer.
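
The sub call can be tried on a plain character vector. This is only a sketch: the input values below are made up, with hypothetical entry names.

```r
# Hypothetical values; the entry names are made up for illustration.
x <- c("sp|Q9Y6Y8|AAA_HUMAN", "sp|P67809|BBB_HUMAN;sp|Q9Y2T7|CCC_HUMAN")

# ^[^|]+   the prefix before the first bar
# \\|      the first bar
# ([^|]+)  the accession, captured as group 1
# .*$      everything else up to the end of the string
first_acc <- sub("^[^|]+\\|([^|]+).*$", "\\1", x)
print(first_acc)
```

Because the trailing .* swallows the rest of the string, only the first accession of each element survives. Elements holding several accessions need the gsub version from the update.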

Renaming the columns

names(df) = paste("sample", 1:length(df), sep = "_")

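paste puts strings together to form a larger string, but it also operates on vectors. In this case, the vector is the numbers from 1 to the length of df (the number of columns). It pastes "sample" in front of each one, using an underscore as the separator.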
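
A short sketch of the renaming on a hypothetical three-column data frame (the data and the original column names are made up). It uses seq_along(df), which gives the same numbers as 1:length(df) but also behaves sensibly when there are no columns:

```r
# Hypothetical three-column data frame, used only to show the renaming.
df <- data.frame(x = 1:2, y = 3:4, z = 5:6)

# paste() is vectorised: the single string "sample" is recycled against 1, 2, 3.
new_names <- paste("sample", seq_along(df), sep = "_")
names(df) <- new_names
print(names(df))
```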

How to remove part of each element across many columns?