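I have a data frame named deseq.res. It has a column called Gene. If a value in this column is longer than 10 characters, I want to truncate it to its first 10 characters.
The data: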
deseq.res<-structure(list(Gene = c("SS1G_0300902", "SS1G_024991", "SS1G_09248",
"SS1G_09768"), sampleA = c("Healthy", "Healthy", "Healthy", "Healthy"
), sampleB = c("Infected", "Infected", "Infected", "Infected"
)), .Names = c("Gene", "sampleA", "sampleB"), row.names = c(NA,
4L), class = "data.frame")
The result I want:
Gene sampleA sampleB
SS1G_03009 Healthy Infected
SS1G_02499 Healthy Infected
SS1G_09248 Healthy Infected
SS1G_09768 Healthy Infected
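The code I tried:
This is where I am stuck. Once I can identify the values that are too long, I can simply use gsub or substring. I could do this in a more elaborate way, but I would like to do it with a function.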
check.len<- function(x){if (length(deseq.res$Gene[x])>10) return (x)}
check.len(deseq.res$Gene)
Answer 0 (score: 4)
We can use substr to extract the first 10 characters of each value:
deseq.res$Gene <- substr(deseq.res$Gene, 1, 10)
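substr leaves a string unchanged when it is shorter than the requested stop position, so no explicit length check is needed before truncating. A minimal sketch, using a made-up vector `genes` (the third value is hypothetical and not from the question):

```r
# Hypothetical example vector; only the first two values appear in the question
genes <- c("SS1G_024991", "SS1G_09248", "SS1G_1")

# Keep at most the first 10 characters of every element
shortened <- substr(genes, 1, 10)
print(shortened)
```

Only the first element is shortened here; the other two already fit within 10 characters.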
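Building on the OP's function: the check should use nchar (the number of characters in each string) rather than length (the number of elements in the vector):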
check.len <- function(x, n) ifelse(nchar(x) > n, substr(x, 1, n) , x)
check.len(deseq.res$Gene, n = 10)
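To illustrate the difference, here is a small sketch on a made-up two-element vector `ids` (an assumption for illustration, not the question's full data). length() returns a single count for the whole vector, while nchar() returns one count per element, which is what a per-value check needs.

```r
# Hypothetical example vector, not the full data from the question
ids <- c("SS1G_0300902", "SS1G_09248")

n_elements <- length(ids)  # how many elements the vector has
n_chars <- nchar(ids)      # how many characters each element has

# Truncate only the elements longer than 10 characters
trimmed <- ifelse(n_chars > 10, substr(ids, 1, 10), ids)
print(trimmed)
```

With the original four-row data, length() would report 4, which is never greater than 10, so the original condition could not be true.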
Answer 1 (score: 0)
You can use mutate from the dplyr package:
library(dplyr)
deseq.res <- deseq.res %>% mutate(Gene = substr(Gene,1,10))