I have a function that adds a new column to a data frame based on an existing column. My code currently looks like this:
df <- data.frame("chr" = c("chr1", "chr2", "chr3", "chrX"), "B" = c("a", "c", "d", "b"))
df$chr <- factor(df$chr, levels = c("chr1", "chr2", "chr3", "chrX")) # Not really necessary here...
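I use the following function to add a new column holding the chromosome number as an integer. I am wondering whether there is a simpler way to do this, perhaps by making use of the factor levels. Replacing the current df$chr column with the integer values would also be fine.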
AddChr <- function(DataFrame){
  DataFrame$Chr <- NA
  DataFrame$Chr[DataFrame$chr == "chr1"] <- 1
  DataFrame$Chr[DataFrame$chr == "chr2"] <- 2
  DataFrame$Chr[DataFrame$chr == "chr3"] <- 3
  DataFrame$Chr[DataFrame$chr == "chrX"] <- 20
  DataFrame$Chr <- as.integer(DataFrame$Chr)
  return(DataFrame)
}
df <- AddChr(df)
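Answer 0 (score: 2)
This solution creates a named vector that translates your labels into the new ones.
The labels you want to end up with are the numbers 1 to 21: 1:21
The names you want to translate from are the string chr followed by c(1:19, "X", "Y"):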
paste0("chr", c(1:19, "X", "Y"))
# [1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10"
# [11] "chr11" "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chrX"
# [21] "chrY"
If you name the first vector with the second one, you get the mapping:
setNames(1:21, paste0("chr", c(1:19, "X", "Y")))
#  chr1  chr2  chr3  chr4  chr5  chr6  chr7  chr8  chr9 chr10 chr11 chr12 chr13 chr14
#     1     2     3     4     5     6     7     8     9    10    11    12    13    14
# chr15 chr16 chr17 chr18 chr19  chrX  chrY
#    15    16    17    18    19    20    21
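Then subset the mapping with your column. Convert the column with as.character() first, because indexing with a factor uses its integer codes rather than its labels, which would return chr4 = 4 instead of chrX = 20 for the last row:
setNames(1:21, paste0("chr", c(1:19, "X", "Y")))[as.character(df$chr)]
# chr1 chr2 chr3 chrX
#    1    2    3   20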
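Putting the steps above together, the lookup can be wrapped in a drop-in replacement for AddChr. This is only a minimal sketch: it assumes the chromosome names are chr1 to chr19, chrX and chrY as in the mapping above, and the name chr_map is introduced here purely for illustration.

```r
# Lookup table: chromosome label -> integer (chrX = 20, chrY = 21).
# chr_map is a hypothetical name, not part of the original answer.
chr_map <- setNames(1:21, paste0("chr", c(1:19, "X", "Y")))

AddChr <- function(DataFrame) {
  # as.character() makes a factor column match by label, not by integer code;
  # unname() drops the labels that the subset carries over from chr_map.
  DataFrame$Chr <- unname(chr_map[as.character(DataFrame$chr)])
  DataFrame
}

df <- data.frame(chr = c("chr1", "chr2", "chr3", "chrX"),
                 B = c("a", "c", "d", "b"))
df <- AddChr(df)
```

Labels that are missing from chr_map end up as NA in the new column, which makes unexpected chromosome names easy to spot.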
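Answer 1 (score: 1)
For your specific example, this will work. The result is wrapped in as.integer() because gsub() returns character strings:
df$Chr <- as.integer(ifelse(grepl("\\d", df$chr), gsub("[[:alpha:]]", "", df$chr), 20))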
df
##    chr B Chr
## 1 chr1 a   1
## 2 chr2 c   2
## 3 chr3 d   3
## 4 chrX b  20
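Note that the one-liner above maps every label without a digit to 20, so chrY would also become 20. A minimal sketch of a variant that treats the two sex chromosomes separately, assuming chrY should map to 21 as in the other answer (the names suffix and chr_num are introduced here for illustration):

```r
df <- data.frame(chr = c("chr1", "chr2", "chr3", "chrX"),
                 B = c("a", "c", "d", "b"))

# Strip the "chr" prefix; as.character() guards against a factor column.
suffix <- sub("^chr", "", as.character(df$chr))

# Numeric suffixes convert directly; "X" and "Y" become NA here
# (the coercion warning is expected, so it is suppressed).
chr_num <- suppressWarnings(as.integer(suffix))

# Fill in the sex chromosomes explicitly.
chr_num[suffix == "X"] <- 20L
chr_num[suffix == "Y"] <- 21L

df$Chr <- chr_num
```

Any other non-numeric suffix is left as NA rather than being silently assigned a number.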