My data frame:
>datasetM
Mean
ENSORLG00000001933:tex11 2500.706
ENSORLG00000010797: 44225.330
ENSORLG00000003008:pabpc1a 11788.555
ENSORLG00000001973:sept6 3100.493
ENSORLG00000000997: 5418.796
Desired output:
>out
[1] "tex11" "ENSORLG00000010797" "pabpc1a" "sept6" "ENSORLG00000000997"
I tried this, but it only retrieves the part before the separator:
titles <- rownames(datasetM)
vapply(strsplit(titles,":"), `[`, 1, FUN.VALUE=character(1))
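Note 1: there is no pattern to which rows look like ENS000:name and which look like ENS00: (with nothing after the colon).
Note 2: the ENSOR... strings are the rownames.
Note 3: when nothing follows the ":", I want the ENSOR... ID itself.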
Answer 0 (score: 3)
Here is a base R solution:
sapply(strsplit(rownames(df), ":"), function(x) x[length(x)])
# [1] "tex11" "ENSORLG00000010797" "pabpc1a" "sept6"
# [5] "ENSORLG00000000997"
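Taking the last piece works because strsplit does not return a trailing empty string, so a rowname that ends in ":" splits into a single element. A minimal sketch of that behaviour, using a hypothetical vector ids in place of rownames(df):

```r
# Hypothetical stand-in for rownames(df)
ids <- c("ENSORLG00000001933:tex11", "ENSORLG00000010797:")

parts <- strsplit(ids, ":", fixed = TRUE)

# "ID:name" yields two pieces; "ID:" yields only one, because the
# empty string after the final colon is dropped by strsplit
n_pieces <- lengths(parts)

# The last piece is the name when there is one, otherwise the ID
out <- vapply(parts, function(x) x[length(x)], character(1))
out
```

vapply is used here instead of sapply only so that the return type is checked; either should behave the same on this input.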
Another solution using sub, which may be simpler:
sub("^\\w+:(?=\\w)|:", "", rownames(df), perl = TRUE)
# [1] "tex11" "ENSORLG00000010797" "pabpc1a" "sept6"
# [5] "ENSORLG00000000997"
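The pattern has two alternatives: the first removes the ID and colon when a word character follows the colon, and the second removes a bare colon otherwise. If the one-liner is hard to read, the same idea can be sketched in two steps. The vector ids and the names step1 and out2 below are illustrative, not part of the original answer:

```r
# Hypothetical stand-in for rownames(df)
ids <- c("ENSORLG00000001933:tex11", "ENSORLG00000010797:")

# Step 1: drop everything up to a colon that is followed by at least
# one more character (lookahead, so perl = TRUE is required)
step1 <- sub("^.*:(?=.)", "", ids, perl = TRUE)

# Step 2: drop the trailing colon left on IDs that had no name
out2 <- sub(":$", "", step1)
out2

# Compare with the one-line version from the answer
one_liner <- sub("^\\w+:(?=\\w)|:", "", ids, perl = TRUE)
```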
Data:
df = read.table(text = " Mean
ENSORLG00000001933:tex11 2500.706
ENSORLG00000010797: 44225.330
ENSORLG00000003008:pabpc1a 11788.555
ENSORLG00000001973:sept6 3100.493
ENSORLG00000000997: 5418.796", header = TRUE, row.names = 1)
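Answer 1 (score: 2)
Here is a vectorized approach that uses a regular expression (taken from a linked source) to identify the last character of each rowname and rewrites only the rownames that do not end in ":",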
rownames(df)[!sub('.*(?=.$)', '', rownames(df), perl=TRUE) == ':'] <-
sub('.*:', '', rownames(df)[!sub('.*(?=.$)', '', rownames(df), perl=TRUE) == ':'])
which gives,
                           V2
tex11                2500.706
ENSORLG00000010797: 44225.330
pabpc1a             11788.555
sept6                3100.493
ENSORLG00000000997:  5418.796
Data
dput(df)
structure(list(V2 = c(2500.706, 44225.33, 11788.555, 3100.493,
5418.796)), .Names = "V2", row.names = c("tex11", "ENSORLG00000010797:",
"pabpc1a", "sept6", "ENSORLG00000000997:"), class = "data.frame")
Note that you can then remove the remaining colons from the rownames via
rownames(df) <- sub(':', '', rownames(df))
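For reference, here is a sketch of both steps run end to end on the original rownames from the question. It swaps the lookahead test for endsWith (base R 3.3.0 or later), which is a substitution on my part rather than the answer's method, and df2, rn and has_name are illustrative names:

```r
# Rebuild the question's data under a different name so df is untouched
df2 <- data.frame(
  V2 = c(2500.706, 44225.33, 11788.555, 3100.493, 5418.796),
  row.names = c("ENSORLG00000001933:tex11", "ENSORLG00000010797:",
                "ENSORLG00000003008:pabpc1a", "ENSORLG00000001973:sept6",
                "ENSORLG00000000997:")
)

rn <- rownames(df2)

# TRUE where a gene name follows the colon
has_name <- !endsWith(rn, ":")

# Step 1: keep only the name on rows that have one
rn[has_name] <- sub(".*:", "", rn[has_name])

# Step 2: strip the trailing colon from the ID-only rows
rownames(df2) <- sub(":", "", rn, fixed = TRUE)
rownames(df2)
```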