Remove specific characters from a data frame in R

Time: 2015-03-05 13:53:45

Tags: r substr gsub

My data frame looks like this:

>sample_df
dd_mav2_6541_0_10
dd_mav2_12567_0_2
dd_mav2_43_1_341
dd_mav2_19865_2_13
dd_mav2_1_0_1

I need to remove the fourth "_" and all the digits after it. I want the output to look like this:

>sample_df
dd_mav2_6541_0
dd_mav2_12567_0
dd_mav2_43_1
dd_mav2_19865_2
dd_mav2_1_0

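I tried the following code, but it only works on a fixed number of characters (here, the last three) and does not produce the output shown above.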

substr(sample_df,nchar(sample_df)-2,nchar(sample_df))

How can I get this output?

2 answers:

Answer 0 (score: 2)

You can try this:

gsub("_\\d+$","",sample_df)

It removes an underscore followed by one or more digits at the end of the string.
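
Note that this pattern only looks at the end of the string; it does not count underscores. A quick check, using a made-up vector `x` for illustration: a string that already has only four pieces still loses its last numeric piece, while a string that ends in letters is left alone. So the pattern matches the requirement only when every string has exactly five pieces, as in the sample.

```r
# Hypothetical inputs: five pieces, already four pieces, and two pieces
x <- c("dd_mav2_6541_0_10", "dd_mav2_1_0", "dd_mav2")

# Remove "_" plus one or more digits at the very end of each string
gsub("_\\d+$", "", x)
# returns c("dd_mav2_6541_0", "dd_mav2_1", "dd_mav2")
```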

Using your data:

sample_df <- c("dd_mav2_6541_0_10","dd_mav2_12567_0_2","dd_mav2_43_1_341","dd_mav2_19865_2_13","dd_mav2_1_0_1")

gsub("_\\d+$","",sample_df)
#[1] "dd_mav2_6541_0"  "dd_mav2_12567_0" "dd_mav2_43_1"    "dd_mav2_19865_2" "dd_mav2_1_0"
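
If the strings really live in a column of a data frame (the title says so, although the sample is printed like a vector) and the rule is literally "drop the fourth `_` and everything after it", a positional pattern is an alternative that neither answer uses. This is a sketch assuming a hypothetical data frame `df` with a character column `id`. Because `[^_]*` cannot cross an underscore, group 1 captures exactly the first four pieces; strings with fewer than four underscores do not match and are returned unchanged.

```r
# Hypothetical data frame; the column name `id` is an assumption
df <- data.frame(
  id = c("dd_mav2_6541_0_10", "dd_mav2_12567_0_2", "dd_mav2_43_1_341",
         "dd_mav2_19865_2_13", "dd_mav2_1_0_1", "dd_mav2"),
  stringsAsFactors = FALSE
)

# Keep the first four "_"-separated pieces; drop the fourth "_" and all that follows
df$id <- sub("^([^_]*(_[^_]*){3})_.*$", "\\1", df$id)
df$id
```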

Answer 1 (score: 0)

# Create the vector (I added one more element
# at the end, with fewer than 4 pieces)
sample_df <- c("dd_mav2_6541_0_10",
               "dd_mav2_12567_0_2",
               "dd_mav2_43_1_341",
               "dd_mav2_19865_2_13",
               "dd_mav2_1_0_1",
               "dd_mav2")

# Split each string on "_"; the result is a list of character vectors
xx <- strsplit(x = sample_df, split = "_")
xx

[[1]]
[1] "dd"   "mav2" "6541" "0"    "10"

[[2]]
[1] "dd"    "mav2"  "12567" "0"     "2"

[[3]]
[1] "dd"   "mav2" "43"   "1"    "341"

[[4]]
[1] "dd"    "mav2"  "19865" "2"     "13"

[[5]]
[1] "dd"   "mav2" "1"    "0"    "1"

[[6]]
[1] "dd"   "mav2"

# Loop through each element and rejoin the first four pieces
# (or all of them, if there are fewer than four)
yy <- lapply(xx, function(a) {
  if (length(a) < 4) {
    return(paste(a, collapse = "_"))
  } else {
    return(paste(a[1:4], collapse = "_"))
  }
})
yy

[[1]]
[1] "dd_mav2_6541_0"

[[2]]
[1] "dd_mav2_12567_0"

[[3]]
[1] "dd_mav2_43_1"

[[4]]
[1] "dd_mav2_19865_2"

[[5]]
[1] "dd_mav2_1_0"

[[6]]
[1] "dd_mav2"

# Re-create the vector
do.call("c", yy)

[1] "dd_mav2_6541_0"  "dd_mav2_12567_0" "dd_mav2_43_1"    "dd_mav2_19865_2"
[5] "dd_mav2_1_0"     "dd_mav2"
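
The `if`/`else` in the loop above can be dropped, because `head(p, 4)` simply returns all of `p` when it has fewer than four pieces. Below is a minimal sketch of the same split-and-rejoin idea that uses `vapply()`, so the result is already a character vector and no `do.call("c", ...)` step is needed. The helper name `keep_first_pieces` and its arguments are hypothetical, not part of the answer.

```r
# Hypothetical helper: keep the first `n` pieces of each string, split on `sep`
keep_first_pieces <- function(x, n = 4, sep = "_") {
  # fixed = TRUE treats `sep` as a literal string, not a regular expression
  pieces <- strsplit(x, split = sep, fixed = TRUE)
  # head() returns the whole vector when it is shorter than n
  vapply(pieces, function(p) paste(head(p, n), collapse = sep), character(1))
}

sample_df <- c("dd_mav2_6541_0_10", "dd_mav2_12567_0_2", "dd_mav2_43_1_341",
               "dd_mav2_19865_2_13", "dd_mav2_1_0_1", "dd_mav2")
keep_first_pieces(sample_df)
```

`vapply()` with `character(1)` also guarantees that each element yields exactly one string, which `lapply()` followed by `do.call("c", ...)` does not check.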