Trim part of a string in a data frame

Time: 2015-02-18 16:14:53

Tags: regex r replace trim

If I have a data frame column that looks like this:

AA1_123.zip
BB2_456.txt
CCC_789.doc

How can I change it to:

AA1
BB2
CCC

3 answers:

Answer 0 (score: 5)

You can try sub, which here removes the first underscore and everything after it:

sub('_.*', '', df1$Col)
#[1] "AA1" "BB2" "CCC"

Data

df1 <- structure(list(Col = c("AA1_123.zip", "BB2_456.txt", 
"CCC_789.doc"
)), .Names = "Col", class = "data.frame", row.names = c(NA, -3L))
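
The sub() call above only returns the trimmed strings; it does not modify the data frame. A minimal sketch of writing the result back to the column, assuming the same column name Col as in the data above (the data frame is rebuilt with data.frame() here only to keep the example self-contained):

```r
# Example data with the same shape as df1 above
df1 <- data.frame(Col = c("AA1_123.zip", "BB2_456.txt", "CCC_789.doc"),
                  stringsAsFactors = FALSE)

# "_.*" matches the first underscore and everything that follows it;
# sub() replaces that match with the empty string
df1$Col <- sub("_.*", "", df1$Col)

df1$Col
#[1] "AA1" "BB2" "CCC"
```

Because the pattern is anchored on the underscore rather than on a fixed width, this also works when the prefixes have different lengths. A string without an underscore is returned unchanged.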

Answer 1 (score: 1)

If the strings all follow the same pattern, with exactly three characters before the underscore, this will work:

df1 <- structure(list(Col = c("AA1_123.zip", "BB2_456.txt", 
                              "CCC_789.doc"
)), .Names = "Col", class = "data.frame", row.names = c(NA, -3L))

> substr(df1$Col, 1, 3)
[1] "AA1" "BB2" "CCC"
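
If the prefix length varies, the fixed positions 1 to 3 no longer apply. One possible variation, sketched here with hypothetical input values that are not part of the question, is to locate the first underscore with regexpr() and cut just before it:

```r
# Hypothetical input with prefixes of different lengths,
# plus one string that has no underscore at all
x <- c("AA1_123.zip", "B_456.txt", "CCCC_789.doc", "noseparator")

# Position of the first literal underscore in each string (-1 if absent)
pos <- as.integer(regexpr("_", x, fixed = TRUE))

# Keep everything before the underscore; leave strings without one untouched
prefix <- ifelse(pos > 0, substr(x, 1, pos - 1), x)

prefix
# [1] "AA1"         "B"           "CCCC"        "noseparator"
```

The ifelse() guard is a design choice for this sketch: without it, strings that lack an underscore would come back as empty strings.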

Answer 2 (score: 1)

You can also re-read the column, using comment.char = "_" to discard the rest of each line.

df <- data.frame(x = c("AA1_123.zip", "BB2_456.txt", "CCC_789.doc"))

read.table(text = as.character(df$x), comment.char="_")
#    V1
# 1 AA1
# 2 BB2
# 3 CCC

Or you can use scan():

scan(text = as.character(df$x), what = "", comment.char="_")
# Read 3 items
# [1] "AA1" "BB2" "CCC"
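
To put the scan() result back into the data frame, something along these lines should work. This is a sketch that assumes each value contains no whitespace or quote characters, since scan() would otherwise split or reinterpret the fields. The stringsAsFactors = FALSE argument is added here so that the column is character data regardless of R version.

```r
df <- data.frame(x = c("AA1_123.zip", "BB2_456.txt", "CCC_789.doc"),
                 stringsAsFactors = FALSE)

# Treat "_" as a comment character so the rest of each line is dropped;
# quiet = TRUE suppresses the "Read 3 items" message
df$x <- scan(text = df$x, what = "", comment.char = "_", quiet = TRUE)

df$x
# [1] "AA1" "BB2" "CCC"
```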