Trim trailing hyphens in a column

Time: 2017-01-16 02:40:52

Tags: r string dataframe delimiter hyphen

I have a data.frame column that looks like this:

Lake-and-Peninsula--
Matanuska-Susitna---
Nome----
North-Slope---
Northwest-Arctic---
Prince-of-Wales-Outer-
Sitka----
Skagway-Hoonah-Angoon--
Southeast-Fairbanks---
Valdez-Cordova---
Wade-Hampton---
Wrangell-Petersburg---
Yakutat----

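Each cell ends with some number of hyphens. I want to remove all of the hyphens at the end of each cell while keeping the hyphens between words. How can I do that? There are at most 4 trailing hyphens, and sometimes there are none.

Desired output: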

Lake-and-Peninsula
Matanuska-Susitna
Nome
North-Slope
Northwest-Arctic
Prince-of-Wales-Outer
Sitka
Skagway-Hoonah-Angoon
Southeast-Fairbanks
Valdez-Cordova
Wade-Hampton
Wrangell-Petersburg
Yakutat

2 answers:

Answer 0: (score: 1)

We can use sub to match one or more hyphens (-+) at the end of the string ($) and replace them with an empty string:

df1$Col <- sub("-+$", "", df1$Col)
df1
#                     Col
#1     Lake-and-Peninsula
#2      Matanuska-Susitna
#3                   Nome
#4            North-Slope
#5       Northwest-Arctic
#6  Prince-of-Wales-Outer
#7                  Sitka
#8  Skagway-Hoonah-Angoon
#9    Southeast-Fairbanks
#10        Valdez-Cordova
#11          Wade-Hampton
#12   Wrangell-Petersburg
#13               Yakutat
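
As a quick sanity check of the pattern on edge cases, here is a minimal sketch. The vector x and its values are made up for illustration and are not from the question. It assumes the column is a character vector (a factor column would need as.character first). Because -+$ is anchored to the end of the string, it can match at most once per element, so sub and gsub should behave the same here.

```r
# Hypothetical test vector (an assumption, not the question's data):
# covers 0 to 4 trailing hyphens plus a doubled internal hyphen.
x <- c("Nome----", "North-Slope---", "Prince-of-Wales-Outer-", "Sitka", "a--b--")

# "-+$" matches a run of one or more hyphens only at the end of the string,
# so internal hyphens stay and hyphen-free strings pass through unchanged.
trimmed <- sub("-+$", "", x)
print(trimmed)
```

Elements without a trailing hyphen, such as "Sitka", are returned as they are, which covers the "sometimes there are none" case.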

Data

df1 <- structure(list(Col = c("Lake-and-Peninsula--", "Matanuska-Susitna---", 
"Nome----", "North-Slope---", "Northwest-Arctic---", "Prince-of-Wales-Outer-", 
"Sitka----", "Skagway-Hoonah-Angoon--", "Southeast-Fairbanks---", 
"Valdez-Cordova---", "Wade-Hampton---", "Wrangell-Petersburg---", 
"Yakutat----")), .Names = "Col", class = "data.frame", row.names = c(NA, -13L))

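Answer 1: (score: 0)

Judging by the number of trailing hyphens, my guess is that these strings came about because the original data frame had some blank cells, and its columns were then pasted into a single column with a hyphen as the separator.

Instead, exclude the blanks before pasting to avoid the extra hyphens in the first place, for example: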

# data
x <- c("Lake", "and", "Peninsula", "", "")

# paste old
paste(x, collapse = "-")
# [1] "Lake-and-Peninsula--"

# paste after removing blanks
paste(x[x != ""], collapse = "-")
# [1] "Lake-and-Peninsula"
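
If the pieces really do live in separate columns, the same idea could be applied row by row. This is only a sketch under that assumption: the data frame parts and its column names p1, p2, p3 are hypothetical, since the question does not show the pre-paste data.

```r
# Hypothetical pre-paste data frame; names and values are assumptions.
parts <- data.frame(
  p1 = c("Lake", "Nome"),
  p2 = c("and", ""),
  p3 = c("Peninsula", ""),
  stringsAsFactors = FALSE
)

# For each row, drop the empty strings, then join what is left with "-".
joined <- apply(parts, 1, function(row) paste(row[row != ""], collapse = "-"))
print(joined)
```

If the blanks are stored as NA rather than "", the filter would need to be something like !is.na(row) & row != "" instead.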