remove all delimiters at beginning and end of string

Time: 2016-10-20 19:52:22

Tags: r regex

After I collapse my rows and separate using a semicolon, I'd like to delete the semicolons at the front and back of my string. Multiple semicolons represent blanks in a cell. For example an observation may look as follows after the collapse:

;TX;PA;CA;;;;;;;

I'd like the cell to look like this:

TX;PA;CA

Here is my collapse code:

new_df <- group_by(old_df, unique_id) %>% summarize_each(funs(paste(., collapse = ';')))

If I gsub the semicolon, it removes all of them. If I remove the end character, it removes only one of the semicolons. Any ideas on how to remove all of them at the beginning and end, while leaving the ones in between the observations? Thanks.

2 answers:

Answer 0: (score: 10)

Use the regular expression ^;+|;+$

x <- ";TX;PA;CA;;;;;;;"
gsub("^;+|;+$", "", x)

The ^ anchors the match to the start of the string, the + means "one or more" of the preceding character, and $ anchors the match to the end of the string. The | means "OR". Combined, the pattern matches one or more ; at the start of the string OR one or more ; at the end of the string, and gsub replaces each match with an empty string.
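
Since gsub is vectorized, the same pattern can be applied to a whole column of the collapsed data frame. The following is a minimal sketch, assuming a column named state and a helper named trim_semicolons (both hypothetical, as are the sample values); it also shows that interior semicolons are kept and that a cell made up only of semicolons becomes an empty string.

```r
# Hypothetical data frame standing in for the collapsed result;
# the column name `state` and its values are made up for illustration.
new_df <- data.frame(
  unique_id = 1:4,
  state = c(";TX;PA;CA;;;;;;;", "NY;;NJ", ";;;", "FL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip runs of ";" at the start or end of each string.
trim_semicolons <- function(x) gsub("^;+|;+$", "", x)

# gsub works element-wise, so the whole column is cleaned in one call.
new_df$state <- trim_semicolons(new_df$state)
new_df$state
## [1] "TX;PA;CA" "NY;;NJ"   ""         "FL"
```

To clean every character column rather than one, the helper could be applied with lapply over the relevant columns.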

Answer 1: (score: 3)

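The stringi package lets you specify a pattern to keep and trims everything else. If your values contain only letters (though you can specify other patterns as well), you can simply do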

stringi::stri_trim_both(";TX;PA;CA;;;;;;;", "\\p{L}")
## [1] "TX;PA;CA"
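
If the values can start or end with something other than a letter (digits, for example), "\\p{L}" would trim those characters as well. A possible variation, sketched here with made-up sample values, is to keep everything that is not a semicolon by passing the character class "[^;]" instead. This assumes the stringi package is installed.

```r
# Made-up sample values; the second one ends in a digit.
x <- c(";TX;PA;CA;;;;;;;", ";;A1;B2;;")

# "\\p{L}" preserves letters only, so the trailing "2" is trimmed too.
stringi::stri_trim_both(x, "\\p{L}")
## [1] "TX;PA;CA" "A1;B"

# "[^;]" preserves anything that is not a semicolon.
stringi::stri_trim_both(x, "[^;]")
## [1] "TX;PA;CA" "A1;B2"
```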