Remove all characters from text outside the punctuation marks (asterisks)

Date: 2018-04-22 21:31:37

Tags: r

I have a dataset with the following contents:

ID    Type                 Count
1     **Radisson**             8
2     **Renaissance**          9
3     **Hilton**               8
4     **Radisson**             8

I want to get a dataset that looks like:

Or even without the *, if possible.

Any solutions?

3 answers:

Answer 0 (score: 3)

You can keep the part enclosed in asterisks at the start of the string and drop everything after it.

df <- data.frame(Type = c("**Radisson**", "**Renaissance**", "**Hilton** New York Only",
                          "**Radisson** East Cost"),
                 Count = c(8, 9, 8, 8))

gsub("^(\\*{2}.*\\*{2}).*", "\\1", df$Type, perl = TRUE)

[1] "**Radisson**"    "**Renaissance**" "**Hilton**"      "**Radisson**" 

So, assigning the result back:

df$Type <- gsub("^(\\*{2}.*\\*{2}).*", "\\1", df$Type, perl = TRUE)
df

             Type Count
1    **Radisson**     8
2 **Renaissance**     9
3      **Hilton**     8
4    **Radisson**     8
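The question also asks for the names without the *. A minimal base R sketch, assuming the same sample data as in this answer: moving the capture group inside the asterisks and using [^*]* instead of .* keeps only the name. It also avoids a side effect of the greedy .* above, which for a value like "**A** and **B**" would keep everything up to the last pair of asterisks.

```r
# Sample data copied from the answer above
df <- data.frame(Type = c("**Radisson**", "**Renaissance**", "**Hilton** New York Only",
                          "**Radisson** East Cost"),
                 Count = c(8, 9, 8, 8),
                 stringsAsFactors = FALSE)

# Capture only the characters between the leading "**" and the next "**".
# [^*]* cannot cross a "*", so it stops at the first closing pair;
# the trailing .* consumes (and thereby drops) whatever follows.
# Values that do not start with "**" are returned unchanged by sub().
df$Type <- sub("^\\*{2}([^*]*)\\*{2}.*", "\\1", df$Type)
df
```

sub() is enough here because only the first match per string is needed; gsub() would give the same result, since the pattern is anchored with ^.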

Answer 1 (score: 0)

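One solution is to strsplit on ** and take the second element (the first element is the empty string before the leading **):

# as.character() is needed if Type is a factor (the data.frame() default before R 4.0),
# because strsplit() rejects non-character input
df$Type <- sapply(strsplit(as.character(df$Type), split = "\\*{2}"), function(x) x[2])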
df
#   ID        Type Count
# 1  1    Radisson     8
# 2  2 Renaissance     9
# 3  3      Hilton     8
# 4  4    Radisson     8
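A variant of the same idea, sketched with vapply() so the result is guaranteed to be a character vector of the same length as the input. The fourth value, "plain text", is a hypothetical input added to show what happens when a value contains no **: strsplit() returns a single piece, so x[2] is NA.

```r
# Hypothetical input; the last value contains no "**" at all
types <- c("**Radisson**", "**Renaissance**", "**Hilton** New York Only", "plain text")

# Split each string on "**"; "**Hilton** New York Only" becomes
# c("", "Hilton", " New York Only")
parts <- strsplit(types, split = "\\*{2}")

# Take the second piece; vapply() checks that every result is one string,
# and values without "**" yield NA instead of silently returning something else
inner <- vapply(parts, function(x) x[2], character(1))
inner
```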

Answer 2 (score: 0)

Here is an option with str_extract:

library(stringr)
library(dplyr)
df %>% 
   mutate(Type = str_extract(Type, "[*]*[^*]*[*]*"))
#              Type Count
#1    **Radisson**     8
#2 **Renaissance**     9
#3      **Hilton**     8
#4    **Radisson**     8
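If adding stringr and dplyr is not desirable, a base R sketch of the same extraction uses regexpr() to locate the first match of the pattern and regmatches() to pull it out. The input values are borrowed from the first answer. One caveat: unlike str_extract(), which returns NA for non-matching strings, regmatches() drops them. That cannot happen with this particular pattern, because every part of it may match zero characters, but it matters if the pattern is made stricter.

```r
# Sample values borrowed from the first answer
types <- c("**Radisson**", "**Renaissance**", "**Hilton** New York Only",
           "**Radisson** East Cost")

# Same pattern as the str_extract() answer: leading stars, non-star text,
# trailing stars. regexpr() finds the first match in each string.
m <- regexpr("[*]*[^*]*[*]*", types)

# regmatches() returns the matched substrings
extracted <- regmatches(types, m)
extracted
```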