R - Remove all strings of 2 characters or fewer using a regular expression

Asked: 2017-03-13 18:54:17

Tags: r regex string

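I have a problem that I'm sure has a very simple fix, but I have been searching for an answer for about an hour and can't seem to solve it.

I have a character vector whose data looks something like this: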

  [5] "Toronto, ON"                    "Manchester, UK"                    
  [7] "New York City, NY"              "Newark, NJ"             
  [9] "Melbourne"                      "Los Angeles, CA"                         
 [11] "New York, USA"                  "Liverpool, England"            
 [13] "Fort Collins, CO"               "London, UK"                              
 [15] "New York, NY" 

Basically, I want to get rid of every word that is 2 characters or shorter, so that the data looks like this:

  [5] "Toronto, "                      "Manchester, "                    
  [7] "New York City, "                "Newark, "             
  [9] "Melbourne"                      "Los Angeles, "                         
 [11] "New York, USA"                  "Liverpool, England"            
 [13] "Fort Collins, "                 "London, "                              
 [15] "New York, " 

I know how to get rid of the commas. As I said, I'm sure this is very simple, and any help would be greatly appreciated. Thanks!

2 answers:

Answer 0 (score: 5)

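You can apply a quantifier to the word character \\w, wrapped in word boundaries: the pattern \\b\\w{1,2}\\b matches a word of one or two characters. Use gsub to remove it, since there may be more than one match per string:

gsub("\\b\\w{1,2}\\b", "", v)
# [1] "Toronto, " "Manchester, " "New York City, " "Newark, " "Melbourne" "Los Angeles, " "New York, USA"
# [8] "Liverpool, England" "Fort Collins, " "London, " "New York, "

Note that \\w matches letters, digits, and the underscore. If you only want to consider alphabetic letters, you can use:

gsub("\\b[a-zA-Z]{1,2}\\b", "", v)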
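To make the difference between the two patterns concrete, here is a small sketch. The vector v is rebuilt from the sample data in the question, and the string x is a made-up input (not from the question) chosen because it contains a short number.

```r
# Sample vector rebuilt from the question's data (elements 5 to 15).
v <- c("Toronto, ON", "Manchester, UK", "New York City, NY", "Newark, NJ",
       "Melbourne", "Los Angeles, CA", "New York, USA", "Liverpool, England",
       "Fort Collins, CO", "London, UK", "New York, NY")

# Remove every standalone word made of one or two word characters.
short_removed <- gsub("\\b\\w{1,2}\\b", "", v)

# Hypothetical input with a two-digit number, to show where the patterns differ.
x <- "Suite 12, Austin, TX"
any_word_char <- gsub("\\b\\w{1,2}\\b", "", x)       # drops both "12" and "TX"
letters_only  <- gsub("\\b[a-zA-Z]{1,2}\\b", "", x)  # drops only "TX"

print(any_word_char)  # "Suite , Austin, "
print(letters_only)   # "Suite 12, Austin, "
```

On the question's data the two patterns give the same result, because every short word there consists of letters only; the choice matters only when short numbers or underscores can appear.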

Answer 1 (score: 0)

Not using a regular expression, but it gets the job done:
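One way a non-regex approach could look is sketched here. This is an assumption for illustration only and not necessarily the code this answer used; the helper name drop_short_tokens and its max_len parameter are hypothetical. It splits each string on single spaces with a fixed (non-regex) separator, drops tokens of two characters or fewer, and joins the rest back together.

```r
# Hypothetical helper (not from the original answer): drop space-separated
# tokens of max_len characters or fewer without using a regular expression.
drop_short_tokens <- function(x, max_len = 2) {
  vapply(strsplit(x, " ", fixed = TRUE), function(tokens) {
    # Keep only tokens longer than max_len, then rebuild the string.
    paste(tokens[nchar(tokens) > max_len], collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

print(drop_short_tokens(c("Toronto, ON", "Melbourne", "New York, USA")))
# "Toronto,"  "Melbourne"  "New York, USA"
```

Unlike the gsub version, this does not keep the trailing space after the comma, and a short token with punctuation attached (for example "NY,") counts as three characters and would be kept.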
