Rearranging strings in R

Posted: 2015-04-07 06:13:55

Tags: regex r string gsub strsplit

I have a long list of city names and state names. Here is part of my data:

data <- c('Ranchi_Capital_State_Jharkhand', 'Bokaro_State_Jharkhand', 'Tata Nagar_State_Jharkhand', 'Ramgarh_State_Jharkhand',
      'Pune_State_Maharashtra', 'Mumbai_Capital_State_Maharashtra', 'Nagpur_State_Maharashtra')

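I want to rearrange each string so that the state comes first, e.g. State_Jharkhand_Bokaro. If the city is a capital, the result should be State_Jharkhand_Capital_Ranchi. Note also that a city name or state name may be a single word or several words (e.g. Tata Nagar).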

What is the most efficient way to do this (without using any loops)?

2 answers:

Answer 0 (score: 2)

You can use gsub as follows:

> data <- c('Ranchi_Capital_State_Jharkhand', 'Bokaro_State_Jharkhand', 'Tata Nagar_State_Jharkhand', 'Ramgarh_State_Jharkhand',
+           'Pune_State_Maharashtra', 'Mumbai_Capital_State_Maharashtra', 'Nagpur_State_Maharashtra')
> gsub("^(?:(.*?)(_Capital))?(.*?)_(State.*)", "\\4\\2_\\1\\3", data)
[1] "State_Jharkhand_Capital_Ranchi"   "State_Jharkhand_Bokaro"          
[3] "State_Jharkhand_Tata Nagar"       "State_Jharkhand_Ramgarh"         
[5] "State_Maharashtra_Pune"           "State_Maharashtra_Capital_Mumbai"
[7] "State_Maharashtra_Nagpur" 

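The pattern above packs four capture groups into one string. As a sketch, the same pattern can be assembled from named pieces so each group is easier to follow. The variable names (capital_part, city_part, state_part) are illustrative only, and perl = TRUE is an addition made here so that the non-capturing group (?:...) and the lazy .*? are interpreted by PCRE; the answer's own call uses R's default engine.

```r
data <- c("Ranchi_Capital_State_Jharkhand", "Bokaro_State_Jharkhand",
          "Tata Nagar_State_Jharkhand", "Ramgarh_State_Jharkhand",
          "Pune_State_Maharashtra", "Mumbai_Capital_State_Maharashtra",
          "Nagpur_State_Maharashtra")

# Optional prefix for capitals: \1 = city name, \2 = "_Capital".
# For non-capitals the whole group is skipped, so \1 and \2 are empty.
capital_part <- "(?:(.*?)(_Capital))?"
# \3 = city name when there is no "_Capital" part (empty for capitals).
city_part <- "(.*?)"
# \4 = "State_<name>" through the end of the string.
state_part <- "_(State.*)"

pattern <- paste0("^", capital_part, city_part, state_part)

# Output order: state, then "_Capital" if present, then "_", then the city.
result <- gsub(pattern, "\\4\\2_\\1\\3", data, perl = TRUE)
print(result)
```

For a capital such as Ranchi, \1 and \2 are filled and \3 is empty; for every other city the optional group is skipped and only \3 holds the name, which is why both \1 and \3 appear in the replacement.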

Answer 1 (score: 1)

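This doesn't really use much regex; it relies mainly on the expected position of each piece of information. Split the strings on "_", then reorder the pieces as needed: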

data
# [1] "Ranchi_Capital_State_Jharkhand"   "Bokaro_State_Jharkhand"          
# [3] "Tata Nagar_State_Jharkhand"       "Ramgarh_State_Jharkhand"         
# [5] "Pune_State_Maharashtra"           "Mumbai_Capital_State_Maharashtra"
# [7] "Nagpur_State_Maharashtra"  

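# Split each string on a literal "_" (fixed = TRUE: no regex)
A <- strsplit(data, "_", fixed = TRUE)
sapply(A, function(x) {
  if (length(x) == 3) {
    # city, "State", state name -> "State", state name, city
    paste(x[c(2, 3, 1)], collapse = "_")
  } else if (length(x) == 4) {
    # city, "Capital", "State", state name -> "State", state name, "Capital", city
    paste(x[c(3, 4, 2, 1)], collapse = "_")
  } else {
    stop("unexpected length")
  }
})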
# [1] "State_Jharkhand_Capital_Ranchi"   "State_Jharkhand_Bokaro"          
# [3] "State_Jharkhand_Tata Nagar"       "State_Jharkhand_Ramgarh"         
# [5] "State_Maharashtra_Pune"           "State_Maharashtra_Capital_Mumbai"
# [7] "State_Maharashtra_Nagpur"  

I'm not sure whether using sapply violates the "without using any loops" requirement.
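If sapply does count as a loop, one possible fully vectorized variant extracts the pieces with sub and grepl, which work on the whole character vector at once, and reassembles them with paste0. This is a sketch under the same assumptions as the strsplit answer: every entry has the form <city>_State_<state> or <city>_Capital_State_<state>, and the city name itself contains no underscore. The variable names are illustrative.

```r
data <- c("Ranchi_Capital_State_Jharkhand", "Bokaro_State_Jharkhand",
          "Tata Nagar_State_Jharkhand", "Ramgarh_State_Jharkhand",
          "Pune_State_Maharashtra", "Mumbai_Capital_State_Maharashtra",
          "Nagpur_State_Maharashtra")

# TRUE for entries such as "Ranchi_Capital_State_Jharkhand"
is_capital <- grepl("_Capital_State_", data, fixed = TRUE)

# City: everything before the first underscore (spaces are kept)
city <- sub("_.*$", "", data)

# State: "State_" plus everything after it, e.g. "State_Jharkhand"
state <- sub("^.*?_(State_.*)$", "\\1", data, perl = TRUE)

# Reassemble; ifelse() picks the separator element by element
result <- paste0(state, ifelse(is_capital, "_Capital_", "_"), city)
print(result)
```

Each step is a single vectorized call, so there is no R-level iteration; the trade-off is that it hard-codes the two expected layouts, just as the strsplit version does.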