How can I split a character column into two columns and remove the parentheses in R?

Asked: 2016-10-15 19:59:21

Tags: r string dataframe split

I have council data on social care expenditure for each geographic area (council), as shown below:

Council                     Expenditure
Cumbria (102)               100
South Tyneside (109)        200
Bexley (718)                150
Nottingham (512)            178

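As you can see, the Council column of the data frame holds each council's name followed by its code in parentheses, i.e. (102), (109), etc.

However, I would like to split the council name and its code into two separate columns and remove the parentheses around the code, so that the result looks more like this: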

Council          Council Code                 Expenditure
Cumbria          102                          100
South Tyneside   109                          200
Bexley           718                          150
Nottingham       512                          178

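I have looked at similar posts on Stack Overflow and tried string manipulation functions such as strsplit() and gsub(), but to no avail. I am having particular trouble handling the parentheses.

Could you suggest how to do this in R?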

4 answers:

Answer 0 (score: 2)

Here is one way to do this using a regular expression with capture groups:

Data:

Council <- read.table(
  text = "Council,Expenditure
Cumbria (102),100
South Tyneside (109),200
Bexley (718),150
Nottingham (512),78",
  header = TRUE,
  sep = ",",
  stringsAsFactors = FALSE
)

Code:

Council <- transform(Council,
       # Get the Council_Code column: keep only the digits (group 2)
       Council_Code = as.numeric(gsub("([^\\d]+)(\\d+)(\\))", "\\2",
                                      Council,
                                      perl = TRUE)),
       # Clean up the Council column: keep only the name (group 1)
       Council = trimws(gsub("([a-zA-Z\\s]+)([\\d\\(\\)]+)", "\\1",
                             Council,
                             perl = TRUE))
)

Output:

 Council        Expenditure Council_Code
 Cumbria        100         102         
 South Tyneside 200         109         
 Bexley         150         718         
 Nottingham      78         512 

I hope this helps.
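To make the role of the capture groups easier to see, here is a minimal base R sketch that reuses one anchored pattern for both columns. The data frame name `df` and the variable `pattern` are illustrative, the values are copied from the question, and the sketch assumes every entry has the form `Name (digits)` with a single space before the opening parenthesis.

```r
# Sample data copied from the question (df is an illustrative name)
df <- data.frame(
  Council     = c("Cumbria (102)", "South Tyneside (109)",
                  "Bexley (718)", "Nottingham (512)"),
  Expenditure = c(100, 200, 150, 178),
  stringsAsFactors = FALSE
)

# One anchored pattern with two capture groups:
#   group 1 = the name, group 2 = the digits inside the parentheses
pattern <- "^(.*) \\(([0-9]+)\\)$"

# Compute the code first, because the next line overwrites Council
df$Council_Code <- as.numeric(sub(pattern, "\\2", df$Council))
df$Council      <- sub(pattern, "\\1", df$Council)

df
```

Because the pattern is anchored with `^` and `$`, an entry that does not fit the assumed shape is left unchanged by `sub()`, so it would surface as an `NA` code (with a coercion warning) rather than being silently mangled.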

Answer 1 (score: 1)

Using gsub:

res <- setNames(data.frame(trimws(gsub("[[:digit:]\\()]","",df$Council))
                    , df$Expenditure, gsub("[^[:digit:]]","",df$Council)),
                c("Council","Expenditure","Council Code"))

#         Council Expenditure Council Code
#1        Cumbria         100          102
#2 South Tyneside         200          109
#3         Bexley         150          718
#4     Nottingham          78          512
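  • [[:digit:]\\()]: removes the digits and parentheses, leaving only the name
  • [^[:digit:]]: removes everything that is not a digit, leaving only the code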
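The effect of the two character classes can be checked on a single value. This is a small sketch, with `x`, `name_only` and `code_only` as illustrative names. The parentheses are written without backslashes here because they need no escaping inside a bracket expression.

```r
x <- "South Tyneside (109)"

# Delete digits and parentheses; trimws() removes the space left behind
name_only <- trimws(gsub("[[:digit:]()]", "", x))

# Delete every character that is not a digit
code_only <- gsub("[^[:digit:]]", "", x)

name_only              # "South Tyneside"
code_only              # "109" (still a character string)
as.integer(code_only)  # 109
```

Note that this approach works on character classes rather than on position, so a council name that itself contains a digit would lose it from the name and gain it in the code.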

Answer 2 (score: 1)

A tidyr option is extract:

library(tidyr)
extract(df1, Council, into = c("Council", "CouncilCode"), "([^(]+)\\s+\\(([0-9]+).")
#         Council CouncilCode Expenditure
#1        Cumbria         102         100
#2 South Tyneside         109         200
#3         Bexley         718         150
#4     Nottingham         512          78
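By default the extracted code column is character. If a numeric code is wanted, `extract()` also accepts `convert = TRUE`, which runs type conversion on the new columns. The sketch below assumes tidyr is installed. It rebuilds a two-row `df1` from the question's values for illustration and matches the closing parenthesis explicitly.

```r
library(tidyr)

# Illustrative two-row version of the question's data
df1 <- data.frame(
  Council     = c("Cumbria (102)", "South Tyneside (109)"),
  Expenditure = c(100, 200),
  stringsAsFactors = FALSE
)

# convert = TRUE applies type conversion to the new columns,
# so the all-digit CouncilCode column becomes an integer column
res <- extract(df1, Council, into = c("Council", "CouncilCode"),
               regex = "([^(]+)\\s+\\(([0-9]+)\\)", convert = TRUE)

str(res)
```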

Answer 3 (score: 1)

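library(reshape2)
# Drop the closing parenthesis, then split on " (" rather than on a bare
# space, so that names containing spaces (e.g. "South Tyneside") stay whole
colsplit(string = gsub(pattern = "\\)", replacement = "", x = Council$Council),
     pattern = " \\(", names = c("Council", "Council_code"))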

Result:

         Council Council_code
1        Cumbria          102
2 South Tyneside          109
3         Bexley          718
4     Nottingham          512