I have council-level data on social care expenditure for each geographical area (council), as shown below:
Council Expenditure
Cumbria (102) 100
South Tyneside (109) 200
Bexley (718) 150
Nottingham (512) 178
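As you can see in the Council column of the data frame, each council name is followed by its code in parentheses, i.e. (102), (109), and so on.
I would like to split the council name and its code into two separate columns and remove the parentheses around the code, so that it looks more like this: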
Council Council Code Expenditure
Cumbria 102 100
South Tyneside 109 200
Bexley 718 150
Nottingham 512 178
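I have already looked at similar questions on Stack Overflow and tried string-manipulation functions such as strsplit() and gsub(), but to no avail. I am having particular trouble handling the parentheses.
Could you suggest how to do this in R?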
Answer 0 (score: 2)
Here is one way to do it, using grouping in a regular expression:
Council <- read.table(
  text = "Council,Expenditure
Cumbria (102),100
South Tyneside (109),200
Bexley (718),150
Nottingham (512),78",
  header = TRUE,
  sep = ",",
  stringsAsFactors = FALSE
)
Council <- transform(Council,
  # Get the Council_Code column: keep only the digits (group 2)
  Council_Code = as.numeric(gsub("([^\\d]+)(\\d+)(\\))", "\\2",
                                 Council,
                                 perl = TRUE)),
  # Clean up the Council column: keep only the name (group 1)
  Council = trimws(gsub("([a-zA-Z\\s]+)([\\d\\(\\)]+)", "\\1",
                        Council,
                        perl = TRUE))
)
Council Expenditure Council_Code
Cumbria 100 102
South Tyneside 200 109
Bexley 150 718
Nottingham 78 512
I hope this helps.
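As a variation on the grouping idea above, both columns can probably be taken from one anchored pattern, so that the name and the code are described in a single place. This is only a sketch: the data frame name councils and the variable pattern are made up for illustration, and it assumes every value has the form "name (digits)".

```r
# Hypothetical sample data mirroring the values used in the answer above
councils <- data.frame(
  Council = c("Cumbria (102)", "South Tyneside (109)",
              "Bexley (718)", "Nottingham (512)"),
  Expenditure = c(100, 200, 150, 78),
  stringsAsFactors = FALSE
)

# Group 1 is the name, group 2 is the run of digits inside the parentheses.
# The literal parentheses are escaped as \\( and \\).
pattern <- "^(.*?)\\s*\\((\\d+)\\)$"

# Extract the code first, because the next line overwrites the Council column
councils$Council_Code <- as.integer(sub(pattern, "\\2", councils$Council, perl = TRUE))
councils$Council      <- sub(pattern, "\\1", councils$Council, perl = TRUE)

print(councils)
```

Because the pattern is anchored with ^ and $, a value that does not follow the expected form is left unchanged by sub() rather than being partially rewritten, which should make malformed rows easier to spot.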
Answer 1 (score: 1)
Using gsub:
# df is the data frame from the question
res <- setNames(
  data.frame(trimws(gsub("[[:digit:]\\()]", "", df$Council)),
             df$Expenditure,
             gsub("[^[:digit:]]", "", df$Council)),
  c("Council", "Expenditure", "Council Code"))
# Council Expenditure Council Code
#1 Cumbria 100 102
#2 South Tyneside 200 109
#3 Bexley 150 718
#4 Nottingham 78 512
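A quick way to see what the two character classes in this answer do is to try them on a single string. The sketch below uses made-up variable names (x, name, code); note that this approach assumes the council name itself never contains digits or parentheses.

```r
x <- "South Tyneside (109)"

# Delete every digit and both parentheses, then trim the leftover
# trailing space: only the name remains
name <- trimws(gsub("[[:digit:]()]", "", x))

# Delete every character that is not a digit: only the code remains
code <- gsub("[^[:digit:]]", "", x)

print(name)
print(code)
```

The code comes back as a character string; wrapping it in as.integer() would give a numeric column if that is what is needed.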
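[[:digit:]\\()] : removes the digits and parentheses, leaving only the name
[^[:digit:]] : removes everything that is not a digit, leaving only the code
Answer 2 (score: 1)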
A tidyr option is extract:
library(tidyr)
extract(df1, Council, into = c("Council", "CouncilCode"), "([^(]+)\\s+\\(([0-9]+).")
# Council CouncilCode Expenditure
#1 Cumbria 102 100
#2 South Tyneside 109 200
#3 Bexley 718 150
#4 Nottingham 512 78
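If adding a package is not an option, base R has a roughly comparable helper, strcapture() in the utils package (available in R 3.4.0 and later, as far as I know). The following is a sketch rather than a drop-in replacement for extract(); the names x and parts are illustrative.

```r
x <- c("Cumbria (102)", "South Tyneside (109)",
       "Bexley (718)", "Nottingham (512)")

# One capture group per output column; the prototype data frame
# supplies the column names and the types to convert to
parts <- strcapture(
  pattern = "^(.*) \\(([0-9]+)\\)$",
  x = x,
  proto = data.frame(Council = character(), CouncilCode = integer(),
                     stringsAsFactors = FALSE)
)

print(parts)
```

Strings that do not match the pattern produce a row of NA values, so unexpected input shows up in the result instead of raising an error.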
Answer 3 (score: 1)
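library(reshape2)
# Drop the closing parenthesis, then split on " (" rather than on a bare
# space, so that multi-word names such as "South Tyneside" stay in one piece
colsplit(string = gsub(pattern = "\\)", replacement = "", x = Council$Council),
         pattern = " \\(", names = c("Council", "Council_code"))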
Result:
Council Council_code
1. Cumbria 102
2. South Tyneside 109
3. Bexley 718
4. Nottingham 512
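Since the question mentions strsplit(), here is a sketch of the same split done with it in base R. The trick is to split on either " (" or ")" with both parentheses escaped. The names x, pieces and result are made up for illustration, and every value is assumed to have the form "name (digits)".

```r
x <- c("Cumbria (102)", "South Tyneside (109)",
       "Bexley (718)", "Nottingham (512)")

# Split at " (" or at ")"; each list element becomes c(name, code)
pieces <- strsplit(x, " \\(|\\)")

result <- data.frame(
  Council = vapply(pieces, `[`, character(1), 1),
  Council_code = as.integer(vapply(pieces, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)

print(result)
```

Splitting on " (" instead of on a single space keeps names that contain spaces intact.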