I have the following data:
Opex_Spend_Month  Opex_Spend_YTD  Major_Category     NBS_Region  Sub_Category
92179.84          113542.84       Contingent Labour  EUROPE      TEMP:OTH.CONT.WORKER
297.82            82392.82        Contingent Labour  EUROPE      TEMP:OTH.CONT.WORKER
13974.8           34917.8         Contingent Labour  EUROPE      TEMP:OTH.CONT.WORKER
138.6             63125.6         Contingent Labour  EUROPE      TEMP:OTH.CONT.WORKER
NA                73097           Contingent Labour  EUROPE      TEMP:MSP NON IT
NA                96035           Contingent Labour  EUROPE      TEMP:MSP NON IT
1388.65           68934.65        Contingent Labour  EUROPE      TEMP:MSP NON IT
5393.76           18748.76        Contingent Labour  EUROPE      TEMP:MSP IT
528.38            82195.38        Contingent Labour  EUROPE      TEMP:MSP IT
22369             95468           Contingent Labour  EUROPE      TEMP:MSP IT
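From the Sub_Category column I want to extract the last part: Cont Worker, Non IT, and IT. I am not sure what kind of regular expression or substring function to use.
Desired output: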
Opex_Spend_Month  Opex_Spend_YTD  Major_Category     NBS_Region  Sub_Category          Category
92179.84          113542.84       Contingent Labour  EUROPE      TEMP:OTH.CONT.WORKER  Cont Worker
297.82            82392.82        Contingent Labour  EUROPE      TEMP:OTH.CONT.WORKER  Cont Worker
13974.8           34917.8         Contingent Labour  EUROPE      TEMP:OTH.CONT.WORKER  Cont Worker
138.6             63125.6         Contingent Labour  EUROPE      TEMP:OTH.CONT.WORKER  Cont Worker
NA                73097           Contingent Labour  EUROPE      TEMP:MSP NON IT       Non IT
NA                96035           Contingent Labour  EUROPE      TEMP:MSP NON IT       Non IT
1388.65           68934.65        Contingent Labour  EUROPE      TEMP:MSP NON IT       Non IT
5393.76           18748.76        Contingent Labour  EUROPE      TEMP:MSP IT           IT
528.38            82195.38        Contingent Labour  EUROPE      TEMP:MSP IT           IT
22369             95468           Contingent Labour  EUROPE      TEMP:MSP IT           IT
Can anyone help me with this?
Answer 0: (score: 1)
We can use str_extract from the stringr package:
library(stringr)
str_extract(df1$Sub_Category, "(CONT\\.WORKER|NON IT|IT)$")
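For readers without stringr installed, the same pattern can be tried in base R. This is only a sketch: the vector `sub_category` is a hypothetical stand-in holding three values copied from the question, and `regmatches()` with `regexpr()` is swapped in for `str_extract()`. Note that the match is returned as it appears in the column, so the first value keeps its dot.

```r
# Hypothetical sample: three distinct values copied from the question's Sub_Category column
sub_category <- c("TEMP:OTH.CONT.WORKER", "TEMP:MSP NON IT", "TEMP:MSP IT")

# Same pattern as in the answer: an alternation anchored at the end of the string
pattern <- "(CONT\\.WORKER|NON IT|IT)$"

# regexpr() locates the first match in each element; regmatches() extracts the matched text.
# Unlike str_extract(), elements with no match are dropped rather than returned as NA.
extracted <- regmatches(sub_category, regexpr(pattern, sub_category))
print(extracted)
# [1] "CONT.WORKER" "NON IT"      "IT"
```

Because unmatched elements are dropped, this variant is only safe to assign back to a data frame column when every row matches the pattern.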
Answer 1: (score: 1)
You can do:
gsub(".*?(\\.|\\s)(\\w+)","\\2 ",dat$Sub_Category)
Here is an example. Select just the last two columns (5:6) to see the result:
transform(dat,category=gsub(".*?(\\.|\\s)(\\w+)","\\2 ",Sub_Category))[5:6]
           Sub_Category    category
1  TEMP:OTH.CONT.WORKER CONT WORKER
2  TEMP:OTH.CONT.WORKER CONT WORKER
3  TEMP:OTH.CONT.WORKER CONT WORKER
4  TEMP:OTH.CONT.WORKER CONT WORKER
5       TEMP:MSP NON IT      NON IT
6       TEMP:MSP NON IT      NON IT
7       TEMP:MSP NON IT      NON IT
8           TEMP:MSP IT          IT
9           TEMP:MSP IT          IT
10          TEMP:MSP IT          IT
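One detail the printed output hides: the replacement string "\\2 " ends with a space, so each result carries a trailing space. A minimal sketch of trimming it follows. The vector `sub_category` is a hypothetical sample taken from the question, and `perl = TRUE` is my addition (not part of the answer) to pin the lazy `.*?` to PCRE semantics.

```r
# Hypothetical sample: three distinct values copied from the question's Sub_Category column
sub_category <- c("TEMP:OTH.CONT.WORKER", "TEMP:MSP NON IT", "TEMP:MSP IT")

# Same pattern and replacement as the answer; perl = TRUE is an assumption added here
raw <- gsub(".*?(\\.|\\s)(\\w+)", "\\2 ", sub_category, perl = TRUE)

# Every replacement appends a space, so each result ends with one
print(nchar(raw))

# trimws() removes the leading and trailing whitespace
category <- trimws(raw)
print(category)
# [1] "CONT WORKER" "NON IT"      "IT"
```

Trimming matters if the new column is later compared or joined against values such as "IT", since "IT " would not be equal to it.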
Answer 2: (score: 0)
In base R:
df$Category = trimws(gsub('([A-Z]+:[A-Z]+|\\.)', ' ', df$Sub_Category))
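The line above yields upper-case values (CONT WORKER, NON IT, IT), while the desired output in the question uses Cont Worker and Non IT. One possible way to close that gap is a named lookup vector. This is a sketch under assumptions: the data frame below is a hypothetical three-row sample, and the `labels` mapping is introduced here for illustration and covers only the three values shown in the question.

```r
# Hypothetical three-row sample built from values in the question
df <- data.frame(
  Sub_Category = c("TEMP:OTH.CONT.WORKER", "TEMP:MSP NON IT", "TEMP:MSP IT"),
  stringsAsFactors = FALSE
)

# Step from the answer: blank out the "TEMP:XXX" prefix and any dots, then trim
upper <- trimws(gsub("([A-Z]+:[A-Z]+|\\.)", " ", df$Sub_Category))

# Assumed mapping from the upper-case result to the labels in the desired output
labels <- c("CONT WORKER" = "Cont Worker", "NON IT" = "Non IT", "IT" = "IT")

# Index the lookup by name; values missing from the lookup become NA
df$Category <- unname(labels[upper])
print(df)
```

Any Sub_Category value not covered by the lookup ends up as NA, which makes unexpected categories easy to spot.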