Using gsub (or similar) to keep the ticker and the last 4 digits of column names from a vector

Asked: 2018-07-20 14:22:51

Tags: r gsub

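I am trying to use gsub, substr or something similar to keep parts of column names that are made up of a symbol and a date. The symbols are stored in the vector symbols.f (different ticker symbols may be used):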

symbols.f <- c("NVDA.f", "GOOG.f", "GE.f")

Calling colnames() on the data below (given as dput() output) returns the following:

     [1] "GE.f.12.31.2017"
     [2] "GE.f.12.31.2016"
     [3] "GE.f.12.31.2015"
     [4] "GE.f.12.31.2014"
     [5] "GOOG.f.12.31.2017"
     [6] "GOOG.f.12.31.2016"
     [7] "GOOG.f.12.31.2015"
     [8] "GOOG.f.12.31.2014"
     [9] "NVDA.f.1.28.2018"
    [10] "NVDA.f.1.29.2017"
    [11] "NVDA.f.1.31.2016"
    [12] "NVDA.f.1.25.2015"

What I want to do is keep the ticker and the year, i.e. the last 4 digits of each column name. For example, for the first two tickers:

     [1] "GE2017"  
     [2] "GE2016"  
     [3] "GE2015"  
     [4] "GE2014"  
     [5] "GOOG2017"
     [6] "GOOG2016"
     [7] "GOOG2015"
     [8] "GOOG2014"

I am able to extract either the last 4 digits or all of the letters, but I can't seem to do both together or in one step.

Data:

df <- structure(list(GE.f.12.31.2017 = c(18211000, NA, 46549000, 21923000, 
5790000, 140110000, 38696000, 53874000, 83968000, 20273000, NA, 
41024000, 6207000, 377945000, 15153000, 134591000, 21400000, 
61893000, 108575000, 82597000, NA, 21122000, NA, 292560000, NA, 
NA, NA, 702000, 125682000, -62127000, NA, 22775000, 64257000, 
-39984000), GE.f.12.31.2016 = c(10525000, NA, 42687000, 22354000, 
2867000, 149029000, 44313000, 50518000, 68070000, 16436000, NA, 
34449000, 1833000, 365183000, 14435000, 136211000, 20772000, 
70364000, 105080000, 83040000, NA, 4688000, NA, 284667000, NA, 
NA, NA, 702000, 139532000, -64412000, NA, 18626000, 75822000, 
-11052000), GE.f.12.31.2015 = c(10372000, NA, 43013000, 22515000, 
5109000, 280896000, 31973000, 54095000, 65526000, 17797000, NA, 
42784000, 3105000, 493071000, 13680000, 197602000, 27453000, 
138270000, 144659000, 79175000, NA, 4836000, NA, 389961000, NA, 
NA, NA, 702000, 140020000, -42454000, NA, 21085000, 98268000, 
14945000), GE.f.12.31.2014 = c(15916000, NA, 23237000, 17639000, 
6566000, 460743000, 35505000, 48070000, 53207000, 13182000, NA, 
44247000, 6183000, 654954000, 12067000, 261424000, 18203000, 
229564000, 186596000, 70801000, NA, 8772000, NA, 518023000, NA, 
NA, NA, 702000, 155333000, -27876000, NA, 14717000, 128159000, 
61770000), GOOG.f.12.31.2017 = c(10715000, 91156000, 18705000, 
749000, 2983000, 124308000, 7813000, 42383000, 16747000, 2692000, 
NA, 3352000, 680000, 197295000, 3137000, 3969000, 10651000, 24183000, 
3943000, 16641000, NA, NA, NA, 44793000, NA, NA, NA, 40247000, 
113247000, -992000, NA, -992000, 152502000, 133063000), GOOG.f.12.31.2016 = c(12918000, 
73415000, 15632000, 268000, 3175000, 105408000, 5878000, 34234000, 
16468000, 3307000, NA, 2202000, 383000, 167497000, 2041000, 3935000, 
5851000, 16756000, 3935000, 7770000, NA, NA, NA, 28461000, NA, 
NA, NA, 36307000, 105131000, -2402000, NA, -2402000, 139036000, 
119261000), GOOG.f.12.31.2015 = c(15409000, 56517000, 13459000, 
491000, 1590000, 90114000, 5183000, 29016000, 15869000, 3847000, 
NA, 3432000, 251000, 147461000, 1931000, 7648000, 4327000, 19310000, 
1995000, 5825000, NA, NA, NA, 27130000, NA, NA, NA, 32982000, 
89223000, -1874000, NA, -1874000, 120331000, 100615000), GOOG.f.12.31.2014 = c(16585000, 
46048000, 9974000, NA, 2637000, 78656000, 3079000, 23883000, 
15599000, 4607000, NA, 3363000, 176000, 129187000, 1715000, 8015000, 
2803000, 16779000, 2992000, 5320000, NA, NA, NA, 25327000, NA, 
NA, NA, 28767000, 75066000, 27000, NA, 27000, 103860000, 83654000
), NVDA.f.1.28.2018 = c(7108000, NA, 1265000, 796000, NA, 9255000, 
NA, 997000, 618000, 52000, NA, 319000, NA, 11241000, 596000, 
2e+06, NA, 1153000, 1985000, 632000, NA, NA, NA, 3770000, NA, 
NA, NA, 7471000, NA, NA, NA, NA, 7471000, 6801000), NVDA.f.1.29.2017 = c(1766000, 
5032000, 826000, 794000, NA, 8536000, NA, 521000, 618000, 104000, 
NA, 62000, NA, 9841000, 485000, 2791000, 325000, 1788000, 1985000, 
3e+05, NA, NA, NA, 4079000, NA, NA, NA, 1000, 6108000, -5055000, 
4708000, -16000, 5762000, 5040000), NVDA.f.1.31.2016 = c(596000, 
4441000, 505000, 418000, NA, 6053000, NA, 466000, 618000, 166000, 
NA, 67000, NA, 7370000, 296000, 1434000, 532000, 2351000, 7000, 
533000, NA, NA, NA, 2901000, NA, NA, NA, 1000, 4350000, -4052000, 
4170000, -4000, 4469000, 3685000), NVDA.f.1.25.2015 = c(497000, 
4126000, 474000, 483000, 63000, 5713000, NA, 557000, 618000, 
222000, NA, 91000, NA, 7201000, 293000, 1398000, 471000, 896000, 
1384000, 489000, NA, NA, NA, 2783000, NA, NA, NA, 1000, 3949000, 
-3387000, 3855000, 8000, 4418000, 3578000)), .Names = c("GE.f.12.31.2017", 
"GE.f.12.31.2016", "GE.f.12.31.2015", "GE.f.12.31.2014", "GOOG.f.12.31.2017", 
"GOOG.f.12.31.2016", "GOOG.f.12.31.2015", "GOOG.f.12.31.2014", 
"NVDA.f.1.28.2018", "NVDA.f.1.29.2017", "NVDA.f.1.31.2016", "NVDA.f.1.25.2015"
), row.names = c("Cash And Cash Equivalents", "Short Term Investments", 
"Net Receivables", "Inventory", "Other Current Assets", "Total Current Assets", 
"Long Term Investments", "Property Plant and Equipment", "Goodwill", 
"Intangible Assets", "Accumulated Amortization", "Other Assets", 
"Deferred Long Term Asset Charges", "Total Assets", "Accounts Payable", 
"Short/Current Long Term Debt", "Other Current Liabilities", 
"Total Current Liabilities", "Long Term Debt", "Other Liabilities", 
"Deferred Long Term Liability Charges", "Minority Interest", 
"Negative Goodwill", "Total Liabilities", "Misc. Stocks Options Warrants", 
"Redeemable Preferred Stock", "Preferred Stock", "Common Stock", 
"Retained Earnings", "Treasury Stock", "Capital Surplus", "Other Stockholder Equity", 
"Total Stockholder Equity", "Net Tangible Assets"), class = "data.frame")

2 Answers:

Answer 0 (score: 4)

Would this regex work?

gsub("\\..*\\.", "", colnames(df))

It removes the first and the last '.' and everything in between.

#[1] "GE2017"   "GE2016"   "GE2015"   "GE2014"   "GOOG2017"
#[6] "GOOG2016" "GOOG2015" "GOOG2014" "NVDA2018" "NVDA2017"
#[11] "NVDA2016" "NVDA2015"

# '\\.' matches a literal dot, '.' matches any character, '*' repeats the previous item 0 or more times
# so '\\..*\\.' means "a dot, then any characters (as many as possible), then a dot"
# the '\\' is an escape so the regex can tell whether you mean
# '.' (any character) or '\\.' (an actual dot)
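
To see the greedy match in isolation, here is a minimal sketch on a hand-typed subset of the column names. The vector x and the variable names are made up for illustration and are not part of the answer; regmatches() is used only to display the part of each name that the pattern removes.

```r
# Hypothetical subset of the column names from the question
x <- c("GE.f.12.31.2017", "GOOG.f.12.31.2014", "NVDA.f.1.28.2018")

# Greedy match: everything from the first dot through the last dot
short <- gsub("\\..*\\.", "", x)
print(short)

# Show exactly which substring the pattern matches in each name
removed <- regmatches(x, regexpr("\\..*\\.", x))
print(removed)
```

Because the match starts at the first dot, this approach assumes the ticker itself contains no dot; a name such as "BRK.B.f.12.31.2017" would be shortened to "BRK2017". The result can be assigned back with colnames(df) <- gsub("\\..*\\.", "", colnames(df)).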

Answer 1 (score: 1)

Here is an alternative to @Ape's answer, using sub and capture groups:

sub("^([^.]+).*?(\\d+)$", "\\1\\2", colnames(df))
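
As a rough sketch of how the two capture groups divide the work, the variant below pulls out each group separately. Note that it swaps the lazy `.*?` for a greedy `.*` followed by a literal dot and exactly four digits, so it is not the pattern from the answer; the vector x and the variable names are hypothetical as well.

```r
# Hypothetical subset of the column names from the question
x <- c("GE.f.12.31.2017", "GOOG.f.12.31.2014", "NVDA.f.1.28.2018")

# Group 1: everything before the first dot (the ticker)
# Group 2: the four digits after the last dot (the year)
pattern <- "^([^.]+).*\\.(\\d{4})$"

ticker <- sub(pattern, "\\1", x)      # keep only group 1
year   <- sub(pattern, "\\2", x)      # keep only group 2
result <- sub(pattern, "\\1\\2", x)   # ticker and year glued together

print(data.frame(ticker, year, result))
```

sub() returns a string unchanged when the pattern does not match, so it may be worth checking all(grepl(pattern, colnames(df))) before renaming the columns.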
