Convert part of a string value to a number and combine it with the rest

Time: 2015-05-24 20:20:24

Tags: r

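I have some data, shown below. As you can see, the MarketCap column contains both a string and a numeric component. I would like to replace every B with 10^9 and every M with 10^6, combined with the existing numeric value of course (so that $9B becomes 9000000000).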

I tried dframe$MarketCap <- replace(dframe$MarketCap, "B", 10^6) but I got this error message:

  

Error in `$<-.data.frame`(`*tmp*`, "MarketCap", value = c("$9B", "$987.15M",  : 
  replacement has 908 rows, data has 907

              Symbol                                   Name  LastSale MarketCap IPOyear                Sector
904             DLR              Digital Realty Trust, Inc.     66.3       $9B    2004     Consumer Services
2745           SWAY     Starwood Waypoint Residential Trust    25.86  $987.15M    2014     Consumer Services
3140            WNC             Wabash National Corporation    14.45  $981.39M    1991         Capital Goods
2102            NOA    North American Energy Partners, Inc.     2.89   $98.24M    2006                Energy
3115             VG                   Vonage Holdings Corp.     4.57  $976.09M    2006      Public Utilities
273            ATTO                             Atento S.A.    13.21  $972.51M    2014      Public Utilities
2541            RMP              Rice Midstream Partners LP    16.79  $965.55M    2014      Public Utilities

Structure of the data frame (str output) before the error message:

'data.frame':    907 obs. of  9 variables:
 $ Symbol       : Factor w/ 3285 levels "A","AA","AA^B",..: 844 2811 3170 2128 3127 245 2563 528 2171 2586 ...
 $ Name         : Factor w/ 2657 levels "3D Systems Corporation",..: 735 2214 2478 1689 2602 205 2048 635 1650 2055 ...
 $ LastSale     : Factor w/ 2572 levels "0.02","0.22",..: 2192 1153 412 758 1664 316 560 877 1872 1049 ...
 $ MarketCap    : chr  "$9B" "$987.15M" "$981.39M" "$98.24M" ...
 $ IPOyear      : Factor w/ 33 levels "1984","1985",..: 21 31 8 23 23 31 31 31 30 27 ...
 $ Sector       : Factor w/ 13 levels "Basic Industries",..: 5 5 2 6 11 11 11 2 6 13 ...
 $ industry     : Factor w/ 130 levels "Accident &Health Insurance",..: 109 109 32 89 123 123 83 19 86 87 ...
 $ Summary.Quote: Factor w/ 3285 levels "http://www.nasdaq.com/symbol/a",..: 844 2811 3170 2128 3127 245 2563 528 2171 2586 ...
 $ X            : logi  NA NA NA NA NA NA ...

1 answer:

Answer 0: (score: 3)

You can try

library(gsubfn)
dframe$MarketCap <- as.numeric(gsubfn('[BMK$]', list(K='e3', M='e6', 
                            B='e9', "$"=''), dframe$MarketCap))

Or using base R

v1 <- sub('[$0-9.]+', '', dframe$MarketCap)
v2 <- c(K='e3', M='e6', B='e9')
dframe$MarketCap <- as.numeric(paste0(gsub('\\$|[A-Z]+', '', 
                     dframe$MarketCap), v2[v1]))
dframe$MarketCap
#[1] 9000000000  987150000  981390000   98240000  976090000  972510000  965550000
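
As for why the original attempt failed, the following is a minimal sketch rather than a definitive diagnosis. It assumes the column is a plain, unnamed character vector, as the str output suggests. replace(x, list, values) is a thin wrapper around x[list] <- values, so "B" is treated as an element name, not as text to search for. Because no element is named "B", R appends a new one, which would explain the 908 rows against 907. The vectors x, y, suffix, mult and z are illustrative names, and the three sample values are copied from the question.

```r
# Sample values copied from the question's MarketCap column
x <- c("$9B", "$987.15M", "$981.39M")

# Equivalent to x["B"] <- 10^6: no element is named "B",
# so a new element is appended and the text is left untouched.
y <- replace(x, "B", 10^6)

length(x)  # 3
length(y)  # 4, one longer than the input

# The base R approach from above, applied to the same sample:
suffix <- sub('[$0-9.]+', '', x)        # keeps only the trailing letter
mult   <- c(K = 'e3', M = 'e6', B = 'e9')
z <- as.numeric(paste0(gsub('\\$|[A-Z]+', '', x), mult[suffix]))
z
```

Here z holds the numeric market caps for the three sample rows, matching the first three values of the output shown above.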