Pandas: multiply one column based on the contents of another column

Time: 2018-08-07 02:21:28

Tags: python pandas numpy

I'm trying to get the Market Cap from this column as a float:

Company Info
Workhorse Group, Inc. (WKHS) Market Cap: $65.94M
Xencor, Inc. (XNCR) Market Cap: $1.99B
Zillow Group, Inc. (ZG) Market Cap: $10.28B
Zillow Group, Inc. (Z) Market Cap: $10.17B
Zogenix, Inc. (ZGNX) Market Cap: $1.99B

Desired output:

Market Cap
65940000.00
1990000000.00
10280000000.00
10170000000.00
1990000000.00

I can get the number with this (there's probably a better way):

df['market_cap'] = df['Company Info'].str.split('$').str.get(1).str[:-1]

market_cap
1.13B
283.56M
763.51M
231.31M
1.3B

But I need it as a float, multiplied by a factor that depends on the M or B at the end of the Company Info column:

multiplier = {'M': 1e6, 'B': 1e9}
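For a single string, the split-and-slice idea above combined with the multiplier mapping can be sketched in plain Python. The helper name parse_market_cap is hypothetical; it assumes each value contains exactly one '$' and ends in a single M or B suffix, and it strips trailing whitespace before slicing off that suffix.

```python
multiplier = {'M': 1e6, 'B': 1e9}


def parse_market_cap(info: str) -> float:
    """Hypothetical helper: turn '... Market Cap: $65.94M' into a float.

    Assumes exactly one '$' and a single trailing suffix letter (M or B).
    """
    amount = info.split('$')[1].strip()       # e.g. '65.94M' (trailing spaces removed)
    number, suffix = amount[:-1], amount[-1]  # '65.94' and 'M'
    return float(number) * multiplier[suffix]


print(parse_market_cap('Zillow Group, Inc. (ZG) Market Cap: $10.28B'))
```

A suffix other than M or B raises a KeyError instead of silently producing a wrong number.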

2 answers:

Answer 0 (score: 2)

Step by step

Extract market_cap essentially the way you did, but convert it to float:

df['market_cap'] = df['Company Info'].str.split('$').str.get(1).str[:-1].astype(float)

Extract the multiplier with a regular expression:

df['multiplier'] = df['Company Info'].str.extract(r'\d+\.\d+(\w)', expand=False)
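The pattern \d+\.\d+(\w) captures the single word character right after a decimal number. A quick standard-library check with re.search (same pattern, as a raw string) shows what it captures, and also that it only matches numbers containing a decimal point: the third row below is made up for illustration, and its whole-number value $500M yields no multiplier.

```python
import re

pattern = re.compile(r'\d+\.\d+(\w)')  # same regex as the extract call above

samples = [
    'Workhorse Group, Inc. (WKHS) Market Cap: $65.94M',
    'Xencor, Inc. (XNCR) Market Cap: $1.99B',
    'Hypothetical Corp. (HYPO) Market Cap: $500M',  # made-up row without a decimal point
]


def find_suffix(text):
    # group(1) is the captured suffix letter; None means no decimal number was found
    match = pattern.search(text)
    return match.group(1) if match else None


suffixes = [find_suffix(t) for t in samples]
print(suffixes)  # ['M', 'B', None]
```

Since \w also matches digits, a narrower class such as [MB] (as the second answer uses) is a stricter choice.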

Multiply your market cap by the mapping you provided:

df['Market Cap'] = df.market_cap.mul(df['multiplier'].map({'M': 1e6, 'B': 1e9}))

>>> df['Market Cap']
0    6.594000e+07
1    1.990000e+09
2    1.028000e+10
3    1.017000e+10
4    1.990000e+09
Name: Market Cap, dtype: float64
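A self-contained version of the three steps, run on the sample rows from the question, might look like the sketch below. Relative to the answer, it only writes the pattern as a raw string (avoiding Python's invalid-escape warning for \d) and passes expand=False to str.extract so the result is a Series.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({'Company Info': [
    'Workhorse Group, Inc. (WKHS) Market Cap: $65.94M',
    'Xencor, Inc. (XNCR) Market Cap: $1.99B',
    'Zillow Group, Inc. (ZG) Market Cap: $10.28B',
    'Zillow Group, Inc. (Z) Market Cap: $10.17B',
    'Zogenix, Inc. (ZGNX) Market Cap: $1.99B',
]})

# Step 1: numeric text between '$' and the suffix letter, converted to float
df['market_cap'] = df['Company Info'].str.split('$').str.get(1).str[:-1].astype(float)

# Step 2: the suffix letter; expand=False returns a Series, not a one-column DataFrame
df['multiplier'] = df['Company Info'].str.extract(r'\d+\.\d+(\w)', expand=False)

# Step 3: map each letter to its factor and multiply element-wise
df['Market Cap'] = df['market_cap'].mul(df['multiplier'].map({'M': 1e6, 'B': 1e9}))

print(df['Market Cap'])
```

If the helper columns are not needed, df.drop(columns=['market_cap', 'multiplier']) removes them afterwards.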

In a single statement

Here is the same thing as a single chained expression:

df['Market Cap'] = (df['Company Info'].str.split('$')
                    .str.get(1).str[:-1]
                    .astype(float)
                    .mul(df['Company Info']
                         .str.extract(r'\d+\.\d+(\w)', expand=False)
                         .map({'M': 1e6, 'B': 1e9})))

>>> df
                                       Company Info    Market Cap
0  Workhorse Group, Inc. (WKHS) Market Cap: $65.94M  6.594000e+07
1            Xencor, Inc. (XNCR) Market Cap: $1.99B  1.990000e+09
2       Zillow Group, Inc. (ZG) Market Cap: $10.28B  1.028000e+10
3        Zillow Group, Inc. (Z) Market Cap: $10.17B  1.017000e+10
4           Zogenix, Inc. (ZGNX) Market Cap: $1.99B  1.990000e+09
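Whether the .map call in the chained version works depends on what str.extract returns. With a single capture group, the default expand=True gives a one-column DataFrame, while expand=False gives a Series, which is what Series.map with a dict expects. A quick check on two of the sample rows:

```python
import pandas as pd

info = pd.Series(['Xencor, Inc. (XNCR) Market Cap: $1.99B',
                  'Workhorse Group, Inc. (WKHS) Market Cap: $65.94M'])

as_frame = info.str.extract(r'\d+\.\d+(\w)')                # default expand=True
as_series = info.str.extract(r'\d+\.\d+(\w)', expand=False)  # one group -> Series

print(type(as_frame).__name__, type(as_series).__name__)  # DataFrame Series
print(as_series.map({'M': 1e6, 'B': 1e9}).tolist())
```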

Answer 1 (score: 1)

Use str.extract, replace and prod:

(df['Company Info'].str.extract(r'\$([\d\.]+)([MB])')
    .replace({'M': 1e6, 'B': 1e9})
    .astype(float).prod(1)
)

0    6.594000e+07
1    1.990000e+09
2    1.028000e+10
3    1.017000e+10
4    1.990000e+09
Name: 1, dtype: float64
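Broken into named intermediate steps, the chain above works roughly as follows (the variable names are illustrative only). The two capture groups produce a DataFrame with columns 0 (the number text) and 1 (the suffix letter); replace swaps the letters for their factors, and the row-wise product multiplies the two columns.

```python
import pandas as pd

info = pd.Series([
    'Workhorse Group, Inc. (WKHS) Market Cap: $65.94M',
    'Xencor, Inc. (XNCR) Market Cap: $1.99B',
])

# Two capture groups -> DataFrame with columns 0 ('65.94') and 1 ('M')
parts = info.str.extract(r'\$([\d\.]+)([MB])')

# Replace the suffix letters with their factors; the number strings are untouched
factors = parts.replace({'M': 1e6, 'B': 1e9})

# Convert both columns to float and multiply across each row
market_cap = factors.astype(float).prod(axis=1)
print(market_cap.tolist())
```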