Converting millions and billions to a number or integer in a Python column

Asked: 2014-11-06 16:22:16

Tags: python pandas

I have a column with values like '10 million' and '5 billion', and I want a simple way to convert it to numeric values for further analysis. I tried:

powers = {'billion': 10 ** 9, 'million': 10 ** 6}

def f(s):
    try:
        power = s[-1]
        return float(s[:-1]) * powers[power]
    except TypeError:
        return s

df_2.applymap(f)

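Update: my pandas column contains 0 (i.e. NaN) as well as other values containing millions and billions. I hope this is clearer than before. I used the method recommended by @MobiusKlein. Here is the stack trace, in case it is useful.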

    ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-1db4b2353170> in <module>()
     10       return float(quantity) * powers[magnitude]
     11 
---> 12 df_2.applymap(f)
     13 

/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in applymap(self, func)
   3725                 x = lib.map_infer(_values_from_object(x), f)
   3726             return lib.map_infer(_values_from_object(x), func)
-> 3727         return self.apply(infer)
   3728 
   3729     #----------------------------------------------------------------------

/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   3556                     if reduce is None:
   3557                         reduce = True
-> 3558                     return self._apply_standard(f, axis, reduce=reduce)
   3559             else:
   3560                 return self._apply_broadcast(f, axis)

/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
   3646             try:
   3647                 for i, v in enumerate(series_gen):
-> 3648                     results[i] = func(v)
   3649                     keys.append(v.name)
   3650             except Exception as e:

/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in infer(x)
   3724                 f = com.i8_boxer(x)
   3725                 x = lib.map_infer(_values_from_object(x), f)
-> 3726             return lib.map_infer(_values_from_object(x), func)
   3727         return self.apply(infer)
   3728 

/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.map_infer (pandas/lib.c:56671)()

<ipython-input-12-1db4b2353170> in f(num_str)
      4 
      5 def f(num_str):
----> 6    match = re.search(r"([0-9\.]+)\s?(million|billion)", num_str)
      7    if match is not None:
      8       quantity = match.group(0)

/home/peadarcoyle/.virtualenvs/Ipython/lib/python2.7/re.pyc in search(pattern, string, flags)
    140     """Scan through string looking for a match to the pattern, returning
    141     a match object, or None if no match was found."""
--> 142     return _compile(pattern, flags).search(string)
    143 
    144 def sub(pattern, repl, string, count=0, flags=0):

TypeError: ('expected string or buffer', u'occurred at index Intended_Investment')

1 answer:

Answer 0 (score: 1)

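Your function for finding the number in the string doesn't account for whitespace or for more than one leading digit. Try something a bit more sophisticated: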

import re
powers = {'billion': 10 ** 9, 'million': 10 ** 6}

def f(num_str):
    match = re.search(r"([0-9\.]+)\s?(million|billion)", num_str)
    if match is not None:
        # group(0) is the whole match; the captured parts are groups 1 and 2
        quantity = match.group(1)
        magnitude = match.group(2)
        return float(quantity) * powers[magnitude]

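This won't give you a number if it can't extract the proper tokens from the input (it returns None when nothing matches, and re.search raises a TypeError on non-string values such as NaN, as in your traceback), but it handles whitespace and irregular magnitudes. If you're worried about floating-point errors, use int() to convert the quantity to a numeric type, but make sure you aren't dealing with decimals. If you are, you can play games with the magnitudes to work around that, but it would make the code more complicated than a first pass.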
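Both caveats above (non-string cells and decimal quantities) can be handled together. The following is only a sketch, not a definitive fix: it assumes Python 3, the name `parse_amount` is made up for illustration, and leaving unrecognised values unchanged is a design assumption rather than something the question specifies. It uses `decimal.Decimal` from the standard library so that a quantity such as '2.3' is scaled without floating-point rounding.

```python
import re
from decimal import Decimal

POWERS = {"million": 10 ** 6, "billion": 10 ** 9}
PATTERN = re.compile(r"([0-9.]+)\s?(million|billion)")


def parse_amount(value):
    """Return an int for strings like '2.3 million'; pass anything else through."""
    if not isinstance(value, str):
        # NaN, 0 and other non-string cells are returned unchanged,
        # which avoids the TypeError raised by re.search
        return value
    match = PATTERN.search(value)
    if match is None:
        # Assumption: text without a recognised magnitude is left as-is
        return value
    quantity, magnitude = match.groups()
    # Decimal parses the digits exactly, so the product is an exact integer
    # whenever the quantity has no more decimals than the magnitude has zeros
    return int(Decimal(quantity) * POWERS[magnitude])


print(parse_amount("10 million"))
print(parse_amount("2.3billion"))
print(parse_amount(0))
```

With pandas, this could then be applied to a single column with something like `df_2["Intended_Investment"].map(parse_amount)` (the column name is taken from the traceback), rather than calling `applymap` on the whole frame.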