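I have a column with values like '10 million' and '5 billion', and I want a simple way to convert it to numeric values for further analysis. This is what I tried: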
powers = {'billion': 10 ** 9, 'million': 10 ** 6}

def f(s):
    try:
        power = s[-1]
        return float(s[:-1]) * powers[power]
    except TypeError:
        return s

df_2.applymap(f)
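Update: my pandas column contains 0 (i.e. NaN) alongside the other values, which are in the millions and billions. I hope that is clearer than before. I used the approach recommended by @MobiusKlein, and here is the stack trace of the error, in case it is useful.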
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-12-1db4b2353170> in <module>()
10 return float(quantity) * powers[magnitude]
11
---> 12 df_2.applymap(f)
13
/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in applymap(self, func)
3725 x = lib.map_infer(_values_from_object(x), f)
3726 return lib.map_infer(_values_from_object(x), func)
-> 3727 return self.apply(infer)
3728
3729 #----------------------------------------------------------------------
/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
3556 if reduce is None:
3557 reduce = True
-> 3558 return self._apply_standard(f, axis, reduce=reduce)
3559 else:
3560 return self._apply_broadcast(f, axis)
/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
3646 try:
3647 for i, v in enumerate(series_gen):
-> 3648 results[i] = func(v)
3649 keys.append(v.name)
3650 except Exception as e:
/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/core/frame.pyc in infer(x)
3724 f = com.i8_boxer(x)
3725 x = lib.map_infer(_values_from_object(x), f)
-> 3726 return lib.map_infer(_values_from_object(x), func)
3727 return self.apply(infer)
3728
/home/peadarcoyle/.virtualenvs/Ipython/local/lib/python2.7/site-packages/pandas/lib.so in pandas.lib.map_infer (pandas/lib.c:56671)()
<ipython-input-12-1db4b2353170> in f(num_str)
4
5 def f(num_str):
----> 6 match = re.search(r"([0-9\.]+)\s?(million|billion)", num_str)
7 if match is not None:
8 quantity = match.group(0)
/home/peadarcoyle/.virtualenvs/Ipython/lib/python2.7/re.pyc in search(pattern, string, flags)
140 """Scan through string looking for a match to the pattern, returning
141 a match object, or None if no match was found."""
--> 142 return _compile(pattern, flags).search(string)
143
144 def sub(pattern, repl, string, count=0, flags=0):
TypeError: ('expected string or buffer', u'occurred at index Intended_Investment')
Answer 0 (score: 1)
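Your function for pulling the number out of the string does not account for whitespace or for quantities with more than one digit. Try something a little more sophisticated: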
import re

powers = {'billion': 10 ** 9, 'million': 10 ** 6}

def f(num_str):
    match = re.search(r"([0-9\.]+)\s?(million|billion)", num_str)
    if match is not None:
        quantity = match.group(1)   # the numeric part, e.g. '10'
        magnitude = match.group(2)  # the word 'million' or 'billion'
        return float(quantity) * powers[magnitude]
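This still fails on input it cannot parse: re.search raises a TypeError when it is handed a non-string cell such as NaN, and a string with no match falls through and returns None. It does, however, handle whitespace and quantities of any length. If you are worried about floating-point error, use int() to convert the quantity instead of float(), but make sure you are not dealing with decimals. If you are, you can play games with the magnitude to work around that, but it makes the code more complicated than a first pass.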
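To make those two points concrete, here is a minimal sketch (not the original answer's code) that passes non-string cells through untouched and uses the standard-library Decimal type to keep the multiplication exact. It assumes Python 3; the names to_number, POWERS and PATTERN are made up for illustration, and leaving unparseable strings unchanged is an assumption about what you want.

```python
import re
from decimal import Decimal

# Hypothetical names for illustration; only 'million' and 'billion' are handled.
POWERS = {"million": 10 ** 6, "billion": 10 ** 9}
PATTERN = re.compile(r"([0-9]+(?:\.[0-9]+)?)\s*(million|billion)", re.IGNORECASE)


def to_number(value):
    """Convert '10 million' style strings to integers; pass anything else through."""
    # NaN, 0 and other non-string cells are returned unchanged instead of
    # being handed to the regex, which is what raises the TypeError.
    if not isinstance(value, str):
        return value
    match = PATTERN.search(value)
    if match is None:
        # Assumption: strings that do not look like '<number> <magnitude>'
        # are left as they are rather than raising.
        return value
    quantity, magnitude = match.groups()
    # Decimal parses '2.3' exactly, so the product is exact; int() then
    # truncates any fraction that is left over after scaling.
    return int(Decimal(quantity) * POWERS[magnitude.lower()])
```

On a single column this could be applied with something like df_2['Intended_Investment'].map(to_number), where the column name is taken from the traceback above. Because the function returns non-strings as they are, the NaN cells stay NaN.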