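Replace NaN values in DataFrame rows with values from other rows, based on a (non-unique) column value

Posted: 2020-11-02 16:00:08

Tags: python pandas numpy dataframe

I have a DataFrame similar to the one below. It has one column with non-unique values (in this case, address) and a few columns containing information about each address.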

import pandas as pd

df = pd.DataFrame({'address': {0:'11 Star Street', 1:'22 Milky Way', 2:'88 Dark Drive', 3:'33 Planet Place', 4:'22 Milky Way', 5:'22 Milky Way'}, 'val': {0:10, 1:'', 2:'', 3:20, 4: 20, 5:''}, 'val2': {0:20, 1:'', 2:'', 3:40, 4:10, 5:''}})

           address val val2
0   11 Star Street  10   20
1     22 Milky Way         
2    88 Dark Drive         
3  33 Planet Place  20   40
4     22 Milky Way  20   10
5     22 Milky Way          

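Some addresses appear more than once in the DataFrame, and some of those duplicates are missing information (in the sample above, the missing values are empty strings rather than actual NaN). If a row is missing values but its address appears in another row of the DataFrame that has them, I want to replace the NaN values with the values from that other row, to get a result like this: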
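To make the rule concrete, here is a minimal sketch that flags which rows can actually be filled. It assumes the blanks have already been converted to real NaN, and the names `cols`, `missing`, `has_data` and `fillable` are purely illustrative. Row 2 (88 Dark Drive) is not fillable because no other row shares its address, which is why it stays blank in the expected result.

```python
import numpy as np
import pandas as pd

# The question's data, assuming the empty strings are already real NaN.
df = pd.DataFrame({
    'address': ['11 Star Street', '22 Milky Way', '88 Dark Drive',
                '33 Planet Place', '22 Milky Way', '22 Milky Way'],
    'val': [10, np.nan, np.nan, 20, 20, np.nan],
    'val2': [20, np.nan, np.nan, 40, 10, np.nan],
})
cols = ['val', 'val2']  # hypothetical name for the information columns

# Rows that are missing at least one value.
missing = df[cols].isna().any(axis=1)
# For every row: does any row with the same address carry some data?
has_data = df[cols].notna().any(axis=1).groupby(df['address']).transform('any')
# A row can be filled only if it is missing something and its address has data elsewhere.
fillable = missing & has_data
print(fillable[fillable].index.tolist())  # [1, 5]
```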

           address val val2
0   11 Star Street  10   20
1     22 Milky Way  20   10
2    88 Dark Drive         
3  33 Planet Place  20   40
4     22 Milky Way  20   10
5     22 Milky Way  20   10

Because the DataFrame contains thousands of different addresses, building something like a dictionary by hand is not practical.

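Edit: It is safe to assume that the two values are either both missing or both present. In other words, there will never be a row that has val but not val2, or vice versa. However, an answer that also handles that case would be even better!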
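For the case raised in the edit, where val and val2 might be missing independently, one possible sketch fills each column separately from the first non-missing value recorded for the same address. The data below is hypothetical (row 3 has val but no val2), and `cols` and `first_known` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 3 has val but no val2, the case the edit asks about.
df = pd.DataFrame({
    'address': ['11 Star Street', '22 Milky Way', '88 Dark Drive',
                '22 Milky Way', '22 Milky Way'],
    'val': [10, np.nan, np.nan, 20, 5],
    'val2': [20, np.nan, np.nan, np.nan, 10],
})
cols = ['val', 'val2']

# First non-missing value of each column within each address, broadcast back to every row.
first_known = df.groupby('address')[cols].transform('first')
# fillna only touches NaN cells, so values that are already present are never overwritten.
df[cols] = df[cols].fillna(first_known)
print(df)
```

Unlike a forward/back fill, this always takes the first known value for an address; the difference only matters when rows with the same address disagree (here, rows 3 and 4 have different val values, and row 1 receives 20).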

1 Answer:

Answer (score: 1)

There are several ways to do this; the simplest is to group by address and forward-fill/back-fill within each group.
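A minimal sketch of that groupby + ffill/bfill idea, written with `transform` instead of `apply` so the original row order and the `address` column are left untouched. It assumes val and val2 are meant to be numeric, so `pd.to_numeric(..., errors='coerce')` is used to turn the empty strings into NaN; `cols` is an illustrative name.

```python
import pandas as pd

# Same data as in the question (blanks are empty strings).
df = pd.DataFrame({
    'address': ['11 Star Street', '22 Milky Way', '88 Dark Drive',
                '33 Planet Place', '22 Milky Way', '22 Milky Way'],
    'val': [10, '', '', 20, 20, ''],
    'val2': [20, '', '', 40, 10, ''],
})
cols = ['val', 'val2']

# Assumption: the columns are numeric, so anything unparseable ('') becomes NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
# Within each address, fill gaps from the previous row, then from the next row.
df[cols] = df.groupby('address')[cols].transform(lambda s: s.ffill().bfill())
print(df)
```

Because each column is filled on its own, this also copes with a row that has val but not val2.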


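import numpy as np
import pandas as pd

# Turn the empty strings into real NaN, then forward- and back-fill within each address group.
# group_keys=False stops pandas >= 2.0 from adding 'address' as an extra index level.
df = (df.replace('', np.nan)
        .groupby('address', group_keys=False)
        .apply(lambda x: x.ffill().bfill()))
print(df)

           address   val  val2
0   11 Star Street  10.0  20.0
1     22 Milky Way  20.0  10.0
2    88 Dark Drive   NaN   NaN
3  33 Planet Place  20.0  40.0
4     22 Milky Way  20.0  10.0
5     22 Milky Way  20.0  10.0

Another, more efficient approach is to use … along the axis.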

update