Replace integers with labels for different columns - pandas

Time: 2017-05-15 18:36:04

Tags: python pandas

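I have a pandas DataFrame with several integer columns, and a corresponding dictionary of the form {column: {integer: string_label}}.

I am trying to create a DataFrame in which the integers have been replaced by their labels. The closest I have got is the code below, but the output is not what I expected.

Code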

import pandas as pd
import numpy as np


df = pd.DataFrame({'a':[1,2,3,3,8],'b':[8,8,8,8,7]})

dic = {'a':{1:"label1",2:"label2",3:"label3"}, 'b':{8:'label8',7:'label7'}}


converters = {column: lambda x: dic[column][x] if x in dic[column].keys() else np.nan for column in dic.keys()}

new = pd.DataFrame.from_dict({col: series.apply(converters[col])
                            if col in converters else series
                            for col, series in df.items()})
print(new)

#Output:
#         a       b
# 0     NaN  label8
# 1     NaN  label8
# 2     NaN  label8
# 3     NaN  label8
# 4  label8  label7

2 answers:

Answer 0: (score: 0)

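The problem is that you use the variable column inside the lambda. Defining a lambda does not store the variable's value; the lambda uses whatever the variable holds at the moment it is called (in series.apply(converters[col])), which can be anything. In fact, if you run the code you may find that it sometimes produces different results.

Possible solution: bind the current value of column when each lambda is created, for example by passing it as a default argument (lambda x, column=column: ...).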
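A minimal sketch of the default-argument approach, reusing df and dic from the question. The name late is only there to illustrate the late-binding behaviour, and .get(x, np.nan) is my shorthand for the original if/else, so treat this as one way to do it rather than the definitive fix.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 3, 8], 'b': [8, 8, 8, 8, 7]})
dic = {'a': {1: "label1", 2: "label2", 3: "label3"},
       'b': {8: 'label8', 7: 'label7'}}

# Late binding: each lambda looks up `column` only when it is called,
# so every lambda sees the last key the comprehension iterated over.
late = {column: (lambda x: column) for column in dic}

# A default argument is evaluated when the lambda is defined,
# so each lambda keeps its own value of `column`.
converters = {
    column: (lambda x, column=column: dic[column].get(x, np.nan))
    for column in dic
}

new = pd.DataFrame({
    col: series.apply(converters[col]) if col in converters else series
    for col, series in df.items()
})
print(new)
```

With this change the 8 in column a, which has no label, becomes NaN, and the other values get the label from their own column's mapping.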

Answer 1: (score: 0)

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 3, 8], 'b': [8, 8, 8, 8, 7]})
dic = {'a': {1: "label1", 2: "label2", 3: "label3"}, 'b': {8: 'label8', 7: 'label7'}}

# Replace every integer that has a label, column by column
df = df.replace(dic)

# Labels that are allowed in each column
allowed = {k: list(v.values()) for k, v in dic.items()}
for col_name, allowed_col_vals in allowed.items():
    # Let's replace not allowed values by NaN
    df.loc[~df[col_name].isin(allowed_col_vals), col_name] = np.nan

df then ends up with every mapped integer replaced by its label, and the unmapped 8 in column a replaced by NaN.
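As a variation on the replace-then-mask idea, Series.map with a dict does both steps at once, since values without an entry in the dict come back as NaN. This is a different technique from the one in the answer, sketched here on the question's data; the name mapped is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 3, 8], 'b': [8, 8, 8, 8, 7]})
dic = {'a': {1: "label1", 2: "label2", 3: "label3"},
       'b': {8: 'label8', 7: 'label7'}}

# Work on a copy so the original integer columns stay available
mapped = df.copy()
for col, mapping in dic.items():
    # Series.map looks each value up in the dict;
    # values with no entry become NaN
    mapped[col] = df[col].map(mapping)

print(mapped)
```

Columns that are not keys of dic are left untouched, because the loop only visits the columns that have a mapping.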