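Pandas: match across multiple columns and return the matched value as a single new column

Date: 2018-02-21 19:51:54

Tags: python pandas dataframe

I have a dataframe with about 5 columns. The value I want to match can appear in any of the last 3 columns.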

Key   |  col1   |  col2  |  col3 |  col4
----------------------------------------
1        abc       21        22      23
2        cde       22        21      20
3        fgh       20        22      23
4        lmn       20        22      21

I filter on the value 21 in any of the last three columns, like this:

df1 = df[(df['col2']=='21') | (df['col3']=='21') | (df['col4']=='21')]

which gives me

Key   |  col1   |  col2  |  col3 |  col4
----------------------------------------
1        abc       21        22      23
2        cde       22        21      20
4        lmn       20        22      21

Using this new df1, I want to get this:

Key   |  col1   |  newCol
-------------------------
1        abc       21      
2        cde       21      
4        lmn       21      

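Basically, whichever column matched supplies the value of the new column. How can I do this with pandas? I'd appreciate the help. I think I should probably filter and map to the new column at the same time, but I don't know how.

3 Answers:

Answer 0 (score: 3)

Use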

In [722]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1), 
                 ['Key', 'col1']].assign(newcol=21)
Out[722]:
   Key col1  newcol
0    1  abc      21
1    2  cde      21
3    4  lmn      21

Details

eq performs the equality check on the necessary columns ['col2', 'col3', 'col4']:

In [724]: df[['col2', 'col3', 'col4']].eq(21)
Out[724]:
    col2   col3   col4
0   True  False  False
1  False   True  False
2  False  False  False
3  False  False   True

any returns whether any element in a row is True:

In [725]: df[['col2', 'col3', 'col4']].eq(21).any(axis=1)
Out[725]:
0     True
1     True
2    False
3     True
dtype: bool

Use .loc to subset the matching rows and the necessary ['Key', 'col1'] columns.

In [726]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1), ['Key', 'col1']]
Out[726]:
   Key col1
0    1  abc
1    2  cde
3    4  lmn

Finally, .assign(newcol=21) sets the newcol column to 21.
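
The steps in this answer can be wrapped into a small reusable function. This is only a minimal sketch, assuming the value columns hold integers; the helper name filter_any_match and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd


def filter_any_match(df, match_cols, keep_cols, value, new_col="newcol"):
    """Keep rows where any of match_cols equals value.

    Returns only keep_cols plus a new column holding the matched value.
    (Hypothetical helper, written for illustration.)
    """
    # Boolean mask: True for rows where at least one match column equals value
    mask = df[match_cols].eq(value).any(axis=1)
    # Subset rows and columns, then attach the matched value as a new column
    return df.loc[mask, keep_cols].assign(**{new_col: value})


df = pd.DataFrame(
    [[1, "abc", 21, 22, 23],
     [2, "cde", 22, 21, 20],
     [3, "fgh", 20, 22, 23],
     [4, "lmn", 20, 22, 21]],
    columns=["Key", "col1", "col2", "col3", "col4"],
)

result = filter_any_match(df, ["col2", "col3", "col4"], ["Key", "col1"], 21)
print(result)
```

Passing the axis by keyword (axis=1) keeps the call valid on recent pandas versions, where positional arguments to any are no longer accepted.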

Answer 1 (score: 2)

Here is one way.

import pandas as pd, numpy as np

df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])

df2 = df[np.logical_or.reduce([df[col] == 21 for col in ['col2', 'col3', 'col4']])]\
        .assign(newCol=21)\
        .drop(['col2', 'col3', 'col4'], axis=1)

#    Key col1  newCol
# 0    1  abc      21
# 1    2  cde      21
# 3    4  lmn      21

Explanation

  • Store integers as integers rather than strings.
  • Apply your | condition via np.logical_or.reduce over a list comprehension.
  • assign creates a new column holding the filter value.
  • drop removes the unwanted columns; axis=1 refers to columns.
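
The first bullet matters when the value columns were loaded as strings, as the quoted '21' in the question suggests. A minimal sketch, assuming every value in those columns parses as an integer (the string-typed sample frame below is illustrative, not the asker's actual data):

```python
import pandas as pd

# Illustrative frame where the value columns were read in as strings
df = pd.DataFrame(
    [[1, "abc", "21", "22", "23"],
     [2, "cde", "22", "21", "20"],
     [3, "fgh", "20", "22", "23"],
     [4, "lmn", "20", "22", "21"]],
    columns=["Key", "col1", "col2", "col3", "col4"],
)

value_cols = ["col2", "col3", "col4"]

# Convert the string columns to integers so they compare equal to 21
df = df.astype({col: int for col in value_cols})

# Same filter as above, now against the integer 21
mask = df[value_cols].eq(21).any(axis=1)
df2 = df.loc[mask].assign(newCol=21).drop(columns=value_cols)
print(df2)
```

If some entries might not be numeric, pd.to_numeric with errors="coerce" is one alternative to astype, at the cost of introducing NaN for the unparseable values.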

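Answer 2 (score: 0)

As jpp points out, there are two possibilities here: 21 and 22 are both common to all 3 columns. Assuming you don't know which one you actually want, you can use set() to isolate the unique values of each column, then use set.intersection() to find what they have in common: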

df = pd.DataFrame([{'col1':'a', 'col2':21, 'col3':22, 'col4':23},
                   {'col1':'b', 'col2':22, 'col3':21, 'col4':20},
                   {'col1':'c', 'col2':20, 'col3':22, 'col4':21},
                   {'col1':'d', 'col2':21, 'col3':21, 'col4':22}])

s1 = set(df['col2'].values)
s2 = set(df['col3'].values)
s3 = set(df['col4'].values)

df['new_col'] = str(s1.intersection(s2, s3))
df

col1    col2    col3    col4    new_col
   a    21      22      23      {21, 22}
   b    22      21      20      {21, 22}
   c    20      22      21      {21, 22}
   d    21      21      22      {21, 22}
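
The same intersection can be written for an arbitrary list of columns instead of one named set per column. A minimal sketch using the sample frame from this answer; the names cols, common and common_sorted are illustrative:

```python
import pandas as pd

df = pd.DataFrame([{'col1': 'a', 'col2': 21, 'col3': 22, 'col4': 23},
                   {'col1': 'b', 'col2': 22, 'col3': 21, 'col4': 20},
                   {'col1': 'c', 'col2': 20, 'col3': 22, 'col4': 21},
                   {'col1': 'd', 'col2': 21, 'col3': 21, 'col4': 22}])

cols = ['col2', 'col3', 'col4']

# Build one set of unique values per column, then intersect them all
common = set.intersection(*(set(df[col]) for col in cols))

# Convert to plain Python ints and sort for a stable, readable result
common_sorted = sorted(int(v) for v in common)
print(common_sorted)  # [21, 22]
```

Note that this finds values present somewhere in every column, which is a different condition from a single row containing the value in any one of the columns.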