I have a DataFrame with about 5 columns. The value I want to match can appear in any of the last 3 columns.
Key | col1 | col2 | col3 | col4
----------------------------------------
1 abc 21 22 23
2 cde 22 21 20
3 fgh 20 22 23
4 lmn 20 22 21
I filter for the value 21 in any of the last three columns, like this:
df1 = df[(df['col2']=='21') | (df['col3']=='21') | (df['col4']=='21')]
which gives me
Key | col1 | col2 | col3 | col4
----------------------------------------
1 abc 21 22 23
2 cde 22 21 20
4 lmn 20 22 21
Using this new df1, I want to get this:
Key | col1 | newCol
-------------------------
1 abc 21
2 cde 21
4 lmn 21
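Basically, whichever column matched supplies the value of the new column. How can I do this with pandas? I think I should filter and map to the new column in one step, but I don't know how. Any help is appreciated.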
Answer 0 (score: 3)
Use
In [722]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1),
                 ['Key', 'col1']].assign(newcol=21)
Out[722]:
Key col1 newcol
0 1 abc 21
1 2 cde 21
3 4 lmn 21
Details
Apply eq to the required columns ['col2', 'col3', 'col4']:
In [724]: df[['col2', 'col3', 'col4']].eq(21)
Out[724]:
col2 col3 col4
0 True False False
1 False True False
2 False False False
3 False False True
any(axis=1) returns, for each row, whether any element in that row is True:
In [725]: df[['col2', 'col3', 'col4']].eq(21).any(axis=1)
Out[725]:
0 True
1 True
2 False
3 True
dtype: bool
Use .loc to select the matching rows and the required ['Key', 'col1'] columns:
In [726]: df.loc[df[['col2', 'col3', 'col4']].eq(21).any(axis=1), ['Key', 'col1']]
Out[726]:
Key col1
0 1 abc
1 2 cde
3 4 lmn
Finally, .assign(newcol=21) sets the newcol column to 21.
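The steps above can be combined into one self-contained script. This is only a sketch: the helper name filter_any and its parameters are made up for illustration, and it assumes the value columns hold integers (if they hold strings, as the comparison against '21' in the question suggests, pass target='21' instead).

```python
import pandas as pd

# Sample data from the question, with integer values in col2..col4 (an assumption).
df = pd.DataFrame(
    [[1, 'abc', 21, 22, 23],
     [2, 'cde', 22, 21, 20],
     [3, 'fgh', 20, 22, 23],
     [4, 'lmn', 20, 22, 21]],
    columns=['Key', 'col1', 'col2', 'col3', 'col4'],
)


def filter_any(frame, value_cols, target, new_col='newcol'):
    """Hypothetical helper: keep rows where any of value_cols equals target,
    drop value_cols, and store target in new_col."""
    # Row-wise boolean mask: True where at least one value column matches.
    mask = frame[value_cols].eq(target).any(axis=1)
    # Keep every column that is not one of the value columns.
    keep = [c for c in frame.columns if c not in value_cols]
    return frame.loc[mask, keep].assign(**{new_col: target})


result = filter_any(df, ['col2', 'col3', 'col4'], 21)
print(result)
```

Passing axis=1 by keyword matters on recent pandas versions, where any no longer accepts the axis positionally.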
Answer 1 (score: 2)
Here is one way.
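import pandas as pd, numpy as np

df = pd.DataFrame([[1, 'abc', 21, 22, 23],
                   [2, 'cde', 22, 21, 20],
                   [3, 'fgh', 20, 22, 23],
                   [4, 'lmn', 20, 22, 21]],
                  columns=['Key', 'col1', 'col2', 'col3', 'col4'])

# OR together one boolean mask per column, then add the new column
# and drop the columns that are no longer needed.
df2 = df[np.logical_or.reduce([df[col] == 21 for col in ['col2', 'col3', 'col4']])]\
    .assign(newCol=21)\
    .drop(['col2', 'col3', 'col4'], axis=1)

#    Key col1  newCol
# 0    1  abc      21
# 1    2  cde      21
# 3    4  lmn      21

Explanation
Apply your | condition across a list comprehension with np.logical_or.reduce.
Create a new column holding the filter value with assign.
Remove the unwanted columns with drop, referencing them by name.

Answer 2 (score: 0)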
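As jpp pointed out, there are two possibilities here: both 21 and 22 are common to all 3 columns. Assuming you don't know in advance which one you actually want, you can use set() to isolate the unique values of each column and then set.intersection() to find what they have in common:
import pandas as pd

df = pd.DataFrame([{'col1':'a', 'col2':21, 'col3':22, 'col4':23},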
{'col1':'b', 'col2':22, 'col3':21, 'col4':20},
{'col1':'c', 'col2':20, 'col3':22, 'col4':21},
{'col1':'d', 'col2':21, 'col3':21, 'col4':22}])
s1 = set(df['col2'].values)
s2 = set(df['col3'].values)
s3 = set(df['col4'].values)
df['new_col'] = str(s1.intersection(s2, s3))
df
col1 col2 col3 col4 new_col
a 21 22 23 {21, 22}
b 22 21 20 {21, 22}
c 20 22 21 {21, 22}
d 21 21 22 {21, 22}
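The same idea can be written for an arbitrary list of columns. The following is a minimal sketch rather than part of the answer above: the names value_cols, column_sets and common are illustrative, .tolist() is used so the sets hold plain Python ints, and sorted() gives a deterministic order.

```python
import pandas as pd

df = pd.DataFrame([{'col1': 'a', 'col2': 21, 'col3': 22, 'col4': 23},
                   {'col1': 'b', 'col2': 22, 'col3': 21, 'col4': 20},
                   {'col1': 'c', 'col2': 20, 'col3': 22, 'col4': 21},
                   {'col1': 'd', 'col2': 21, 'col3': 21, 'col4': 22}])

value_cols = ['col2', 'col3', 'col4']

# One set of unique values per column; .tolist() yields plain Python ints.
column_sets = [set(df[col].tolist()) for col in value_cols]

# Values that occur in every one of the columns, in sorted order.
common = sorted(set.intersection(*column_sets))
print(common)  # [21, 22]
```

Keeping the result as a list, instead of writing its string form into every row, makes it easy to pick one of the candidate values and use it as the filter target.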