Select DataFrame rows based on values in another DataFrame

Time: 2018-02-07 10:05:43

Tags: python-2.7 pandas dataframe

I have two DataFrames:

DF1:

name
abc
lmn
pqr

DF2:

m_name  n_name  loc
abc     tyu     IND
bcd     abc     RSA
efg     poi     SL      
lmn     ert     AUS
nne     bnm     ENG
pqr     lmn     NZ
xyz     asd     BAN

I want to generate a new DataFrame under the following conditions:

  1. Keep a row if df2.m_name == df1.name or df2.n_name == df1.name

  2. Remove duplicate rows

  3. The expected output is:

    m_name  n_name  loc
    abc     tyu     IND
    bcd     abc     RSA
    lmn     ert     AUS
    pqr     lmn     NZ
    

Any suggestions on how to achieve this?

2 Answers:

Answer 0: (score: 2)

Use a boolean mask, or use query.
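As a minimal sketch, and not necessarily the answerer's exact code, a boolean mask built with `isin` and an equivalent `query` call might look like this on the question's data. The variable names `names`, `by_mask`, and `by_query` are assumptions introduced for illustration.

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({"name": ["abc", "lmn", "pqr"]})
df2 = pd.DataFrame({
    "m_name": ["abc", "bcd", "efg", "lmn", "nne", "pqr", "xyz"],
    "n_name": ["tyu", "abc", "poi", "ert", "bnm", "lmn", "asd"],
    "loc": ["IND", "RSA", "SL", "AUS", "ENG", "NZ", "BAN"],
})

names = df1["name"].tolist()

# Option 1: boolean mask, True where either column holds a value from df1.name
mask = df2["m_name"].isin(names) | df2["n_name"].isin(names)
by_mask = df2[mask].drop_duplicates()

# Option 2: the same condition expressed with query; @names refers to the local list
by_query = df2.query("m_name in @names or n_name in @names").drop_duplicates()

print(by_mask)
```

Both variants select the same rows; `query` is mostly a readability choice when the condition is easier to read as a string expression.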

Answer 1: (score: 2)

Use:

print (df2)
  m_name n_name  loc
0    abc    tyu  IND
1    abc    tyu  IND
2    bcd    abc  RSA
3    efg    poi   SL
4    lmn    ert  AUS
5    nne    bnm  ENG
6    pqr    lmn   NZ
7    xyz    asd  BAN

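# select the columns whose labels contain 'name' (m_name, n_name)
df3 = df2.filter(like='name')
# alternatively, select those columns explicitly by name
#df3 = df2[['m_name','n_name']]
# keep rows where at least one of those columns holds a value from df1['name']
df = df2[df3.isin(df1['name'].tolist()).any(axis=1)]
# drop rows that are duplicated across the name columns
df = df.drop_duplicates(subset=df3.columns)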
print (df)
  m_name n_name  loc
0    abc    tyu  IND
2    bcd    abc  RSA
4    lmn    ert  AUS
6    pqr    lmn   NZ

Details:

First, use filter to select all columns whose label contains name:

print (df2.filter(like='name'))
  m_name n_name
0    abc    tyu
1    abc    tyu
2    bcd    abc
3    efg    poi
4    lmn    ert
5    nne    bnm
6    pqr    lmn
7    xyz    asd

Compare with DataFrame.isin:

print (df2.filter(like='name').isin(df1['name'].tolist()))
   m_name  n_name
0    True   False
1    True   False
2   False    True
3   False   False
4    True   False
5   False   False
6    True    True
7   False   False

Use any to flag rows with at least one True:

print (df2.filter(like='name').isin(df1['name'].tolist()).any(axis=1))
0     True
1     True
2     True
3    False
4     True
5    False
6     True
7    False
dtype: bool

Filter by boolean indexing:

df = df2[df2.filter(like='name').isin(df1['name'].tolist()).any(axis=1)]
print (df)
  m_name n_name  loc
0    abc    tyu  IND
1    abc    tyu  IND
2    bcd    abc  RSA
4    lmn    ert  AUS
6    pqr    lmn   NZ

Finally, remove duplicates with drop_duplicates (to deduplicate on the name columns only, add the subset parameter):

df = df.drop_duplicates(subset=df3.columns)
print (df)
  m_name n_name  loc
0    abc    tyu  IND
2    bcd    abc  RSA
4    lmn    ert  AUS
6    pqr    lmn   NZ
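The steps above can be bundled into a small helper. This is only a sketch: the function name `rows_matching_names` and its parameters `like` and `dedupe_on_name_cols` are hypothetical and not part of pandas or of the original answer.

```python
import pandas as pd


def rows_matching_names(df, names, like="name", dedupe_on_name_cols=True):
    """Return rows of df where any column whose label contains `like`
    holds a value from `names`, with duplicate rows removed."""
    name_cols = df.filter(like=like)                # e.g. m_name, n_name
    mask = name_cols.isin(list(names)).any(axis=1)  # at least one match per row
    matched = df[mask]
    # Deduplicate on the name columns only, or on all columns when disabled
    subset = list(name_cols.columns) if dedupe_on_name_cols else None
    return matched.drop_duplicates(subset=subset)


df1 = pd.DataFrame({"name": ["abc", "lmn", "pqr"]})
# Same data as in this answer, including the duplicated first row
df2 = pd.DataFrame({
    "m_name": ["abc", "abc", "bcd", "efg", "lmn", "nne", "pqr", "xyz"],
    "n_name": ["tyu", "tyu", "abc", "poi", "ert", "bnm", "lmn", "asd"],
    "loc": ["IND", "IND", "RSA", "SL", "AUS", "ENG", "NZ", "BAN"],
})

result = rows_matching_names(df2, df1["name"])
print(result)
```

Passing `dedupe_on_name_cols=False` would instead treat two rows as duplicates only when every column, including loc, is identical.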
