Question

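I have a very large DataFrame in pandas in which one column is labeled "Col2", and each row value in that column is a long string. From this DataFrame I parsed out another, smaller DataFrame containing the "Col2" values that I want to remove from the original. Basically, I want to go through the original DataFrame and drop every row whose Col2 value matches one in the subset DataFrame; in other words, subtract one DataFrame from the other based on the Col2 value. How can I do this?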

Answer 1

Is this what you are looking for?

from pandas import DataFrame

d2 = DataFrame([[5,6],[7,8],[3,4]],columns=["a","b"])

   a  b
0  5  6
1  7  8
2  3  4

d1=DataFrame([[1,2],[3,4]],columns=["a","b"])

   a  b
0  1  2
1  3  4


# Collect the index labels of the rows in d2 whose "a" value also appears in d1
ind = d2.index[d2.a.isin(d1.a)]

d2.drop(ind)
   a  b
0  5  6
1  7  8

Answer 2

Hope this helps. Let me know what you think.

import pandas as pd
import numpy as np

df=pd.DataFrame({'col1':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],'col2':['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O']})
df2= pd.DataFrame({'col1':[1,5,11],'col2':['A','D','K']})

# For each value in col2 of df2, scan every row of df
for x in df2['col2']:
    for y in df.iterrows():

        # y is an (index label, row) pair; if the row's col2 matches,
        # drop that row from df and reassign the result to df
        if y[1]['col2'] == x:
            df=df.drop(y[0])
print(df)
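
The nested loop above visits every row of df once per value in df2, which can become slow on a very large DataFrame like the one in the question. A minimal sketch of a vectorized equivalent, assuming the same df and df2 as above (the names matches and result are introduced here only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "col1": list(range(1, 16)),
    "col2": list("ABCDEFGHIJKLMNO"),
})
df2 = pd.DataFrame({"col1": [1, 5, 11], "col2": ["A", "D", "K"]})

# Boolean mask: True where the row's col2 value appears in df2's col2.
matches = df["col2"].isin(df2["col2"])

# Keep only the rows that did not match; the original index labels are preserved.
result = df[~matches]
print(result)
```

This should leave the same rows as the loop, without dropping them one at a time.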

Answer 3

df1[~df1.Col2.isin(df2.Col2)]

Here is the output of my code:

import pandas as pd  

df1 =pd.DataFrame({'X' :  pd.Series(['xx', 'yy', 'zz', 'hh']),
                       'Y' :  pd.Series(['ghj', 'dbj', 'lmf', 'hhjk']),
                       'Col2' :  pd.Series(['abd', 'def','ghi','jkl'])
                      })

  Col2   X     Y
0  abd  xx   ghj
1  def  yy   dbj
2  ghi  zz   lmf
3  jkl  hh  hhjk

df2 =pd.DataFrame({'X' :  pd.Series(['www', 'ddd' ]),
                   'Col2' :  pd.Series([ 'def', 'jkl'])
                  })


  Col2    X
0  def  www
1  jkl  ddd

df1[~df1.Col2.isin(df2.Col2)]

  Col2   X    Y
0  abd  xx  ghj
2  ghi  zz  lmf
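
If you prefer DataFrame.query, the same filter can be written as a query string. This is a sketch rather than the only way to do it; it assumes the df1 and df2 defined above, and values_to_remove is a hypothetical variable name introduced for the example:

```python
import pandas as pd

df1 = pd.DataFrame({
    "X": ["xx", "yy", "zz", "hh"],
    "Y": ["ghj", "dbj", "lmf", "hhjk"],
    "Col2": ["abd", "def", "ghi", "jkl"],
})
df2 = pd.DataFrame({"X": ["www", "ddd"], "Col2": ["def", "jkl"]})

# Hypothetical name: the Col2 values to remove, as a plain Python list.
values_to_remove = df2["Col2"].tolist()

# Inside the query string, "@name" refers to a local Python variable.
remaining = df1.query("Col2 not in @values_to_remove")
print(remaining)
```

Rows 0 and 2 remain, matching the output shown above.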
