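Suppose I have two tables, people_all and people_usa, which have the same structure and therefore the same primary key.
How can I get a table of the people who are not in the USA? In SQL I would do something like: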
select a.*
from people_all a
left outer join people_usa u
on a.id = u.id
where u.id is null
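What is the Python equivalent? I can't think of a way to translate this where clause into pandas syntax.
The only way I can think of is to add an arbitrary field to people_usa (e.g. people_usa['dummy']=1), do a left join, keep only the records where 'dummy' is NaN, and then drop the dummy field, which seems rather convoluted.
Thanks!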
Answer 0 (score: 12)
Use isin and negate the boolean mask:
people_usa[~people_usa['ID'].isin(people_all['ID'])]
In the example below, 3 and 4 are removed from the result:
In [364]:
people_all = pd.DataFrame({ 'ID' : np.arange(5)})
people_usa = pd.DataFrame({ 'ID' : [3,4,6,7,100]})
people_usa[~people_usa['ID'].isin(people_all['ID'])]
Out[364]:
    ID
2    6
3    7
4  100
The boolean mask looks like this:
In [366]:
people_usa['ID'].isin(people_all['ID'])
Out[366]:
0     True
1     True
2    False
3    False
4    False
Name: ID, dtype: bool
Using ~ inverts the mask.
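For reference, here is a minimal self-contained sketch of the same idea applied in the direction the question asks for (rows of people_all that are absent from people_usa), next to the left-join version from the question. The sample data and the name column are made up for illustration. The merge variant relies on the indicator parameter of DataFrame.merge, which adds a "_merge" column, so no dummy field is needed.

```python
import pandas as pd

# Hypothetical sample data: people_usa is a subset of people_all.
people_all = pd.DataFrame(
    {"ID": [0, 1, 2, 3, 4], "name": ["Ann", "Bob", "Cho", "Dev", "Eve"]}
)
people_usa = pd.DataFrame({"ID": [3, 4], "name": ["Dev", "Eve"]})

# Anti-join with isin: keep rows of people_all whose ID does not occur in people_usa.
not_usa = people_all[~people_all["ID"].isin(people_usa["ID"])]

# Same result via a left join. indicator=True adds a "_merge" column whose value
# is "left_only" for rows that found no match in the right-hand table.
merged = people_all.merge(people_usa[["ID"]], on="ID", how="left", indicator=True)
not_usa_merge = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(not_usa)
print(not_usa_merge)
```

Both variants assume ID is unique in people_usa; with duplicate IDs on the right-hand side the merge would repeat matching rows, although those are filtered out here anyway.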
Answer 1 (score: 2)
Here is another SQL-like pandas approach, .query():
people_all.query('ID not in @people_usa.ID')
NumPy's in1d() function can also be used to build the same membership mask.
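The NumPy code for this answer did not survive in the thread, so the following is only a sketch of what a NumPy-based mask might look like, not the answerer's original code. It uses np.isin rather than np.in1d, because in1d is deprecated as of NumPy 2.0 and isin is its documented replacement. The sample frames are the ones from the first answer.

```python
import numpy as np
import pandas as pd

# Sample data borrowed from the first answer.
people_all = pd.DataFrame({"ID": np.arange(5)})
people_usa = pd.DataFrame({"ID": [3, 4, 6, 7, 100]})

# np.isin returns a boolean array: True where an ID from people_all
# also appears among the IDs of people_usa.
in_usa = np.isin(people_all["ID"].to_numpy(), people_usa["ID"].to_numpy())

# Negate the mask to keep only the people who are not in people_usa.
not_usa = people_all[~in_usa]
print(not_usa)
```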
Note: for those with SQL experience, the "Comparison with SQL" page in the pandas documentation may be worth reading.
Answer 2 (score: -1)
I would combine the dataframes (by stacking them) and then apply the .drop_duplicates method. The documentation is here:
http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.drop_duplicates.html
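A rough sketch of how the stacking idea could work, under two assumptions that the answer does not state: every row of people_usa also appears in people_all, and IDs are unique within each table. Note that the default keep='first' would retain one copy of each duplicated row, so the sketch passes keep=False to drop every row that occurs more than once.

```python
import pandas as pd

# Hypothetical sample data: people_usa is assumed to be a subset of people_all.
people_all = pd.DataFrame({"ID": [0, 1, 2, 3, 4]})
people_usa = pd.DataFrame({"ID": [3, 4]})

# Stack the two tables on top of each other.
stacked = pd.concat([people_all, people_usa], ignore_index=True)

# keep=False removes all copies of any ID that occurs more than once,
# leaving only the IDs that were present in exactly one table.
not_usa = stacked.drop_duplicates(subset="ID", keep=False)
print(not_usa)
```

If people_usa contained IDs that are missing from people_all, those rows would survive as well, since this computes a symmetric difference rather than a true anti-join.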