Let's assume I have two pandas DataFrames, df1 and df2.
df1
       s1   s2   s3
bob   nan  nan  nan
john  nan  nan  nan
matt  nan  nan  nan
and
df2
       s1   s3   s4
bob    32   11   22
matt    1  nan    2
I would like to fill df1 with the values from df2, but only for the rows and columns that exist in df1, so that the output is
       s1   s2   s3
bob    32  nan   11
john  nan  nan  nan
matt    1  nan  nan
This means that, in this toy case, I am not interested in filling df1 with column s4 of df2.
All my attempts using merge have sadly failed, and I always end up with a DataFrame that contains only nan.
Answer 0 (score: 4)
In place
Use pd.DataFrame.update.
This overwrites, in place, every position in df1 where df2 has a non-null value.
df1.update(df2)
df1
        s1   s2    s3
bob   32.0  NaN  11.0
john   NaN  NaN   NaN
matt   1.0  NaN   NaN
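The in-place approach can be tried end to end with the sketch below. It rebuilds the toy frames from the question (float dtype is an assumption, since the question does not state dtypes) and relies on update only touching labels that already exist in df1, so column s4 is ignored.

```python
import numpy as np
import pandas as pd

# Toy frames from the question; float dtype is an assumption.
df1 = pd.DataFrame(np.nan, index=["bob", "john", "matt"], columns=["s1", "s2", "s3"])
df2 = pd.DataFrame(
    {"s1": [32, 1], "s3": [11, np.nan], "s4": [22, 2]},
    index=["bob", "matt"],
    dtype=float,
)

# update() modifies df1 in place and returns None.
# Only non-null values of df2 at labels that exist in df1 are copied.
df1.update(df2)
print(df1)
```

Because update returns None, assigning its result back to df1 would discard the frame; call it as a statement instead.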
Make a copy 1
Use pd.DataFrame.align, pd.DataFrame.fillna, and pd.DataFrame.reindex_like.
fillna won't work properly unless the indices and columns are aligned.
pd.DataFrame.fillna(*df1.align(df2)).reindex_like(df1)
        s1   s2    s3
bob   32.0  NaN  11.0
john   NaN  NaN   NaN
matt   1.0  NaN   NaN
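The one-liner above can be unpacked into separate steps, which may make the role of each call easier to see. This is a sketch on the toy frames from the question (float dtype assumed); the names left, right, and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frames from the question; float dtype is an assumption.
df1 = pd.DataFrame(np.nan, index=["bob", "john", "matt"], columns=["s1", "s2", "s3"])
df2 = pd.DataFrame(
    {"s1": [32, 1], "s3": [11, np.nan], "s4": [22, 2]},
    index=["bob", "matt"],
    dtype=float,
)

# align() returns both frames reindexed to the union of rows and columns.
left, right = df1.align(df2)

# Fill the holes in the aligned df1 with the matching cells of the aligned df2,
# then trim back to df1's original rows and columns (this drops s4).
result = left.fillna(right).reindex_like(df1)
print(result)
```

Unlike update, this leaves df1 untouched and returns a new frame.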
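Make a copy 2
Use pd.DataFrame.combine_first and pd.DataFrame.reindex_like.
Which frame you put first is debatable. Since df1 is all nan here, it doesn't matter. But this order keeps any preexisting non-null values in df1. Otherwise, you could switch the positions to df2.combine_first(df1).
df1.combine_first(df2).reindex_like(df1)
        s1   s2    s3
bob   32.0  NaN  11.0
john   NaN  NaN   NaN
matt   1.0  NaN   NaN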
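To see why the order of the two frames matters, the sketch below seeds df1 with one preexisting value. That value (5.0 for bob/s1) is hypothetical and not part of the question; it is there only to show which frame wins on a conflict. Float dtype is again assumed.

```python
import numpy as np
import pandas as pd

# Toy frames from the question; float dtype is an assumption.
df1 = pd.DataFrame(np.nan, index=["bob", "john", "matt"], columns=["s1", "s2", "s3"])
df2 = pd.DataFrame(
    {"s1": [32, 1], "s3": [11, np.nan], "s4": [22, 2]},
    index=["bob", "matt"],
    dtype=float,
)

# Hypothetical preexisting value, added only to make the two orders differ.
df1.loc["bob", "s1"] = 5.0

# df1 first: df1's non-null values are kept, df2 only fills the holes.
keep_df1 = df1.combine_first(df2).reindex_like(df1)

# df2 first: df2's non-null values take priority over df1's.
prefer_df2 = df2.combine_first(df1).reindex_like(df1)

print(keep_df1)
print(prefer_df2)
```

In both orders reindex_like(df1) is still applied, so the result keeps df1's rows and columns and drops s4.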
# -*- mode: ruby -*-
# vi: set ft=ruby :
IP = "192.168.33.55"
VM_NAME = "jenkins"
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "geerlingguy/ubuntu1604" # target OS: Ubuntu 16.04
  config.ssh.insert_key = false
  config.vm.synced_folder ".", "/vagrant", disabled: true
  config.ssh.forward_agent = true
  config.vm.provider :virtualbox do |v|
    v.name = VM_NAME
    v.memory = 1024
    v.cpus = 2
    v.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
    v.customize ["modifyvm", :id, "--ioapic", "on"]
  end
  config.vm.hostname = VM_NAME
  config.vm.network :private_network, ip: IP
  config.vm.network "forwarded_port", guest: 80, host: 8080
  # Set the name of the VM. See: http://stackoverflow.com/a/17864388/100134
  config.vm.define :jenkins do |jenkins|
  end
  # Ansible provisioner.
  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "jenkins/playbook.yml"
    ansible.inventory_path = "jenkins/inventory"
    ansible.sudo = true
  end
end