Question

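I'm not sure whether the following is intentional, but it appears to be a behavior change from pandas 0.18.0, which I was using before. I have updated to 0.23.0 and I'm getting some odd behavior...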

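Suppose I have a large dataframe called dfLarge, from which I take a subset df based on some condition. (This part of the question isn't really necessary for reproduction, but it comes from my actual use case and is how I noticed the change in pandas behavior.) As it happens, nothing in dfLarge matches the condition I'm looking for, so df is empty.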

Importantly, df shares its dtypes with dfLarge. In general, for some df, it might look like this:

In [187]: df = pd.DataFrame(columns = ['field1','field2','field3','num1','num2'])

In [188]: df['num1'] = df['num1'].astype('float64')  # assume this was inherited from dfLarge

In [189]: df['num2'] = df['num2'].astype('float64')  # assume this was inherited from dfLarge

In [190]: df.dtypes
Out[190]:
field1     object
field2     object
field3     object
num1      float64
num2      float64
dtype: object

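So now we have an empty dataframe df with some fields and differing dtypes. I use df.groupby to aggregate my data, grouping on field1 and field2 and then resetting them back into columns, and the resulting dataframe changes the dtypes of those fields.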

In [191]: dfGrouped = df.groupby(['field1','field2'])[['num1','num2']].sum().reset_index(level=['field1','field2'])

In [192]: dfGrouped.dtypes
Out[192]:
field1     float64
field2     float64
num1       float64
num2       float64
dtype: object

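From the pandas documentation, it doesn't appear that df.groupby should be doing this. I only discovered this change from the pandas 0.18.0 behavior (where the dtypes do not change) when I ran into various TypeErrors on my subsequent fields while testing them against some strings. Is there a way to handle this gracefully, rather than saving the dtypes before the groupby and reapplying them to the new object with df['field'] = df['field'].astype('newtype')? Thanks.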

Answer 1

Use as_index=False when specifying the groupby.

I think this bug is caused by setting and then resetting an empty MultiIndex (groupby sets a MultiIndex, and you then reset it). See #19602 on the GitHub issue tracker. Using as_index=False prevents this pattern from occurring, because groupby never sets the MultiIndex in the first place.
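As a rough sketch of the suggestion, the question's empty frame can be grouped with as_index=False in place of the reset_index call. The setup lines are copied from the question. No expected output is shown because the printed dtypes may vary between pandas versions.

```python
import pandas as pd

# Empty frame mirroring the setup in the question.
df = pd.DataFrame(columns=['field1', 'field2', 'field3', 'num1', 'num2'])
df['num1'] = df['num1'].astype('float64')
df['num2'] = df['num2'].astype('float64')

# as_index=False keeps the grouping keys as ordinary columns,
# so no MultiIndex is built and no reset_index call is needed.
dfGrouped = df.groupby(['field1', 'field2'], as_index=False)[['num1', 'num2']].sum()

# Inspect the dtypes of the result on your own pandas version.
print(dfGrouped.dtypes)
```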


Note that this should also preserve the behavior for non-empty DataFrames.
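To illustrate the non-empty case, here is a minimal sketch that runs both spellings on a small frame. The sample values are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    'field1': ['a', 'a', 'b'],
    'field2': ['x', 'x', 'y'],
    'field3': ['p', 'q', 'r'],
    'num1': [1.0, 2.0, 3.0],
    'num2': [10.0, 20.0, 30.0],
})

# Pattern from the question: group, aggregate, then reset the MultiIndex.
via_reset = (
    df.groupby(['field1', 'field2'])[['num1', 'num2']]
    .sum()
    .reset_index(level=['field1', 'field2'])
)

# Suggested pattern: keep the keys as columns from the start.
via_flag = df.groupby(['field1', 'field2'], as_index=False)[['num1', 'num2']].sum()

print(via_reset)
print(via_flag)
```

Both calls group the rows by (field1, field2) and sum num1 and num2, so the two printed frames can be compared directly.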


dataframe.groupby changes the dtypes of an empty dataframe
