drop_duplicates - ValueError: keep must be either "first", "last" or False

Time: 2015-10-30 14:25:09

Tags: python debugging pandas duplicates

I installed pandas 0.17.0, and I am now getting a strange error:

ValueError: keep must be either "first", "last" or False

when I try this:

ids=ids.drop_duplicates('ID')

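This always worked in previous pandas versions, and the code has not changed. BTW, ids is a DataFrame containing a column of integers...

Here is the traceback: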

Traceback (most recent call last):

File "<ipython-input-34-6e98a890591b>", line 1, in <module>
     ids=ids.drop_duplicates('ID')

File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 89, in wrapper
     return func(*args, **kwargs)

File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 1164, in drop_duplicates
     return super(Series, self).drop_duplicates(keep=keep, inplace=inplace)

File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 89, in wrapper
     return func(*args, **kwargs)

File "C:\Anaconda3\lib\site-packages\pandas\core\base.py", line 576, in drop_duplicates
     duplicated = self.duplicated(keep=keep)

File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 89, in wrapper
     return func(*args, **kwargs)

File "C:\Anaconda3\lib\site-packages\pandas\core\series.py", line 1169, in duplicated
     return super(Series, self).duplicated(keep=keep)

File "C:\Anaconda3\lib\site-packages\pandas\util\decorators.py", line 89, in wrapper
     return func(*args, **kwargs)

File "C:\Anaconda3\lib\site-packages\pandas\core\base.py", line 603, in duplicated
     duplicated = lib.duplicated(keys, keep=keep)

File "pandas\lib.pyx", line 1383, in pandas.lib.duplicated (pandas\lib.c:24490)

ValueError: keep must be either "first", "last" or False

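Notice the keep=keep? In pandas 0.17.0 the default for drop_duplicates is keep='first'. So if I don't specify it, shouldn't it default to that? Why am I getting an error here? Is this a bug in pandas 0.17.0?

2 answers:

Answer 0: (score: 2)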

The error indicates that ids is actually a Series, whose drop_duplicates takes keep as its first parameter, so 'ID' is being passed as keep. The traceback confirms this: the call goes through pandas\core\series.py. This error would not occur if ids were really a DataFrame, where the first parameter of drop_duplicates is subset.
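The Series-versus-DataFrame difference can be reproduced with a small sketch. The sample data and variable names below are hypothetical. The invalid value is passed by keyword as keep='ID' because some newer pandas releases may not accept positional arguments to Series.drop_duplicates at all.

```python
import pandas as pd

# Hypothetical sample data; the question does not show the real contents.
df = pd.DataFrame({'ID': [1, 1, 2, 3, 3]})
ids = df['ID']  # selecting a single column yields a Series, not a DataFrame

print(type(ids))

# On a Series, 'ID' ends up as the keep argument, which is not a valid value.
try:
    ids.drop_duplicates(keep='ID')
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# For a Series, call drop_duplicates without a column name.
unique_series = ids.drop_duplicates()

# For a DataFrame, the column name goes to subset.
unique_frame = df.drop_duplicates(subset='ID')

print(unique_series.tolist())
print(unique_frame)
```

Checking type(ids) or isinstance(ids, pd.Series) before the call is a quick way to tell which of the two signatures applies.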

Answer 1: (score: 0)

I tried the syntax using keep (this parameter used to be take_last) ...

import pandas as pd
df = pd.DataFrame({'c1': ['cat'] * 3 + ['dog'] * 4,
                   'c2': [1, 1, 2, 3, 3, 4, 4]})

print(df)
print(df.drop_duplicates())
print(df.drop_duplicates(['c1', 'c2'], keep='first'))
print(df.drop_duplicates(['c1', 'c2'], keep='last'))
print(df.drop_duplicates(['c1', 'c2'], keep=False))  # drops every duplicated row; only ('cat', 2) remains

The default for drop_duplicates() is keep='first', and all columns are taken into account.
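As a rough illustration of what "all columns" means, the sketch below reuses the same sample frame and compares the default call with one restricted to a single column through subset. The result variable names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'c1': ['cat'] * 3 + ['dog'] * 4,
                   'c2': [1, 1, 2, 3, 3, 4, 4]})

# Default: rows are duplicates only if every column matches; the first is kept.
all_columns = df.drop_duplicates()

# Restricting the comparison to c1 keeps one row per distinct c1 value.
only_c1 = df.drop_duplicates(subset=['c1'])

# keep=False drops every row that has a duplicate, keeping none of them.
none_kept = df.drop_duplicates(keep=False)

print(all_columns)
print(only_c1)
print(none_kept)
```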