SettingWithCopyWarning with df.loc

Time: 2021-05-03 22:17:00

Tags: python pandas dataframe indexing warnings

OBS: I have spent several hours searching SO, the pandas documentation, and a few other sites, but I cannot work out where my code goes wrong.

My UDF:

def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
    dfb = dfb[~dfb.isOutlier]

    dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    dfb = dfb.astype({'indice': 'int64'})
    return dfb

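Important notes:

  • isOutlier does not exist beforehand. I create it inside this function.
  • indice does not exist beforehand. I create it inside this function.
  • valor_unitario exists and is a float
  • lb and ub are defined beforehand
  • This function is called inside a loop in the main code (but the warning is already raised at n=0)

The warning that is raised: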

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

I found several articles and questions online, and StackOverflow also says that using loc solves the problem. I tried it, without success.

1st attempt - using loc

def indice(dfb, lb, ub):
->  dfb.loc[:,'isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
    dfb = dfb[~dfb.isOutlier]

->  dfb.loc[:,'indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    dfb = dfb.astype({'indice': 'int64'})
    return dfb

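I also tried using loc everywhere. In fact, I tried many possible combinations, such as using dfb['valor_unitario'] inside df.loc, and so on.

Now I get the same warning twice, each slightly different. The source lines reported are:

self._setitem_single_column(ilocs[0], value, pi)
self.obj[key] = value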

C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1676: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self._setitem_single_column(ilocs[0], value, pi)

C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
-> self.obj[key] = value

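I also tried using a copy. The first time this warning appeared, simply using copy() solved it. I do not know why it no longer works (all I did was load more data).

2nd attempt - using copy()

I tried putting copy() in three places, without success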

dfb = dfb[~dfb.isOutlier].copy()

dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub).copy()

dfb['isOutlier'] = ~dfb['valor_unitario'].copy().between(lb, ub)

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

I am out of ideas and would really appreciate your help.

------- Minimal reproducible example --------

Main_testing.py

import pandas as pd
import calculoindice_support as indice # module 01
import getitemsid_support as getitems # module 02

df = pd.DataFrame({'loja':[1,4,6,6,4,5,7,8],
                   'cod_produto':[21,21,21,55,55,43,26,30],
                   'valor_unitario':[332.21,333.40,333.39,220.40,220.40,104.66,65.00,14.00],
                   'documento':['324234','434144','532552','524523','524525','423844','529585','239484'],
                   'empresa':['ABC','ABC','ABC','ABC','ABC','CDE','CDE','CDE']
                   })

nome_coluna = 'cod_produto'
# getting items id to loop over them
product_ids = getitems.getitemsid(df, nome_coluna)

# initializing main DF with no data 
df_nf = pd.DataFrame(columns=list(df.columns.values))

n = 0
while n < len(product_ids):
    item = product_ids[n]
    df_item = df[df[nome_coluna] == item]
    # assigning bounds to each variable
    lb, ub = indice.limites(df_item, 10)
    # calculating index over DF, using LB and UB
    # creating temporary (for each loop) DF
    df_nf_aux = indice.indice(df_item, lb, ub)
    # assigning temporary DF to main DF that will be exported later
    df_nf = pd.concat([df_nf, df_nf_aux],ignore_index=True)
    n += 1

calculoindice_support.py (module 01)

import pandas as pd

def limites(dfa,n):
    n_sigma = n * dfa.valor_unitario.std()
    mean = dfa.valor_unitario.mean()
    lb: float = mean - n_sigma
    ub: float = mean + n_sigma
    return (lb, ub)


def indice(dfb, lb, ub):
    if lb == ub:
        dfb.loc[:, 'isOutlier'] = False
        dfb.loc[:, 'indice'] = 1
    else:
        dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
        dfb = dfb[~dfb.isOutlier]

        dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
        # df = df.astype({'indice': 'int64'})

    return dfb

getitemsid_support.py (module 02)

def getitemsid(df, coluna):
    a = df[coluna].tolist()
    return list(set(a))

Warning output:

C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1597: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = value
C:\ProgramData\Anaconda3\envs\Indice\lib\site-packages\pandas\core\indexing.py:1720: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(loc, value, pi)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
C:\Users\...\calculoindice_support.py:16: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

1 Answer:

Answer 0 (score: 1)

The problem is in your Main_testing.py

while n < len(product_ids):
    df_item = df[df[nome_coluna] == item]

    df_nf_aux = indice.indice(df_item, lb, ub)

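First you slice your df with the condition df[nome_coluna] == item, which returns a copy of the DataFrame (you can check this through the _is_view or _is_copy attributes). You then pass the filtered DataFrame to the indice method.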
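One way to check this claim, as a minimal sketch: the DataFrame below is a shortened, hypothetical stand-in for the question's data, and np.shares_memory is used as a public alternative to the private attributes. _is_view and _is_copy are pandas internals that may be missing or behave differently depending on the pandas version, so they are read defensively with getattr.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened stand-in for the question's data.
df = pd.DataFrame({
    'cod_produto': [21, 21, 21, 55],
    'valor_unitario': [332.21, 333.40, 333.39, 220.40],
})

# Boolean-mask selection, as in the main loop.
df_item = df[df['cod_produto'] == 21]

# Public check: does the filtered frame share its data buffer with df?
shares_buffer = np.shares_memory(
    df['valor_unitario'].to_numpy(),
    df_item['valor_unitario'].to_numpy(),
)
print('shares memory with df:', shares_buffer)

# Private attributes (assumption: only present in some pandas versions).
print('_is_view:', getattr(df_item, '_is_view', None))
print('_is_copy:', getattr(df_item, '_is_copy', None))
```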

def indice(dfb, lb, ub):
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

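In the indice method, you assign a new column to the filtered DataFrame. This is an implicit chained assignment. pandas does not know whether you want to add the new column to the original DataFrame or only to the filtered one, so it gives you a warning.

To suppress this warning, you can tell pandas explicitly what you want to do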
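The ambiguity can be seen in a small sketch (the data is a hypothetical stand-in for the question's DataFrame). Assigning a column to the filtered frame leaves the original untouched, which is the kind of silent surprise the warning is meant to flag. Whether SettingWithCopyWarning is actually emitted depends on the pandas version, so the sketch records whatever warnings occur rather than assuming one.

```python
import warnings
import pandas as pd

# Hypothetical stand-in for the question's data.
df = pd.DataFrame({
    'cod_produto': [21, 21, 55],
    'valor_unitario': [10.0, 20.0, 30.0],
})
df_item = df[df['cod_produto'] == 21]

# Record warnings instead of printing them to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    df_item['isOutlier'] = False

print([w.category.__name__ for w in caught])
print('isOutlier' in df_item.columns)  # column on the filtered frame?
print('isOutlier' in df.columns)       # column on the original frame?
```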

def indice(dfb, lb, ub):
    dfb = dfb.copy()
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)

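In the example above, I create a copy of the filtered DataFrame. This means that I want to add the new column to the filtered DataFrame, not to the original one.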
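Putting this together, here is a minimal sketch of the whole function with the copy taken before the first assignment. The sample data and the bounds lb=10, ub=30 are hypothetical. The integer cast applies the asker's astype line to dfb, which is an assumption about the intent, and the second copy() after filtering is a defensive addition rather than part of the answer.

```python
import warnings
import pandas as pd

def indice(dfb, lb, ub):
    # Work on an explicit copy so that later assignments are unambiguous.
    dfb = dfb.copy()
    dfb['isOutlier'] = ~dfb['valor_unitario'].between(lb, ub)
    # Defensive second copy: the row filter produces another new frame.
    dfb = dfb[~dfb['isOutlier']].copy()
    dfb['indice'] = (dfb['valor_unitario'] - lb) / (ub - lb) * 2000
    # Assumed intent of the asker's astype line, applied to dfb.
    return dfb.astype({'indice': 'int64'})

# Hypothetical data: one product with an obvious outlier (100.0).
df = pd.DataFrame({
    'cod_produto': [21, 21, 21, 21, 55],
    'valor_unitario': [10.0, 20.0, 30.0, 100.0, 5.0],
})
df_item = df[df['cod_produto'] == 21]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = indice(df_item, 10.0, 30.0)

print(result)
print('warnings raised:', [w.category.__name__ for w in caught])
```

The same effect can be had at the call site instead, by writing df_item = df[df[nome_coluna] == item].copy() in the loop, so that every function receiving df_item works on an independent frame.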
