Error: trying to set a value on a copy of a slice

Time: 2020-02-07 19:03:20

Tags: python pandas

I am trying to resolve this warning:

application.py:25: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

application.py:26: SettingWithCopyWarning:

but I can't figure out why I am getting it or how to fix it.

Here is my code:

hr = hr_data[['Month','SalesSystemCode','TITULO','BirthDate','HireDate','SupervisorEmployeeID','BASE','carallowance','Commission_Target','Area','Fulfilment %','Commission Accrued','Commission paid',
  'Características (D)', 'Características (I)', 'Características (S)','Características (C)', 'Motivación (D)', 'Motivación (I)','Motivación (S)', 'Motivación (C)', 'Bajo Stress (D)',
  'Bajo Stress (I)', 'Bajo Stress (S)', 'Bajo Stress (C)']]
sales  = sales_data[['Report month', 'Area','Customer','Rental Charge','Cod. Motivo Desconexion','ID Vendedor']]
#report month to datetime
sales['Report month'] = pd.to_datetime(sales['Report month'])
hr['Month'] = pd.to_datetime(hr['Month'])
#remove sales where customer churned
sales_clean = sales.loc[sales['Cod. Motivo Desconexion'] == 0]
sales_clean = sales_clean[['Report month','Rental Charge','ID Vendedor']]
sales_clean2 = pd.DataFrame(sales_clean.groupby(['Report month','ID Vendedor'])['Rental Charge'].sum())
sales_clean2.reset_index(inplace=True)
hr_area = hr.loc[hr['Area'] == 'Area 1']
merged_hr = hr_area.merge(sales_clean, left_on=['SalesSystemCode','Month'],right_on=['ID Vendedor','Report month'],how='left')
#creating new features: months of employment
merged_hr['MonthsofEmploymentRounded'] = round((merged_hr['Month'] - merged_hr['HireDate'])/np.timedelta64(1,'M')).astype('int')
#filters for interaction
YEAR_MONTH = merged_hr['Month'].unique()
#css stylesheet
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
#html layout
app.layout = html.Div(children=[
    html.H1(children='SAC Challenge Level 2 Dashboard', style ={
        'textAlign': 'center',
        'height':'10'
    }),
    html.Div(children='''
        Objective: Studying the impact of supervision on the performance of sales executives in Area 1
        '''),
    dcc.DatePickerRange(
        id='year_month',
        start_date= min(merged_hr['Month'].dt.date.tolist()),
        end_date = 'Select date'
    ),
    dcc.Graph(
        id='performancetable'
    )
])
@app.callback(dash.dependencies.Output('performancetable','figure'),
             [dash.dependencies.Input('year_month', 'start_date'),
              dash.dependencies.Input('year_month','end_date')])
def update_table(year_month):
    if year_month is None or year_month ==[]:
        year_month = YEAR_MONTH
        performance = merged_hr[(merged_hr['Month'].isin(year_month))]
        return {
            'data': [
                go.Table(
                    header = dict(values=list(performance.columns),fill_color='paleturquoise',align='left'),
                    cells = dict(values=[performance['Month'],performance['SalesSystemCode'],performance['TITULO'],
                                         performance['HireDate'],performance['MonthsofEmploymentRounded'],performance['SupervisorEmployeeID'],
                                         performance['BASE'],performance['carallowance'],performance['Commission_Target'],
                                         performance['Fulfilment %'], performance['Commission Accrued'],performance['Commission paid'],
                                         performance['Características (D)'],performance['Características (I)'],performance['Características (S)'],
                                         performance['Características (C)'],performance['Motivación (D)'],performance['Motivación (I)'],
                                         performance['Motivación (S)'],performance['Motivación (C)'],performance['Bajo Stress (D)'],
                                         performance['Bajo Stress (I)'],performance['Bajo Stress (S)'],performance['Bajo Stress (C)'],
                                         performance['Rental Charge']])
                )],
        }
# run the server from module level, not from inside the callback
if __name__ == '__main__':
    app.run_server(debug=True)



Here is a sample of hr_data:

{'Month': {0: Timestamp('2017-12-01 00:00:00'),
  1: Timestamp('2017-12-01 00:00:00'),
  2: Timestamp('2017-12-01 00:00:00'),
  3: Timestamp('2017-12-01 00:00:00'),
  4: Timestamp('2017-12-01 00:00:00')},
 'EmployeeID': {0: 91868, 1: 1812496, 2: 1812430, 3: 700915, 4: 1812581},
 'PayrollProviderName': {0: 'Tele',
  1: 'People',
  2: 'People',
  3: 'Stratego',
  4: 'People'},
 'SalesSystemCode': {0: 91868.0,
  1: 802496.0,
  2: 2430.0,
  3: 700915.0,
  4: 802581.0},
 'Payroll Type': {0: 'Insourcing',
  1: 'Third Party',
  2: 'Third Party',
  3: 'Third Party',
  4: 'Third Party'},
 'Name': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'TITULO': {0: 'SALES SUPERVISOR',
  1: 'SALES EXECUTIVE',
  2: 'SALES EXECUTIVE',
  3: 'SALES EXECUTIVE',
  4: 'SALES EXECUTIVE'},
 'Sexo': {0: 'M', 1: 'F', 2: 'F', 3: 'M', 4: 'F'},
 'BirthDate': {0: Timestamp('1982-11-05 00:00:00'),
  1: Timestamp('1987-09-24 00:00:00'),
  2: Timestamp('1981-01-13 00:00:00'),
  3: Timestamp('1986-04-18 00:00:00'),
  4: Timestamp('1991-06-24 00:00:00')},
 'HireDate': {0: Timestamp('2012-04-23 00:00:00'),
  1: Timestamp('2017-04-10 00:00:00'),
  2: Timestamp('2017-03-13 00:00:00'),
  3: Timestamp('2015-01-22 00:00:00'),
  4: Timestamp('2017-05-18 00:00:00')},
 'SupervisorEmployeeID': {0: 7935, 1: 91868, 2: 91868, 3: 91868, 4: 91868},
 'SupervisorName': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'BASE': {0: 895, 1: 700, 2: 700, 3: 700, 4: 700},
 'carallowance': {0: 350, 1: 250, 2: 250, 3: 250, 4: 250},
 'Commission_Target': {0: 708.33, 1: 583.33, 2: 583.33, 3: 583.33, 4: 583.33},
 'Nacionalidad': {0: 'INT', 1: 'INT', 2: 'INT', 3: 'INT', 4: 'INT'},
 'Area': {0: 'Area 1', 1: 'Area 1', 2: 'Area 1', 3: 'Area 1', 4: 'Area 1'},
 'Comment': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Sales Quota (points)': {0: 1810.0, 1: 108.0, 2: 108.0, 3: 108.0, 4: 108.0},
 'Real (points)': {0: 1855.0, 1: 86.0, 2: 245.0, 3: 149.0, 4: 91.0},
 'Fulfilment %': {0: 1.0248618784530388,
  1: 0.7962962962962963,
  2: 2.2685185185185186,
  3: 1.3796296296296295,
  4: 0.8425925925925926},
 'Commission Accrued': {0: 708.33, 1: 583.33, 2: 583.33, 3: 583.33, 4: 583.33},
 'OA Commission Accrued': {0: 653.66,
  1: 87.5,
  2: 1494.79,
  3: 794.79,
  4: 160.42},
 'Clawback': {0: 0.0, 1: 24.33, 2: 144.9, 3: 36.77, 4: 0.0},
 'Other Commissions': {0: 0.0, 1: 0.0, 2: 9.16, 3: 9.16, 4: 0.0},
 'Commission paid': {0: 1361.99, 1: 646.51, 2: 1942.38, 3: 1350.52, 4: 743.75},
 'Exit Date': {0: NaT,
  1: Timestamp('2018-04-13 00:00:00'),
  2: NaT,
  3: NaT,
  4: Timestamp('2018-08-31 00:00:00')},
 'Legal Motive': {0: nan,
  1: 'Artículo No. 212',
  2: nan,
  3: nan,
  4: 'Artículo No. 212'},
 'Características (D)': {0: nan, 1: 70.0, 2: 70.0, 3: 60.0, 4: 67.0},
 'Características (I)': {0: nan, 1: 95.0, 2: 62.0, 3: 25.0, 4: 15.0},
 'Características (S)': {0: nan, 1: 20.0, 2: 48.0, 3: 75.0, 4: 40.0},
 'Características (C)': {0: nan, 1: 25.0, 2: 34.0, 3: 85.0, 4: 94.0},
 'Motivación (D)': {0: nan, 1: 85.0, 2: 75.0, 3: 40.0, 4: 59.0},
 'Motivación (I)': {0: nan, 1: 95.0, 2: 74.0, 3: 74.0, 4: 25.0},
 'Motivación (S)': {0: nan, 1: 11.0, 2: 58.0, 3: 65.0, 4: 65.0},
 'Motivación (C)': {0: nan, 1: 7.0, 2: 33.0, 3: 84.0, 4: 93.0},
 'Bajo Stress (D)': {0: nan, 1: 60.0, 2: 69.0, 3: 79.0, 4: 79.0},
 'Bajo Stress (I)': {0: nan, 1: 86.0, 2: 60.0, 3: 6.0, 4: 18.0},
 'Bajo Stress (S)': {0: nan, 1: 40.0, 2: 60.0, 3: 89.0, 4: 30.0},
 'Bajo Stress (C)': {0: nan, 1: 60.0, 2: 48.0, 3: 84.0, 4: 92.0}}

sales_data:

{'Month': {0: Timestamp('2017-07-01 00:00:00'),
  1: Timestamp('2017-07-01 00:00:00'),
  2: Timestamp('2017-07-01 00:00:00'),
  3: Timestamp('2017-07-01 00:00:00'),
  4: Timestamp('2017-07-01 00:00:00')},
 'Report month': {0: '2017-07',
  1: '2017-07',
  2: '2017-07',
  3: '2017-07',
  4: '2017-07'},
 'Area': {0: 'Area 1', 1: 'Area 1', 2: 'Area 1', 3: 'Area 1', 4: 'Area 1'},
 'Fecha de solicitud': {0: Timestamp('2017-07-25 14:49:51'),
  1: Timestamp('2017-07-25 14:56:14'),
  2: Timestamp('2017-06-30 13:07:10'),
  3: Timestamp('2017-07-03 18:25:17'),
  4: Timestamp('2017-07-04 09:56:24')},
 'Fecha de salida': {0: Timestamp('2017-07-27 13:11:42'),
  1: Timestamp('2017-07-27 15:08:39'),
  2: Timestamp('2017-07-04 11:50:07'),
  3: Timestamp('2017-07-07 16:40:44'),
  4: Timestamp('2017-07-14 14:52:45')},
 'Fecha de salida final': {0: Timestamp('2017-07-28 15:13:53'),
  1: Timestamp('2017-07-27 15:46:16'),
  2: Timestamp('2017-07-05 10:24:46'),
  3: Timestamp('2017-07-08 08:36:43'),
  4: Timestamp('2017-07-15 10:00:02')},
 'Fecha de proceso': {0: Timestamp('2017-08-01 00:00:00'),
  1: Timestamp('2017-08-01 00:00:00'),
  2: Timestamp('2017-08-01 00:00:00'),
  3: Timestamp('2017-08-01 00:00:00'),
  4: Timestamp('2017-08-01 00:00:00')},
 'Fecha de sistema': {0: Timestamp('2017-07-25 14:49:51'),
  1: Timestamp('2017-07-25 14:56:14'),
  2: Timestamp('2017-06-30 13:07:10'),
  3: Timestamp('2017-07-03 18:25:17'),
  4: Timestamp('2017-07-04 09:56:24')},
 'Fecha de completada': {0: Timestamp('2017-07-28 15:13:52'),
  1: Timestamp('2017-07-27 15:46:15'),
  2: Timestamp('2017-07-05 10:24:45'),
  3: Timestamp('2017-07-08 08:36:42'),
  4: Timestamp('2017-07-15 10:00:02')},
 'Fecha de creada': {0: Timestamp('2017-07-25 14:50:00'),
  1: Timestamp('2017-07-25 14:56:00'),
  2: Timestamp('2017-06-30 13:07:00'),
  3: Timestamp('2017-07-03 18:25:00'),
  4: Timestamp('2017-07-04 09:56:00')},
 'Cod. de Distribucion': {0: 2302, 1: 2302, 2: 2302, 3: 91818, 4: 2302},
 'Customer': {0: 19308378, 1: 19308378, 2: 27504455, 3: 27104497, 4: 17608676},
 'Cod. Tipo Cliente': {0: 'R', 1: 'R', 2: 'R', 3: 'R', 4: 'R'},
 'Tipo De Cliente': {0: 'Residencial                             ',
  1: 'Residencial                             ',
  2: 'Residencial                             ',
  3: 'Residencial                             ',
  4: 'Residencial                             '},
 'Cuenta': {0: 193083780000,
  1: 193083780000,
  2: 275044550000,
  3: 271044970000,
  4: 176086760000},
 'Status Cuenta': {0: 'W', 1: 'W', 2: 'W', 3: 'W', 4: 'W'},
 'Tipo de Contabilidad': {0: 'RP', 1: 'RP', 2: 'RP', 3: 'RP', 4: 'RP'},
 'Desc. Tipo Contabilidad': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Tos Cat': {0: 'K', 1: 'K', 2: 'K', 3: 'K', 4: 'K'},
 'Desc. Tos Cat': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Mktg Cat': {0: 990005.0, 1: 990005.0, 2: 990000.0, 3: 990000.0, 4: 990000.0},
 'Desc. Mktg Cat': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Cod. Bill Sort': {0: 571.0, 1: 571.0, 2: 571.0, 3: 691.0, 4: 256.0},
 'Orden de Servicio': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Comando': {0: 'PMO', 1: 'PFB', 2: 'PMO', 3: 'PMO', 4: 'PMO'},
 'Desc. Comando': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Prioridad': {0: 5, 1: 5, 2: 5, 3: 5, 4: 5},
 'Cod. Línea': {0: 3, 1: 2, 2: 1, 3: 1, 4: 1},
 'Número de Servicio': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Producto': {0: 1420, 1: 31000, 2: 1403, 3: 1404, 4: 1404},
 'Desc. Producto': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Familia': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Sub Familia': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Rental Charge': {0: 22.5,
  1: 18.7125,
  2: 15.257499999999999,
  3: 19.95,
  4: 19.95},
 'Inst Charge': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
 'Control': {0: 'CONEXIONES_COMPLETADAS_CT',
  1: 'CONEXIONES_COMPLETADAS_CT',
  2: 'CONEXIONES_COMPLETADAS',
  3: 'CONEXIONES_COMPLETADAS',
  4: 'CONEXIONES_COMPLETADAS'},
 'Cod. Estatus': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A'},
 'Status': {0: 'Por Acción                              ',
  1: 'Por Acción                              ',
  2: 'Por Acción                              ',
  3: 'Por Acción                              ',
  4: 'Por Acción                              '},
 'Cod Razon Pendiente': {0: '   ', 1: '   ', 2: '   ', 3: '   ', 4: '   '},
 'Razon Pendiente': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Cod. Motivo Desconexion': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
 'Motivo Desconexion': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Cod. Agencia': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Agencia': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'ID Vendedor': {0: 2352.0, 1: 2352.0, 2: 2352.0, 3: 2352.0, 4: 2352.0},
 'ID Oficinista': {0: 229113.0,
  1: 229113.0,
  2: 224666.0,
  3: 221532.0,
  4: 224666.0},
 'ID Acct Manager': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
 'Desc. Acct Manager': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Provincia': {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B'},
 'Central': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Chrg Prod Ant': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Tipo Srv': {0: 'MO', 1: 'TI', 2: 'MO', 3: 'MO', 4: 'MO'},
 'Tipo Srv Desc': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan},
 'Diferencia ': {0: 2.5500000000000007,
  1: 0.0,
  2: 15.257499999999999,
  3: 19.95,
  4: 19.95},
 'Puntos ': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan}}

1 answer:

Answer 0 (score: 0)

@QuanHoang's comment points in the right direction, but you need to add .copy() to both the hr and sales DataFrames:

hr = hr_data[['Month','SalesSystemCode','TITULO','BirthDate','HireDate','SupervisorEmployeeID','BASE','carallowance','Commission_Target','Area','Fulfilment %','Commission Accrued','Commission paid',
  'Características (D)', 'Características (I)', 'Características (S)','Características (C)', 'Motivación (D)', 'Motivación (I)','Motivación (S)', 'Motivación (C)', 'Bajo Stress (D)',
  'Bajo Stress (I)', 'Bajo Stress (S)', 'Bajo Stress (C)']].copy()
sales  = sales_data[['Report month', 'Area','Customer','Rental Charge','Cod. Motivo Desconexion','ID Vendedor']].copy()

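Using .copy() works because it produces an independent DataFrame rather than something pandas treats as a possible view of hr_data or sales_data. Later assignments then modify the copy, and the warning is not raised.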
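As a minimal sketch of the .copy() approach, the snippet below uses a small made-up stand-in for sales_data (the three rows are hypothetical, only the column names come from the question). It records any warnings raised during the assignment so you can check that no SettingWithCopyWarning appears and that the original frame is left untouched.

```python
import warnings

import pandas as pd

# Hypothetical miniature stand-in for sales_data (values are illustrative only)
sales_data = pd.DataFrame({
    "Report month": ["2017-07", "2017-07", "2017-08"],
    "Rental Charge": [22.5, 18.7125, 19.95],
    "ID Vendedor": [2352.0, 2352.0, 2352.0],
})

# Take an explicit copy of the column subset before modifying it
sales = sales_data[["Report month", "Rental Charge"]].copy()

# Record warnings so the result can be inspected instead of printed to stderr
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sales["Report month"] = pd.to_datetime(sales["Report month"])

# Match on the class name so the check does not depend on the pandas version
copy_warnings = [w for w in caught
                 if w.category.__name__ == "SettingWithCopyWarning"]

print("SettingWithCopyWarning count:", len(copy_warnings))
print("subset dtype:  ", sales["Report month"].dtype)
print("original dtype:", sales_data["Report month"].dtype)
```

The conversion only affects sales; the 'Report month' column in sales_data keeps its original string values.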

Another option is to use .loc[] indexing when making the selections from hr_data and sales_data. This should also work:

hr = hr_data.loc[:, ['Month','SalesSystemCode','TITULO','BirthDate','HireDate','SupervisorEmployeeID','BASE','carallowance','Commission_Target','Area','Fulfilment %','Commission Accrued','Commission paid',
  'Características (D)', 'Características (I)', 'Características (S)','Características (C)', 'Motivación (D)', 'Motivación (I)','Motivación (S)', 'Motivación (C)', 'Bajo Stress (D)',
  'Bajo Stress (I)', 'Bajo Stress (S)', 'Bajo Stress (C)']]
sales = sales_data.loc[:, ['Report month', 'Area','Customer','Rental Charge','Cod. Motivo Desconexion','ID Vendedor']]

Note that selecting columns with .loc[] uses the form df.loc[:, [ *columns* ]], because .loc[] expects the row selection to be given first.

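Using .loc[] works here because selecting a list of column labels through .loc[] returns a new DataFrame that pandas does not mark as a possible copy of a slice, so later column assignments on it do not trigger the "setting on a copy" warning.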
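The .loc[] variant can be sketched the same way. The frame below is a made-up stand-in for hr_data (hypothetical rows, column names taken from the question, and 'Month' stored as strings purely for illustration). It selects columns with the df.loc[:, [columns]] form and then assigns to the result while recording warnings.

```python
import warnings

import pandas as pd

# Hypothetical miniature stand-in for hr_data (values are illustrative only)
hr_data = pd.DataFrame({
    "Month": ["2017-12-01", "2017-12-01", "2018-01-01"],
    "SalesSystemCode": [91868.0, 802496.0, 2430.0],
    "Area": ["Area 1", "Area 1", "Area 2"],
    "Sexo": ["M", "F", "F"],
})

# ':' selects every row; the list selects the columns to keep
hr = hr_data.loc[:, ["Month", "SalesSystemCode", "Area"]]

# Record warnings raised while assigning to the selected subset
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    hr["Month"] = pd.to_datetime(hr["Month"])

# Match on the class name so the check does not depend on the pandas version
copy_warnings = [w for w in caught
                 if w.category.__name__ == "SettingWithCopyWarning"]

print("SettingWithCopyWarning count:", len(copy_warnings))
print("subset columns:", list(hr.columns))
print("original 'Month' unchanged:",
      not pd.api.types.is_datetime64_any_dtype(hr_data["Month"]))
```

If you are unsure how a particular pandas version treats a given selection, adding .copy() to it is the more explicit choice.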