Edit: I have since realized that when I access the column with .loc I only get the warning, not an error, so that code does actually work.
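I have a dataset that I need to clean. One of the things I need to do is remove all transactions over 50000 kn.
The cost column bruto_prihod is in string format, and a transaction (identified by transakcija_sifra) spans multiple rows.
Another inconvenience is that the decimal format uses a comma instead of a point (Croatian uses a decimal comma).
This is my code before I get to that task: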
import pandas as pd
import numpy as np
diversus = pd.read_csv(r"C:\Users\dagejev\Desktop\excel\diversus4_saved.csv", encoding="cp1252", sep=";")
df = diversus.copy()
df_filter = df[(df['faza'] == 'A') &
( (df['grupa_id'] == 'W ODIJELO') | (df['grupa_id'] == 'ODIJELO') ) &
( (df['vp_kupac_id']!= 591) & (df['vp_kupac_id']!= 333) & (df['vp_kupac_id']!= 332) ) &
( (df['brand_naziv'] == 'JOOP') | (df['brand_naziv'] == 'JOOP!') ) &
(df['vrsta_robe']=='ROBA') &
(df['divizija'] == 'MEN')]
print('df: ', len(df), 'filtered: ', len(df_filter))
This is the relevant part:
#This is what I tried first
df_filter['bruto_prihod'] = df_filter['bruto_prihod'].str.replace(',','.')
However, it doesn't work:
C:\Users\dagejev\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.
The suggested alternative doesn't work either:
df_filter[:,'bruto_prihod'] = df_filter['bruto_prihod'].str.replace(',','.')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-82-0e27bdc0e78f> in <module>
----> 1 df_filter[:,'bruto_prihod'] = df_filter['bruto_prihod'].str.replace(',','.')
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
3368 else:
3369 # set column
-> 3370 self._set_item(key, value)
3371
3372 def _setitem_slice(self, key, value):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
3443
3444 self._ensure_valid_index(value)
-> 3445 value = self._sanitize_column(key, value)
3446 NDFrame._set_item(self, key, value)
3447
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
3659
3660 # broadcast across multiple columns if necessary
-> 3661 if broadcast and key in self.columns and value.ndim == 1:
3662 if (not self.columns.is_unique or
3663 isinstance(self.columns, MultiIndex)):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in __contains__(self, key)
3918 @Appender(_index_shared_docs['contains'] % _index_doc_kwargs)
3919 def __contains__(self, key):
-> 3920 hash(key)
3921 try:
3922 return key in self._engine
TypeError: unhashable type: 'slice'
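For comparison, the form the warning text asks for is `.loc[row_indexer, col_indexer]`, whereas the line above indexes `df_filter` directly without `.loc`. Here is a minimal sketch of the `.loc` spelling on made-up data (the values are invented for illustration; only the column name comes from the question):

```python
import pandas as pd

# Made-up sample data; only the column name is taken from the question.
df_filter = pd.DataFrame({"bruto_prihod": ["1234,50", "99,99"]})

# Note the .loc accessor: ":" selects all rows, "bruto_prihod" selects the column.
df_filter.loc[:, "bruto_prihod"] = (
    df_filter["bruto_prihod"].str.replace(",", ".", regex=False)
)

print(df_filter["bruto_prihod"].tolist())
```

Without `.loc`, the key `[:, 'bruto_prihod']` is handed to the DataFrame's plain `__setitem__`, which is the code path shown in the traceback.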
But this chained indexing does work (which is supposedly always bad to use):
df_filter[:]['bruto_prihod'] = df_filter['bruto_prihod'].str.replace(',','.')
Also, this does not work:
df_filter['bruto_prihod'] = pd.to_numeric(df_filter['bruto_prihod'])
But this does:
df_filter[:]['bruto_prihod'] = pd.to_numeric(df_filter['bruto_prihod'])
Why is that? Why does the in-place assignment need chained indexing to work?
Also, in case anyone is interested, this is how I group the transactions:
df_filter.groupby('transakcija_sifra').agg({'bruto_prihod':np.sum}).sort_values('bruto_prihod', ascending=False)
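Building on that grouping, here is a rough sketch of how the over-50000 transactions could then be dropped. It assumes that bruto_prihod has already been converted to a numeric type and that the 50000 kn limit applies to the total of a transaction across all its rows; the sample values are made up.

```python
import pandas as pd

# Made-up sample; bruto_prihod is assumed to be numeric already.
df_filter = pd.DataFrame({
    "transakcija_sifra": ["T1", "T1", "T2", "T3", "T3"],
    "bruto_prihod": [30000.0, 25000.0, 1200.5, 49999.0, 1.0],
})

# Total per transaction, broadcast back onto every row of that transaction.
total = df_filter.groupby("transakcija_sifra")["bruto_prihod"].transform("sum")

# Keep only the rows whose transaction total does not exceed 50000.
kept = df_filter[total <= 50000]

print(kept["transakcija_sifra"].unique().tolist())
```

`transform("sum")` returns a Series aligned with the original rows, so it can be used directly as a boolean mask without merging the aggregated result back.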
Answer 0 (score: 0)
To avoid the warning, you can use an explicit copy:
df_filter = df[(df['faza'] == 'A') &
( (df['grupa_id'] == 'W ODIJELO') | (df['grupa_id'] == 'ODIJELO') ) &
( (df['vp_kupac_id']!= 591) & (df['vp_kupac_id']!= 333) & (df['vp_kupac_id']!= 332) ) &
( (df['brand_naziv'] == 'JOOP') | (df['brand_naziv'] == 'JOOP!') ) &
(df['vrsta_robe']=='ROBA') &
(df['divizija'] == 'MEN')].copy()
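With the explicit copy in place, the plain column assignments from the question should run on an independent frame. A minimal sketch on made-up data follows (a single filter condition stands in for the full one, and the strings are assumed to contain no thousands separators):

```python
import pandas as pd

# Made-up sample data; column names are taken from the question.
df = pd.DataFrame({
    "faza": ["A", "B", "A"],
    "bruto_prihod": ["1234,50", "10,00", "99,99"],
})

# .copy() makes df_filter independent of df.
df_filter = df[df["faza"] == "A"].copy()

# Swap the decimal comma for a point, then convert the strings to numbers.
df_filter["bruto_prihod"] = pd.to_numeric(
    df_filter["bruto_prihod"].str.replace(",", ".", regex=False)
)

print(df_filter["bruto_prihod"].tolist())
print(df["bruto_prihod"].tolist())  # the original frame is left untouched
```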