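From the initial DataFrame loaded from a CSV file,
df = pd.read_csv("file.csv", sep=";")
I obtained a filtered copy with
df_filtered = df[df["filter_col_name"] == value]
However, when I create a new column using the diff() method,
df_filtered["diff"] = df_filtered["feature"].diff()
I get the following warning: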
/usr/local/bin/ipython3:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
#!/usr/bin/python3
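I also noticed that the processing time is very long.
Surprisingly (to me, at least...), if I do the same thing on the unfiltered DataFrame, it runs fine.
How should I create a "diff" column on the filtered data?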
Answer 0 (score: 1)
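You need copy():
If you modify values in df_filtered later, you will find that the modifications do not propagate back to the original data (df), and pandas warns about that.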
# process only the filtered rows and get back the filtered DataFrame
df_filtered = df[df["filter_col_name"] == value].copy()
Or:
# process only the filtered rows but keep the whole DataFrame
# (note: this overwrites the 'feature' values of the filtered rows)
df.loc[df["filter_col_name"] == value, 'feature'] = \
    df.loc[df["filter_col_name"] == value, 'feature'].diff()
Sample:
df = pd.DataFrame({'filter_col_name':[1,1,3],
'feature':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[5,3,6],
'F':[7,4,3]})
print (df)
C D E F feature filter_col_name
0 7 1 5 7 4 1
1 8 3 3 4 5 1
2 9 5 6 3 6 3
value = 1
df_filtered = df[df["filter_col_name"]== value].copy()
df_filtered["diff"] = df_filtered["feature"].diff()
print (df_filtered)
C D E F feature filter_col_name diff
0 7 1 5 7 4 1 NaN
1 8 3 3 4 5 1 1.0
value = 1
df.loc[df["filter_col_name"] == value, 'feature'] = \
    df.loc[df["filter_col_name"] == value, 'feature'].diff()
print (df)
C D E F feature filter_col_name
0 7 1 5 7 NaN 1
1 8 3 3 4 1.0 1
2 9 5 6 3 6.0 3
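If overwriting 'feature' is not what you want, the second approach can also target a separate column. The following is only a minimal sketch under some assumptions: the names `mask` and `df_all` are hypothetical helpers introduced for illustration, and the data is a cut-down version of the sample above. It shows that the explicit copy leaves `df` without a "diff" column, and that writing through `.loc` with a boolean mask fills "diff" only for the filtered rows.

```python
import pandas as pd

df = pd.DataFrame({
    "filter_col_name": [1, 1, 3],
    "feature": [4, 5, 6],
})
value = 1
mask = df["filter_col_name"] == value  # hypothetical helper name

# Option 1: work on an explicit copy; df itself gains no new column.
df_filtered = df[mask].copy()
df_filtered["diff"] = df_filtered["feature"].diff()

# Option 2 (variant): write into a new "diff" column of a full-size frame,
# so "feature" keeps its values; rows outside the mask are left as NaN.
df_all = df.copy()  # hypothetical name, copied only to keep df unchanged here
df_all.loc[mask, "diff"] = df_all.loc[mask, "feature"].diff()

print(df_filtered)
print(df_all)
```

Which variant fits depends on whether you need the differences next to the unfiltered rows or only on the filtered subset.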
Answer 1 (score: 0)
Try using
df_filtered.loc[:, "diff"] = df_filtered["feature"].diff()
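Whether the `.loc` form alone silences the warning may depend on the pandas version and on how df_filtered was produced; I have not verified it across versions. A hedged sketch that combines it with the explicit copy from the other answer, using a small made-up frame with the column names from the question:

```python
import pandas as pd

df = pd.DataFrame({"filter_col_name": [1, 1, 3], "feature": [4, 5, 6]})
value = 1

# Take an explicit copy first, then add the column through .loc.
df_filtered = df[df["filter_col_name"] == value].copy()
df_filtered.loc[:, "diff"] = df_filtered["feature"].diff()

print(df_filtered)
```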