Creating a column on a filtered pandas DataFrame

Time: 2017-03-02 11:26:03

Tags: python pandas data-science

Starting from an initial DataFrame loaded from a CSV file:

import pandas as pd

df = pd.read_csv("file.csv", sep=";")

I obtained a filtered copy with:

df_filtered = df[df["filter_col_name"] == value]

However, when I create a new column with the diff() method:

df_filtered["diff"] = df_filtered["feature"].diff()

I get the following warning:

/usr/local/bin/ipython3:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  #!/usr/bin/python3

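I also noticed that the processing takes a very long time.

Surprisingly (to me, at least), if I do the same thing on the unfiltered DataFrame, it runs fine.

How should I create a "diff" column on the filtered data?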

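2 answers:

Answer 0: (score: 1)

You need copy().

Without it, if you later modify values in df_filtered, you will find that the modifications do not propagate back to the original data (df), and pandas issues a warning about this.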

# to process the filtered rows and get back only the filtered DataFrame
df_filtered = df[df["filter_col_name"] == value].copy()

Or:

# to process the filtered rows in place and keep the full DataFrame
df.loc[df["filter_col_name"] == value, 'feature'] = \
    df.loc[df["filter_col_name"] == value, 'feature'].diff()

Sample:

df = pd.DataFrame({'filter_col_name':[1,1,3],
                   'feature':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df)
   C  D  E  F  feature  filter_col_name
0  7  1  5  7        4                1
1  8  3  3  4        5                1
2  9  5  6  3        6                3
value = 1

df_filtered = df[df["filter_col_name"]== value].copy()
df_filtered["diff"] = df_filtered["feature"].diff()
print (df_filtered)
   C  D  E  F  feature  filter_col_name  diff
0  7  1  5  7        4                1   NaN
1  8  3  3  4        5                1   1.0
value = 1

df.loc[df["filter_col_name"] == value, 'feature'] = \
    df.loc[df["filter_col_name"] == value, 'feature'].diff()

print (df)
   C  D  E  F  feature  filter_col_name
0  7  1  5  7      NaN                1
1  8  3  3  4      1.0                1
2  9  5  6  3      6.0                3
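
Note that the second approach overwrites the feature column, which in the output above becomes a float column. If the goal is a separate "diff" column on the full DataFrame, as in the question, one possible variant is to write into a new column through the same mask. This is only a minimal sketch, assuming a reduced version of the sample data; the name mask is introduced here for illustration and does not appear in the original answer.

```python
import pandas as pd

# Reduced version of the sample frame (only the two columns that matter here).
df = pd.DataFrame({'filter_col_name': [1, 1, 3],
                   'feature': [4, 5, 6]})
value = 1

# Boolean mask selecting the rows of interest (hypothetical helper name).
mask = df["filter_col_name"] == value

# Write the differences into a new "diff" column; rows outside the mask get NaN,
# and the original "feature" column is left untouched.
df.loc[mask, "diff"] = df.loc[mask, "feature"].diff()
print(df)
```

Rows that do not match the filter simply hold NaN in the new column, so the filtered view can still be taken afterwards with df[mask].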

Answer 1: (score: 0)

Try using

df_filtered.loc[:, "diff"] = df_filtered["feature"].diff()
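
As a further variant, not taken from either answer, DataFrame.assign returns a new DataFrame with the extra column, so nothing is ever assigned into the filtered slice. The sketch below assumes a reduced version of the sample data from the first answer and is meant as an illustration rather than the recommended fix.

```python
import pandas as pd

# Reduced version of the sample frame from the first answer.
df = pd.DataFrame({'filter_col_name': [1, 1, 3],
                   'feature': [4, 5, 6]})
value = 1

# Filter first, then let assign() build a new frame that includes "diff".
# The lambda receives the filtered frame, so diff() only sees the kept rows.
df_filtered = (
    df[df["filter_col_name"] == value]
    .assign(diff=lambda d: d["feature"].diff())
)
print(df_filtered)
```

Because assign() produces a new object, the original df keeps its columns unchanged.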