Question

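I have been reading this page but still find the topic a bit confusing: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

Say I have a Pandas DataFrame and I want to set the first and last row elements of a single column to some values at the same time. I can do this: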

df.iloc[[0, -1]].mycol = [1, 2]

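which tells me "A value is trying to be set on a copy of a slice from a DataFrame." and that this is potentially dangerous.

I could use .loc, but then I need to know the index labels of the first and last rows (in contrast, .iloc lets me access by position).

What is the safest, most Pandasy way to do this?

To get to this point: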

# Django queryset
query = market.stats_set.annotate(distance=F("end_date") - query_date)

# Generate a dataframe from this queryset, and order by distance
df = pd.DataFrame.from_records(query.values("distance", *fields), coerce_float=True)
df = df.sort_values("distance").reset_index(drop=True)

Then I tried calling df.distance.iloc[[0, -1]] = [1, 2]. This raises the warning.

Answer 1

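The problem isn't iloc itself; it appears when you chain .mycol onto the result, because the assignment then lands on a copy. You can do everything inside a single iloc call: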

df.iloc[[0, -1], df.columns.get_loc('mycol')] = [1, 2]
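As a minimal sketch of the line above, here is a self-contained version on a small made-up frame. The column names and values are only illustrative assumptions, not the asker's data.

```python
import pandas as pd

# Hypothetical toy frame; "mycol" stands in for the asker's column.
df = pd.DataFrame({"mycol": [10, 20, 30, 40], "other": list("wxyz")})

# Translate the column label into its integer position once...
col_pos = df.columns.get_loc("mycol")

# ...then select rows and column in a single iloc call, so the
# assignment targets df itself rather than an intermediate object.
df.iloc[[0, -1], col_pos] = [1, 2]

print(df)
```

Because rows and column are picked in one indexing step, there is no intermediate object for the warning to complain about.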

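If you want mixed integer- and label-based access you would normally use ix, but that doesn't work here because -1 isn't actually in the index, and apparently ix isn't smart enough to know it should mean the last index. (Note that ix was deprecated in pandas 0.20 and removed in 1.0.)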
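One way to get the same mix of positional rows and a labelled column without ix is to look up the row labels by position first and then pass them to .loc. This is a sketch, not something from the answer above, and it assumes the index labels are unique (otherwise .loc would match every row sharing the label). The frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a non-integer index, so position != label.
df = pd.DataFrame({"mycol": [10, 20, 30, 40]}, index=list("abcd"))

# df.index[[0, -1]] turns the positions 0 and -1 into their labels.
first_last = df.index[[0, -1]]

# A single .loc call with row labels and a column label.
df.loc[first_last, "mycol"] = [1, 2]

print(df)
```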

Answer 2

What you are doing is called chained indexing. You can use iloc on just that column to avoid the warning:

In [24]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))

Out[24]:
          a         b         c
0  1.589940  0.735713 -1.158907
1  0.485653  0.044611  0.070907
2  1.123221 -0.862393 -0.807051
3  0.338653 -0.734169 -0.070471
4  0.344794  1.095861 -1.300339

In [25]:
df['a'].iloc[[0,-1]] ='foo'
df

Out[25]:
          a         b         c
0       foo  0.735713 -1.158907
1  0.485653  0.044611  0.070907
2   1.12322 -0.862393 -0.807051
3  0.338653 -0.734169 -0.070471
4       foo  1.095861 -1.300339

If you do it the other way round, a warning is raised:

In [27]:
df.iloc[[0,-1]]['a'] ='foo'

C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\IPython\kernel\__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
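To see why the warning matters, the sketch below repeats the warned-about pattern on a small made-up frame and checks the original afterwards. The warning is silenced only to keep the output clean. The frame and values are illustrative assumptions.

```python
import warnings

import pandas as pd

# Hypothetical toy frame; values chosen only for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning
    # Row selection with a list produces a new object, so this
    # assignment is applied to that temporary, not to df.
    df.iloc[[0, -1]]["a"] = 99.0

# The original column is expected to be untouched.
unchanged = df["a"].tolist() == [1.0, 2.0, 3.0]
print(unchanged)
```

Note that df['a'].iloc[[0, -1]] = ... is itself a chained assignment. As far as I know, it also stops updating df once Copy-on-Write is enabled in newer pandas, so a single .iloc or .loc call on the frame is the safer habit.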

Set the first and last rows of a column in a DataFrame
