Question

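Edit: I made a rookie mistake with the string np.nan, as pointed out by @coldspeed, @wen-ben and @ALollz. The answers are very good, so I am not deleting this question in order to preserve them.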

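Original:
I have read this question/answer: What's the difference between groupby.first() and groupby.head(1)?

That answer explains that the difference lies in how NaN values are handled. However, when I call groupby with as_index=False, both of them pick NaN.

Furthermore, pandas has groupby.nth, whose functionality is similar to head and first.

What are the differences between groupby.first(), groupby.nth(0), and groupby.head(1) with as_index=False?

Example below: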

In [448]: df
Out[448]:
   A       B
0  1  np.nan
1  1       4
2  1      14
3  2       8
4  2      19
5  2      12

In [449]: df.groupby('A', as_index=False).head(1)
Out[449]:
   A       B
0  1  np.nan
3  2       8

In [450]: df.groupby('A', as_index=False).first()
Out[450]:
   A       B
0  1  np.nan
1  2       8

In [451]: df.groupby('A', as_index=False).nth(0)
Out[451]:
   A       B
0  1  np.nan
3  2       8

I see that `first()` resets the index while the other two do not. Besides that, are there any differences?

Answer 1

There is a difference. You need to change np.nan to a real NaN: in your original df it is a string. Once you convert it, you will see the difference:

df = df.mask(df == 'np.nan')  # replace the string 'np.nan' with a real NaN
df.groupby('A', as_index=False).head(1)  # same result as df.groupby('A', as_index=False).nth(0)

Out[8]: 
   A    B
0  1  NaN
3  2    8

df.groupby('A', as_index=False).first()
# first() resets the index because it may take each value from a different
# row within the group: when the first item is NaN, it skips it and picks the
# first non-null value instead of staying on the same row. Keeping the
# original row index would therefore be misleading.
Out[9]: 
   A  B
0  1  4
1  2  8
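
To reproduce the difference end to end, here is a minimal sketch using the question's data. It assumes the `B` column holds the literal string 'np.nan', and it uses `pd.to_numeric(..., errors='coerce')` as one possible way to turn that string into a real NaN (the `df.mask` call above is another). The names `clean`, `head_rows` and `first_vals` are illustrative only.

```python
import pandas as pd

# The question's frame: 'np.nan' here is a plain string, not a missing value.
df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2],
                   "B": ["np.nan", 4, 14, 8, 19, 12]})

# Coerce B to numbers; anything that cannot be parsed becomes a real NaN.
clean = df.assign(B=pd.to_numeric(df["B"], errors="coerce"))

# head(1) keeps whole rows, so the NaN in row 0 is returned as-is
# and the original row labels (0 and 3) are preserved.
head_rows = clean.groupby("A", as_index=False).head(1)

# first() skips the NaN and takes the first non-null B per group,
# returning one row per group with a fresh 0..n-1 index.
first_vals = clean.groupby("A", as_index=False).first()

print(head_rows)
print(first_vals)
```

With the string left in place, both calls return 'np.nan' for group 1, which is what the question observed.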

Answer 2

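The main issue is that you likely have the string 'np.nan' stored rather than a real null value. Here is how the three handle null values:

Sample data:

import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1,1,2,2,3,3], 'B': [None, '1', np.nan, '2', 3, 4]})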

`first`

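This returns the first non-null value in each group. Oddly enough, in the output below it does not skip None, though that can be made possible with the kwarg dropna=True. As a result, you may get back values for the columns that originally came from different rows: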

df.groupby('A', as_index=False).first()
#   A     B
#0  1  None
#1  2     2
#2  3     3

df.groupby('A', as_index=False).first(dropna=True)
#   A  B
#0  1  1
#1  2  2
#2  3  3
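
The "values from different rows" point is easier to see with more than one value column. The sketch below assumes a hypothetical single-group frame with columns `B` and `C` (not part of the sample data above), where each column has its NaN in a different row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: B is missing in row 0, C is missing in row 1.
df2 = pd.DataFrame({"A": [1, 1],
                    "B": [np.nan, 2.0],
                    "C": [10.0, np.nan]})

# first() works column by column: B is taken from row 1, C from row 0,
# so the resulting row never existed in the original frame.
combined = df2.groupby("A", as_index=False).first()

# head(1) returns row 0 untouched, including its NaN in B.
row0 = df2.groupby("A", as_index=False).head(1)

print(combined)
print(row0)
```

This is why a reset index on the `first()` result is reasonable: the output row is assembled per column and does not map back to one original row label.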

`head(n)`

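Returns the first n rows of each group. Values stay bound within their rows. If you give it an n larger than the number of rows, it returns all rows of that group without complaining: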

df.groupby('A', as_index=False).head(1)
#   A     B
#0  1  None
#2  2   NaN
#4  3     3

df.groupby('A', as_index=False).head(200)
#   A     B
#0  1  None
#1  1     1
#2  2   NaN
#3  2     2
#4  3     3
#5  3     4

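`nth`

This takes the nth row, so again the values remain bound within the row. .nth(0) is the same as .head(1), though they have different uses. For example, if you need the 0th and 2nd rows, that is difficult to do with .head() but easy with .nth([0,2]). Likewise, it is easier to write .head(10) than .nth(list(range(10))).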

df.groupby('A', as_index=False).nth(0)
#   A     B
#0  1  None
#2  2   NaN
#4  3     3
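
As a rough illustration of the `.nth([0,2])` versus `.head()` trade-off, the sketch below uses a hypothetical frame with three rows per group and no nulls. It relies on as_index=False, as in the rest of this thread, so that both calls return the original rows with their original labels.

```python
import pandas as pd

# Hypothetical data: two groups of three rows each.
df3 = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2],
                    "B": [10, 11, 12, 20, 21, 22]})
g = df3.groupby("A", as_index=False)

# Rows at positions 0 and 2 within each group; head() cannot skip position 1.
picked = g.nth([0, 2])

# A contiguous prefix can be written either way, but head(2) is shorter.
prefix_nth = g.nth(list(range(2)))
prefix_head = g.head(2)

print(picked)
print(prefix_head)
```

Here `picked` keeps the original labels 0, 2, 3 and 5, and the two prefix selections cover the same rows.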

nth also supports dropping rows with any null values, so you can use it to return the first row without any null values, unlike .head():

df.groupby('A', as_index=False).nth(0, dropna='any')
#   A  B
#A      
#1  1  1
#2  2  2
#3  3  3
