Question

305  151.55     C      1  113781   
297  151.55     C      1  113781   
306  151.55     C      1  113781   
498  151.55     C      1  113781   
708  151.55     C     1  113781   
141  151.55     C     1  113781

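Above is a sample of the dataset. First, I group together all tickets that share the same number, and then I check whether the group has more than one unique Cabin value. For example, Ticket #110152 has only one unique Cabin value, 'B'. Ticket #113781, on the other hand, has several unique values, 'C' and 'NaN'. For ticket groups that have more than one unique Cabin value, where at least one of those values must be 'NaN' (so Ticket #110465, which has two unique Cabin values, would not meet the criterion), I want the 'NaN' values to be filled with the non-null value from the group.

So the Cabin values for Ticket #113781 would all be 'C'.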

Here is the code I have been working on:

for i, j in df.groupby('Ticket'):
    if j.Ticket.count() > 1:   # This checks if there is more than one ticket in the group
        if len(j.Cabin.unique()) > 1:   # This checks if there is more than one unique value
            # I was attempting to find the groups with at least one 'NaN' value,
            # but this line wasn't working. I tried different iterations and
            # couldn't get it to work.
            for i in j.Cabin.values[(j.Cabin.values == np.nan.all(1))]:

I played around with j.Cabin.values for a while, but I really don't know how to set up the boolean mask neatly and extract the 'NaN' values.
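On the boolean test itself: a comparison against np.nan cannot find missing entries, because NaN never compares equal to anything, itself included. Below is a minimal sketch of one way to build the mask with isna(). The sample frame and the names `missing`, `has_value` and `fillable` are invented for illustration and are not taken from the real Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the real dataset.
df = pd.DataFrame({
    'Ticket': ['113781', '113781', '113781', '110152', '110152', '110465', '110465'],
    'Cabin':  ['C', np.nan, 'C', 'B', 'B', 'A', 'D'],
})

# "== np.nan" is always False, so use isna() to flag missing cabins.
missing = df['Cabin'].isna()

# For every row: does its ticket group contain at least one non-null Cabin?
has_value = (~missing).groupby(df['Ticket']).transform('any')

# Rows that are NaN and whose ticket group can supply a value.
fillable = missing & has_value
print(df[fillable])
```

Only the second row (Ticket 113781 with a missing Cabin) is flagged here. Tickets 110152 and 110465 have no missing values, so nothing in those groups is selected.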

Answer 1

Option 1
bfill and ffill

def bffill(s):
    return s.bfill().ffill()

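# transform returns a result aligned with df's index, so it can be assigned back directly
# (with recent pandas versions, apply may prepend the group key to the index instead)
df['Cabin'] = df.groupby('Ticket').Cabin.transform(bffill)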
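To see what the back-fill followed by forward-fill does inside each ticket group, here is a small sketch on made-up rows. The data is an assumption for illustration only. The sketch uses transform so that the result lines up with the original index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the real dataset.
df = pd.DataFrame({
    'Ticket': ['113781', '113781', '113781', '110152', '349909'],
    'Cabin':  [np.nan, 'C', np.nan, 'B', np.nan],
})

def bffill(s):
    # Back-fill first, then forward-fill whatever is still missing.
    return s.bfill().ffill()

# Fill within each ticket group; the result has the same index as df.
filled = df.groupby('Ticket')['Cabin'].transform(bffill)
print(filled)
```

The two missing cabins on Ticket 113781 become 'C'. The last row stays NaN because its group has no non-null Cabin to borrow from.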

Option 2
transform + first + combine_first

df['Cabin'] = df.Cabin.combine_first(df.groupby('Ticket').Cabin.transform('first'))

Note that for index 110, you have a different cabin on the same ticket. combine_first ensures that I don't overwrite the original Cabin values.
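The point about not overwriting is easier to see on a ticket that carries two different cabins. The following sketch uses invented rows (an assumption, not the actual index 110 data) to compare the per-ticket first value with the result of combine_first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one ticket with two distinct cabins,
# and one ticket with missing cabins.
df = pd.DataFrame({
    'Ticket': ['110465', '110465', '113781', '113781', '113781'],
    'Cabin':  ['A', 'D', np.nan, 'C', np.nan],
})

# First non-null Cabin of each ticket, broadcast back to every row.
first_per_ticket = df.groupby('Ticket')['Cabin'].transform('first')

# Keep existing values; only take the per-ticket value where Cabin is NaN.
filled = df['Cabin'].combine_first(first_per_ticket)

print(first_per_ticket.tolist())  # ['A', 'A', 'C', 'C', 'C']
print(filled.tolist())            # ['A', 'D', 'C', 'C', 'C']
```

Assigning `first_per_ticket` directly would have replaced 'D' with 'A' on the second row. With combine_first, only the NaN entries are filled.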

Both options yield the same result.

If [Ticket] values are the same but one of the items is missing a Cabin value, fill in the value (Titanic)
