如何通过pandas中的groupby输出来填充?

时间:2017-01-16 15:47:56

标签: python pandas

我的数据框有4列(A,B,C,D)。 D有一些NaN条目。我想用具有相同值A,B,C的D的平均值填充NaN值。

例如,如果A,B,C,D的值分别为x,y,z和Nan,那么我希望将NaN值替换为A的值的行的D的平均值, B,C分别为x,y,z。

3 个答案:

答案 0 :(得分:6)

我认为你需要:

df = pd.DataFrame({'A':[1,1,1,3],
                   'B':[1,1,1,3],
                   'C':[1,1,1,3],
                   'D':[1,np.nan,3,5]})

print (df)
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  NaN
2  1  1  1  3.0
3  3  3  3  5.0

df.D = df.groupby(['A','B','C'])['D'].apply(lambda x: x.fillna(x.mean()))
print (df)
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  2.0
2  1  1  1  3.0
3  3  3  3  5.0

样品:

#import <CommonCrypto/CommonCrypto.h>
#import "IAGAesGcm.h"

// Define an Encryption Key
u_char keyBytes[kCCKeySizeAES128] = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, 0x10};
NSData *key = [NSData dataWithBytes:keyBytes length:sizeof(keyBytes)];

// Define an Initialization Vector
// GCM recommends a IV size of 96 bits (12 bytes), but you are free
// to use other sizes
u_char ivBytes[12] = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C};
NSData *iv = [NSData dataWithBytes:ivBytes length:sizeof(ivBytes)];

// Define an Additional Authenticated Data
NSData *aad = [@"AdditionalAuthenticatedData" dataUsingEncoding:NSUTF8StringEncoding];

// Now, we are ready to encrypt some plain data
NSData *expectedPlainData = [@"PlainData" dataUsingEncoding:NSUTF8StringEncoding];

// The returned ciphered data is a simple class with 2 properties: the actual encrypted data and the authentication tag.
// The authentication tag can have multiple sizes and it is up to you to set one, in this case the size is 128 bits
// (16 bytes)
IAGCipheredData *cipheredData = [IAGAesGcm cipheredDataByAuthenticatedEncryptingPlainData:expectedPlainData
                                                          withAdditionalAuthenticatedData:aad
                                                                  authenticationTagLength:IAGAuthenticationTagLength128
                                                                     initializationVector:iv
                                                                                      key:key
                                                                                    error:nil];

// And now, de-cypher the encrypted data to see if the returned plain data
// is as expected
NSData *plainData = [IAGAesGcm plainDataByAuthenticatedDecryptingCipheredData:cipheredData
                                              withAdditionalAuthenticatedData:aad
                                                         initializationVector:iv
                                                                          key:key
                                                                        error:nil];

XCTAssertEqualObjects(expectedPlainData, plainData);

答案 1 :(得分:6)

df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))会比apply

更快
In [2400]: df
Out[2400]:
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  NaN
2  1  1  1  3.0
3  3  3  3  5.0

In [2401]: df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
Out[2401]:
0    1.0
1    2.0
2    3.0
3    5.0
Name: D, dtype: float64

In [2402]: df['D'] = df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))

In [2403]: df
Out[2403]:
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  2.0
2  1  1  1  3.0
3  3  3  3  5.0

详细

In [2396]: df.shape
Out[2396]: (10000, 4)

In [2398]: %timeit df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
100 loops, best of 3: 3.44 ms per loop


In [2397]: %timeit df.groupby(['A','B','C'])['D'].apply(lambda x: x.fillna(x.mean()))
100 loops, best of 3: 5.34 ms per loop

答案 2 :(得分:2)

链接到此问题的副本以获取更多信息: Pandas Dataframe: Replacing NaN with row average

在链接中提到的另一种建议的方法是在转置上使用简单的fillna: df.T.fillna(df.mean(axis=1)).T