Below is a sample from a larger DataFrame.
Fare Cabin Pclass Ticket Name
257 86.5000 B77 1 110152 Cherry, Miss. Gladys
759 86.5000 B77 1 110152 Rothes, the Countess. of (Lucy Noel Martha Dye...
504 86.5000 B79 1 110152 Maioni, Miss. Roberta
262 79.6500 E67 1 110413 Taussig, Mr. Emil
558 79.6500 E67 1 110413 Taussig, Mrs. Emil (Tillie Mandelbaum)
585 79.6500 NaN 1 110413 Taussig, Miss. Ruth
475 52.0000 A14 1 110465 Clifford, Mr. George Quincy
110 52.0000 C110 1 110465 Porter, Mr. Walter Chamberlain
335 26.0000 C106 1 110469 Maguire, Mr. John Edward
158 26.5500 D22 1 110489 Borebank, Mr. John James
430 26.5500 C52 1 110564 Bjornstrom-Steffansson, Mr. Mauritz Hakan
236 75.2500 D37 1 110813 Warren, Mr. Frank Manley
366 75.2500 D37 1 110813 Warren, Mrs. Frank Manley (Anna Sophia Atkinson)
191 26.0000 NaN 1 111163 Salomon, Mr. Abraham L
170 33.5000 B19 1 111240 Van der hoef, Mr. Wyckoff
462 38.5000 E63 1 111320 Gee, Mr. Arthur H
329 57.9792 NaN 1 111361 Hippach, Miss. Jean Gertrude
523 57.9792 B18 1 111361 Hippach, Mrs. Louis Albert (Ida Sophia Fischer)
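I want to fill in the missing "Cabin" value for passengers who lack one, using another passenger's "Cabin" value, but only if
that other passenger (the one who has a Cabin value) shares the same last name and is adjacent (directly above or directly below).
So in the DataFrame above, the NaN Cabin value for [Taussig, Miss. Ruth] would be replaced with [E67], the Cabin value of [Taussig, Mrs. Emil], because that row is directly above hers and both conditions are met (same last name, adjacent).
The missing Cabin value for [Hippach, Miss. Jean Gertrude] would be replaced with [B18], the Cabin value of [Hippach, Mrs. Louis Albert (Ida Sophia Fischer)].
I tried to think in terms of iteration, but this is as far as I got: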
for x in df.Name.str.split(',')[x][0] ==df.Name.str.split(',')[x+1][0]:
    if df.Cabin[x] or df.Cabin[x+1] == np.nan:
        df.Cabin.replace(np.nan,
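I want to make sure each np.nan is replaced with an actual Cabin value rather than another np.nan. I can't figure out how to do this.
Thanks.
Answer 0 (score: 3)
Starting from your DataFrame: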
print(df)
Fare Cabin Pclass Ticket \
0 86.5000 B77 1 110152
1 86.5000 B77 1 110152
2 86.5000 B79 1 110152
3 79.6500 E67 1 110413
4 79.6500 E67 1 110413
5 79.6500 NaN 1 110413
6 52.0000 A14 1 110465
7 52.0000 C110 1 110465
8 26.0000 C106 1 110469
9 26.5500 D22 1 110489
10 26.5500 C52 1 110564
11 75.2500 D37 1 110813
12 75.2500 D37 1 110813
13 26.0000 NaN 1 111163
14 33.5000 B19 1 111240
15 38.5000 E63 1 111320
16 57.9792 NaN 1 111361
17 57.9792 B18 1 111361
Name
0 Cherry, Miss. Gladys
1 Rothes, the Countess. of (Lucy Noel Martha Dye...
2 Maioni, Miss. Roberta
3 Taussig, Mr. Emil
4 Taussig, Mrs. Emil (Tillie Mandelbaum)
5 Taussig, Miss. Ruth
6 Clifford, Mr. George Quincy
7 Porter, Mr. Walter Chamberlain
8 Maguire, Mr. John Edward
9 Borebank, Mr. John James
10 Bjornstrom-Steffansson, Mr. Mauritz Hakan
11 Warren, Mr. Frank Manley
12 Warren, Mrs. Frank Manley (Anna Sophia Atkinson)
13 Salomon, Mr. Abraham L
14 Van der hoef, Mr. Wyckoff
15 Gee, Mr. Arthur H
16 Hippach, Miss. Jean Gertrude
17 Hippach, Mrs. Louis Albert (Ida Sophia Fischer)
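Create a new column/Series containing only the last name. Note that the pandas str methods are probably a better way to do this, but I couldn't get any of them to work.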
df['LastName'] = df['Name'].map(lambda x : x[:x.find(',')])
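For what it's worth, here is a minimal sketch of how the same extraction might look with the pandas str accessor. The `names` Series is a made-up three-row sample, not the full data, and the trailing `.str.strip()` is an extra precaution of mine rather than something the original `map` call does.

```python
import pandas as pd

# Hypothetical sample; the real data would be df['Name'].
names = pd.Series([
    "Taussig, Mr. Emil",
    "Taussig, Miss. Ruth",
    "Van der hoef, Mr. Wyckoff",
])

# Split on the first comma only and keep the part before it.
last_names = names.str.split(",", n=1).str[0].str.strip()
print(last_names.tolist())  # ['Taussig', 'Taussig', 'Van der hoef']
```

One behavioral difference to be aware of: if a name contains no comma, `x.find(',')` returns -1 and the slice drops the last character, whereas the split-based version returns the whole string.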
Then we use pandas' shift and boolean indexing to check whether the passenger in the row above has the same last name (the Taussig case):
# Rows with no Cabin whose previous row has the same last name
mask = df['Cabin'].isnull() & (df['LastName'] == df['LastName'].shift())
df.loc[mask, 'Cabin'] = df['Cabin'].shift()
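To make the alignment explicit, here is a small sketch on a four-row toy frame (the `demo` data and the `prev_cabin` / `prev_name` names are mine, for illustration only). It shows that `shift()` lets row i see row i-1's value, and that assigning a Series through `.loc` with a boolean mask only writes to the masked rows.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real DataFrame.
demo = pd.DataFrame({
    "LastName": ["Taussig", "Taussig", "Taussig", "Clifford"],
    "Cabin": ["E67", "E67", np.nan, "A14"],
})

# shift() moves every value down one row, so row i sees row i-1's value.
prev_cabin = demo["Cabin"].shift()
prev_name = demo["LastName"].shift()

# True only where Cabin is missing and the row above has the same last name.
mask = demo["Cabin"].isnull() & (demo["LastName"] == prev_name)

# The right-hand side is aligned on the index, so only masked rows are written.
demo.loc[mask, "Cabin"] = prev_cabin
print(demo["Cabin"].tolist())  # ['E67', 'E67', 'E67', 'A14']
```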
Then do the same for the passenger in the row below by passing -1 to shift() (the Hippach case):
# Rows still missing a Cabin whose next row has the same last name
mask = df['Cabin'].isnull() & (df['LastName'] == df['LastName'].shift(-1))
df.loc[mask, 'Cabin'] = df['Cabin'].shift(-1)
print(df)
Fare Cabin Pclass Ticket \
0 86.5000 B77 1 110152
1 86.5000 B77 1 110152
2 86.5000 B79 1 110152
3 79.6500 E67 1 110413
4 79.6500 E67 1 110413
5 79.6500 E67 1 110413
6 52.0000 A14 1 110465
7 52.0000 C110 1 110465
8 26.0000 C106 1 110469
9 26.5500 D22 1 110489
10 26.5500 C52 1 110564
11 75.2500 D37 1 110813
12 75.2500 D37 1 110813
13 26.0000 NaN 1 111163
14 33.5000 B19 1 111240
15 38.5000 E63 1 111320
16 57.9792 B18 1 111361
17 57.9792 B18 1 111361
Name LastName
0 Cherry, Miss. Gladys Cherry
1 Rothes, the Countess. of (Lucy Noel Martha Dye... Rothes
2 Maioni, Miss. Roberta Maioni
3 Taussig, Mr. Emil Taussig
4 Taussig, Mrs. Emil (Tillie Mandelbaum) Taussig
5 Taussig, Miss. Ruth Taussig
6 Clifford, Mr. George Quincy Clifford
7 Porter, Mr. Walter Chamberlain Porter
8 Maguire, Mr. John Edward Maguire
9 Borebank, Mr. John James Borebank
10 Bjornstrom-Steffansson, Mr. Mauritz Hakan Bjornstrom-Steffansson
11 Warren, Mr. Frank Manley Warren
12 Warren, Mrs. Frank Manley (Anna Sophia Atkinson) Warren
13 Salomon, Mr. Abraham L Salomon
14 Van der hoef, Mr. Wyckoff Van der hoef
15 Gee, Mr. Arthur H Gee
16 Hippach, Miss. Jean Gertrude Hippach
17 Hippach, Mrs. Louis Albert (Ida Sophia Fischer) Hippach
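If it helps, the two passes could be wrapped into one helper. This is only a sketch under a few assumptions: the function name `fill_cabin_from_neighbors` is hypothetical, the last name is taken as everything before the first comma, and the five-row `sample` frame is a cut-down version of the data above. It works on a copy so the input is left untouched.

```python
import numpy as np
import pandas as pd


def fill_cabin_from_neighbors(df):
    """Fill a missing Cabin from the row directly above, then directly below,
    when that neighbor shares the same last name. Returns a new DataFrame."""
    out = df.copy()
    last = out["Name"].str.split(",", n=1).str[0].str.strip()
    for step in (1, -1):  # 1 = look at the row above, -1 = the row below
        mask = out["Cabin"].isnull() & (last == last.shift(step))
        out.loc[mask, "Cabin"] = out["Cabin"].shift(step)
    return out


# Cut-down sample: Taussig is filled from above, Hippach from below,
# and Salomon has no matching neighbor so it stays missing.
sample = pd.DataFrame({
    "Cabin": ["E67", np.nan, np.nan, np.nan, "B18"],
    "Name": [
        "Taussig, Mrs. Emil (Tillie Mandelbaum)",
        "Taussig, Miss. Ruth",
        "Salomon, Mr. Abraham L",
        "Hippach, Miss. Jean Gertrude",
        "Hippach, Mrs. Louis Albert (Ida Sophia Fischer)",
    ],
})

filled = fill_cabin_from_neighbors(sample)
print(filled["Cabin"].tolist())
```

Note that this only looks one row in each direction, so it relies on family members sitting next to each other (the data above appears to be sorted by Ticket, which is what makes that work here).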
Answer 1 (score: 2)