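I currently have a DataFrame like this:
Word    Word2         Word3
Hello   NaN           NaN
My      My Name       NaN
Yellow  Yellow Bee    Yellow Bee Hive
Golden  Golden Gates  NaN
Yellow  NaN           NaN
I want to remove all NaN cells from my DataFrame, so that in the end it looks like this, with 'Yellow Bee Hive' moved up to row 1 (similar to what happens when you delete cells from a column in Excel):
   Word    Word2         Word3
1  Hello   My Name       Yellow Bee Hive
2  My      Yellow Bee
3  Yellow  Golden Gates
4  Golden
5  Yellow
Unfortunately, neither of these works, because they delete the entire row!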
df = df[pd.notnull(df['Word','Word2','Word3'])]
or
df = df.dropna()
Does anyone have any suggestions? Should I reindex the table?
Answer 0 (score: 3)
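import functools
import numpy as np
import pandas as pd

def drop_and_roll(col, na_position='last', fillvalue=np.nan):
    # Start from a column-length array filled with the placeholder value.
    result = np.full(len(col), fillvalue, dtype=col.dtype)
    mask = col.notnull()
    N = mask.sum()
    if na_position == 'last':
        # Pack the non-null values at the top of the column.
        result[:N] = col.loc[mask]
    elif na_position == 'first':
        # Pack them at the bottom; len(col) - N also behaves when N == 0.
        result[len(col) - N:] = col.loc[mask]
    else:
        raise ValueError('na_position {!r} unrecognized'.format(na_position))
    return result

# Columns in the file are separated by two or more spaces.
df = pd.read_table('data', sep=r'\s{2,}', engine='python')
print(df.apply(functools.partial(drop_and_roll, fillvalue='')))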
which yields
     Word         Word2            Word3
0   Hello       My Name  Yellow Bee Hive
1      My    Yellow Bee
2  Yellow  Golden Gates
3  Golden
4  Yellow
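The `na_position='first'` branch is not demonstrated in this answer. As a rough sketch of what that option is meant to do, the snippet below packs the non-null values at the bottom of each column instead of the top. `roll_to_bottom` is a hypothetical, simplified stand-in for `drop_and_roll`, it always uses an object array, and it rebuilds the DataFrame column by column rather than through `df.apply`.

```python
import numpy as np
import pandas as pd

def roll_to_bottom(col, fillvalue=''):
    """Hypothetical helper: pack non-null values at the bottom of the column."""
    values = col.dropna().to_numpy()
    result = np.full(len(col), fillvalue, dtype=object)
    # The target slice is empty when the column has no non-null values.
    result[len(col) - len(values):] = values
    return result

df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan],
})

# Rebuild the frame one column at a time, keeping the original index.
rolled = pd.DataFrame({name: roll_to_bottom(df[name]) for name in df.columns},
                      index=df.index)
print(rolled)
```

Here 'Yellow Bee Hive' ends up in the last row of Word3, and the empty strings sit above the values rather than below them.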
Answer 1 (score: 1)
Since you want the values to shift up, you have to build a new DataFrame.
Starting with -
     Word         Word2
0   Hello           NaN
1      My       My Name
2  Yellow    Yellow Bee
3  Golden  Golden Gates
4  Yellow           NaN
Use the following function -
import numpy as np
import pandas as pd

def get_column_array(df, column):
    expected_length = len(df)
    # Non-null values of the column, in their original order.
    current_array = df[column].dropna().values
    if len(current_array) < expected_length:
        # Pad the end with empty strings up to the original length.
        current_array = np.append(current_array, [''] * (expected_length - len(current_array)))
    return current_array

pd.DataFrame({column: get_column_array(df, column) for column in df.columns})
which gives -
     Word         Word2
0   Hello       My Name
1      My    Yellow Bee
2  Yellow  Golden Gates
3  Golden
4  Yellow
You can also use the same function to modify the existing df in place -
for column in df.columns:
    df[column] = get_column_array(df, column)
Answer 2 (score: 1)
I think you can use this:
df = df.apply(lambda x: pd.Series(x.dropna().values))
For example:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan]
})
print(df)
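Initial DataFrame:
     Word         Word2            Word3
0   Hello           NaN              NaN
1      My       My Name              NaN
2  Yellow    Yellow Bee  Yellow Bee Hive
3  Golden  Golden Gates              NaN
4  Yellow           NaN              NaN
Then apply this lambda function, which drops the NaN values in each column and renumbers what is left from 0: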
df = df.apply(lambda x: pd.Series(x.dropna().values))
print(df)
which gives:
     Word         Word2            Word3
0   Hello       My Name  Yellow Bee Hive
1      My    Yellow Bee              NaN
2  Yellow  Golden Gates              NaN
3  Golden           NaN              NaN
4  Yellow           NaN              NaN
Then you can fill the remaining NaN values with empty strings:
df = df.fillna('')
print(df)
     Word         Word2            Word3
0   Hello       My Name  Yellow Bee Hive
1      My    Yellow Bee
2  Yellow  Golden Gates
3  Golden
4  Yellow
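One caveat with the `apply` plus `dropna` approach: the result is only as long as the longest column after its NaN values are dropped, so if every column contains at least one NaN, the output has fewer rows than the input. Below is a minimal sketch that wraps the two steps and pads back to the original row count, assuming that is what you want. `shift_up` is a hypothetical helper name, not part of pandas, and it resets the index to 0..n-1.

```python
import numpy as np
import pandas as pd

def shift_up(df, fill=''):
    """Hypothetical helper: drop NaN cells per column and shift values up."""
    # Each column becomes a shorter Series indexed from 0; pandas aligns
    # them on that index and pads the shorter ones with NaN.
    compacted = df.apply(lambda col: pd.Series(col.dropna().to_numpy()))
    # Restore the original number of rows (assumption: row count should be kept).
    compacted = compacted.reindex(range(len(df)))
    return compacted.fillna(fill)

df = pd.DataFrame({
    'Word': ['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2': [np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3': [np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan],
})
out = shift_up(df)
print(out)

# Every column here has one NaN, so without the reindex step
# the result would have a single row instead of two.
small = pd.DataFrame({'a': [np.nan, 'x'], 'b': ['y', np.nan]})
small_out = shift_up(small)
print(small_out)
```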