My current DataFrame looks like this:
0 1 2 3 4 5
0 NAN Name NAN Amount NAN Percentage
1 NAN A NAN 28223 NAN 8.70%
2 NAN B NAN NAN NAN NAN
3 NAN C NAN NAN NAN NAN
4 NAN D NAN 21871 NAN 6.80%
5 NAN E NAN NAN NAN NAN
6 NAN F NAN NAN NAN NAN
7 NAN G NAN 21380 NAN 6.64%
8 NAN H NAN NAN NAN NAN
9 NAN I NAN NAN NAN NAN
10 NAN J NAN 20784 NAN 6.46%
11 NAN K NAN NAN NAN NAN
12 NAN L NAN NAN NAN NAN
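To try the answers below, the frame above can be rebuilt with a short sketch like this one. It assumes the NAN cells are real missing values (np.nan), not the literal string 'NAN'. The dicts `amounts` and `percents` are hypothetical names used only for this reconstruction.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the frame above; assumes the NAN cells are
# real missing values (np.nan), not the literal string 'NAN'.
names = list("ABCDEFGHIJKL")
amounts = {"A": 28223, "D": 21871, "G": 21380, "J": 20784}
percents = {"A": "8.70%", "D": "6.80%", "G": "6.64%", "J": "6.46%"}

# Row 0 holds the header text; every other row is one name.
rows = [[np.nan, "Name", np.nan, "Amount", np.nan, "Percentage"]]
rows += [[np.nan, n, np.nan, amounts.get(n, np.nan), np.nan, percents.get(n, np.nan)]
         for n in names]

df = pd.DataFrame(rows)  # column labels default to 0..5
print(df.shape)  # (13, 6)
```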
I want to clean up this DataFrame by dropping the columns that contain only NAN values and replacing the remaining NAN values with Not Listed.
Expected output:
Name Amount Percentage
A 28223 8.70%
B Not Listed Not Listed
C Not Listed Not Listed
D 21871 6.80%
E Not Listed Not Listed
F Not Listed Not Listed
G 21380 6.64%
H Not Listed Not Listed
I Not Listed Not Listed
J 20784 6.46%
K Not Listed Not Listed
L Not Listed Not Listed
Can anyone help me?
Answer 0 (score: 2)
Use dropna + slicing:
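import pandas as pd

t = df.dropna(axis=1, how='all').values  # drop the all-NaN columns, convert to a NumPy array
pd.DataFrame(t[1:], columns=t[0]).fillna('Not listed')  # row 0 becomes the header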
Name Amount Percentage
0 A 28223 8.70%
1 B Not listed Not listed
2 C Not listed Not listed
3 D 21871 6.80%
4 E Not listed Not listed
5 F Not listed Not listed
6 G 21380 6.64%
7 H Not listed Not listed
8 I Not listed Not listed
9 J 20784 6.46%
10 K Not listed Not listed
11 L Not listed Not listed
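A self-contained sketch of the same dropna + slicing approach, rebuilding the sample frame first (again assuming the blanks are real np.nan). It fills with 'Not Listed' to match the capitalization in the expected output; `out` is a hypothetical variable name.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (assumes real np.nan for the NAN cells).
names = list("ABCDEFGHIJKL")
amounts = {"A": 28223, "D": 21871, "G": 21380, "J": 20784}
percents = {"A": "8.70%", "D": "6.80%", "G": "6.64%", "J": "6.46%"}
rows = [[np.nan, "Name", np.nan, "Amount", np.nan, "Percentage"]]
rows += [[np.nan, n, np.nan, amounts.get(n, np.nan), np.nan, percents.get(n, np.nan)]
         for n in names]
df = pd.DataFrame(rows)

# Drop the columns that are entirely NaN, then split header row from data rows.
t = df.dropna(axis=1, how="all").values
out = pd.DataFrame(t[1:], columns=t[0]).fillna("Not Listed")
print(out)
```

Because the Amount column ends up holding both numbers and 'Not Listed' strings, it has object dtype rather than a numeric one.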
Answer 1 (score: 1)
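You can use df.drop([column_names], axis=1) to drop the columns you don't need and replace NAN with Not Listed, then set the column headers and drop the extra row that held the header names.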
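import numpy as np

df = df.drop([0, 2, 4], axis=1).replace(np.nan, 'Not Listed')  # drop the all-NaN columns, fill the rest
df.columns = df.iloc[0]   # use the first row as the header
df.drop(0, inplace=True)  # remove that header row from the data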
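After these steps the index starts at 1 and the column index carries the name 0 (the label of the row it came from). A minimal sketch of the same steps plus an optional tidy-up, assuming real np.nan values. It uses fillna, which is equivalent to replace(np.nan, ...) here; the reset_index and columns.name lines are additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (assumes real np.nan for the NAN cells).
names = list("ABCDEFGHIJKL")
amounts = {"A": 28223, "D": 21871, "G": 21380, "J": 20784}
percents = {"A": "8.70%", "D": "6.80%", "G": "6.64%", "J": "6.46%"}
rows = [[np.nan, "Name", np.nan, "Amount", np.nan, "Percentage"]]
rows += [[np.nan, n, np.nan, amounts.get(n, np.nan), np.nan, percents.get(n, np.nan)]
         for n in names]
df = pd.DataFrame(rows)

# Answer 1's steps: drop the known all-NaN columns by label, then fill NaN.
df = df.drop([0, 2, 4], axis=1).fillna("Not Listed")
df.columns = df.iloc[0]  # promote row 0 to the header
df = df.drop(0)          # remove the header row from the data

# Optional tidy-up (not in the original answer): renumber rows from 0
# and clear the leftover name on the column index.
df = df.reset_index(drop=True)
df.columns.name = None
print(df)
```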
Answer 2 (score: 1)
Given your specific data structure:
import numpy as np
df.columns = df.iloc[0, :] # Rename the columns based on the first row of data.
df.columns.name = None # Set the columns name to None.
df = df.iloc[1:, :].reset_index(drop=True) # Drop the header row from the data and renumber the rows.
>>> df.replace('NAN', np.nan).dropna(how='all', axis=1).replace(np.nan, 'Not Listed')
Name Amount Percentage
0 A 28223 8.70%
1 B Not Listed Not Listed
2 C Not Listed Not Listed
3 D 21871 6.80%
4 E Not Listed Not Listed
5 F Not Listed Not Listed
6 G 21380 6.64%
7 H Not Listed Not Listed
8 I Not Listed Not Listed
9 J 20784 6.46%
10 K Not Listed Not Listed
11 L Not Listed Not Listed
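The replace('NAN', np.nan) step matters when the blank cells hold the literal string 'NAN' (for example, after reading a text export); dropna alone would not touch them because they are ordinary strings. The sketch below assumes that string case, so the amounts are stored as strings too; all names are hypothetical.

```python
import numpy as np
import pandas as pd

# Assumption: the blanks are the literal string "NAN", not real missing values.
names = list("ABCDEFGHIJKL")
amounts = {"A": "28223", "D": "21871", "G": "21380", "J": "20784"}
percents = {"A": "8.70%", "D": "6.80%", "G": "6.64%", "J": "6.46%"}
rows = [["NAN", "Name", "NAN", "Amount", "NAN", "Percentage"]]
rows += [["NAN", n, "NAN", amounts.get(n, "NAN"), "NAN", percents.get(n, "NAN")]
         for n in names]
df = pd.DataFrame(rows)

# dropna alone removes nothing here: "NAN" is just a string.
assert df.dropna(how="all", axis=1).shape == (13, 6)

# Promote row 0 to the header, as in Answer 2.
df.columns = df.iloc[0, :]
df.columns.name = None
df = df.iloc[1:, :].reset_index(drop=True)

# Turn the strings into real NaN, drop all-NaN columns, then fill the gaps.
out = df.replace("NAN", np.nan).dropna(how="all", axis=1).fillna("Not Listed")
print(out)
```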
If you like, you can set Name as the index by changing the last line of the code:
>>> df.replace('NAN', np.nan).dropna(how='all', axis=1).replace(np.nan, 'Not Listed').set_index('Name')
Amount Percentage
Name
A 28223 8.70%
B Not Listed Not Listed
C Not Listed Not Listed
D 21871 6.80%
E Not Listed Not Listed
F Not Listed Not Listed
G 21380 6.64%
H Not Listed Not Listed
I Not Listed Not Listed
J 20784 6.46%
K Not Listed Not Listed
L Not Listed Not Listed
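If the blanks are already real np.nan, the replace('NAN', np.nan) call in this chain is simply a no-op, so the same chain should still work. A sketch under that assumption, ending with set_index('Name') so rows can be looked up by name; `out` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (assumes real np.nan for the NAN cells).
names = list("ABCDEFGHIJKL")
amounts = {"A": 28223, "D": 21871, "G": 21380, "J": 20784}
percents = {"A": "8.70%", "D": "6.80%", "G": "6.64%", "J": "6.46%"}
rows = [[np.nan, "Name", np.nan, "Amount", np.nan, "Percentage"]]
rows += [[np.nan, n, np.nan, amounts.get(n, np.nan), np.nan, percents.get(n, np.nan)]
         for n in names]
df = pd.DataFrame(rows)

# Same header promotion as Answer 2.
df.columns = df.iloc[0, :]
df.columns.name = None
df = df.iloc[1:, :].reset_index(drop=True)

out = (
    df.replace("NAN", np.nan)    # no-op here: there are no "NAN" strings
      .dropna(how="all", axis=1) # drop the columns that are entirely NaN
      .fillna("Not Listed")
      .set_index("Name")         # look rows up by name
)
print(out.loc["D", "Amount"])      # 21871
print(out.loc["B", "Percentage"])  # Not Listed
```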