Suppose I have the following DataFrame:
>>> df
     val1  val2  val3
key
1       1     1     1
2       2     2     2
3       3     3     3
Now I want to select the columns val1 and val2, plus (and here's the kicker) val4, which does not exist:
>>> df[["val1", "val2", "val4"]]
KeyError: "['val4'] not in index"
What I want instead:
>>> df.something(something)
     val1  val2  val4
key
1       1     1   NaN
2       2     2   NaN
3       3     3   NaN
Answer 0 (score: 3)
If I understand correctly, you want reindex:
df.reindex(columns=["val1", "val2", "val4"])
Out[431]:
     val1  val2  val4
key
1       1     1   NaN
2       2     2   NaN
3       3     3   NaN
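Similarly, .loc can do this, but it emits a warning: "Passing list-likes to .loc or [] with any missing labels will raise KeyError in the future, you can use .reindex() as an alternative." In pandas 1.0 and later, this lookup raises KeyError instead of warning, so prefer reindex.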
df.loc[:,["val1", "val2", "val4"]]
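To make the reindex approach reproducible, here is a minimal sketch that rebuilds the DataFrame from the question and selects the three columns. The variable names (wanted, with_nan, with_zero) are illustrative, and the fill_value variant is an optional extra that the question did not ask for.

```python
import pandas as pd

# Rebuild the example frame from the question, with "key" as the index name
df = pd.DataFrame(
    {"val1": [1, 2, 3], "val2": [1, 2, 3], "val3": [1, 2, 3]},
    index=pd.Index([1, 2, 3], name="key"),
)

wanted = ["val1", "val2", "val4"]

# Columns missing from df are created and filled with NaN by default
with_nan = df.reindex(columns=wanted)

# fill_value replaces the default NaN for columns that did not exist
with_zero = df.reindex(columns=wanted, fill_value=0)

print(with_nan)
print(with_zero)
```

Unlike .loc with a list containing missing labels, reindex is designed to accept labels that are absent, so it should not warn or raise here.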
Answer 1 (score: 0)
Something like this should get you started:
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3]], columns=['val1', 'val2', 'val3'])

def check_columns(df, values):
    # Start from df's index so a missing first column still gets one NaN per row
    temp = pd.DataFrame(index=df.index)
    for i in values:
        try:
            temp[i] = df[i]
        except KeyError:
            # Column is not in df: fill it with NaN
            temp[i] = np.nan
    return temp

print(check_columns(df, ['val1', 'val2', 'val3']))
print(check_columns(df, ['val1', 'val2', 'val4']))
This gives:
   val1  val2  val3
0     1     1     1
1     2     2     2
2     3     3     3
   val1  val2  val4
0     1     1   NaN
1     2     2   NaN
2     3     3   NaN
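The same idea can be written without an explicit loop or try/except. The sketch below is one possible variant, not the answerer's code: select_with_missing is a hypothetical helper name, and it assumes the column labels are strings, since they are passed to assign as keyword arguments.

```python
import numpy as np
import pandas as pd

def select_with_missing(df, columns):
    # Hypothetical helper: find the requested columns that df lacks
    missing = [c for c in columns if c not in df.columns]
    # Add each missing column as NaN on a copy, then select in the requested order
    return df.assign(**{c: np.nan for c in missing})[columns]

df = pd.DataFrame([[1, 1, 1], [2, 2, 2], [3, 3, 3]], columns=['val1', 'val2', 'val3'])

result = select_with_missing(df, ['val1', 'val2', 'val4'])
print(result)
```

Because assign returns a new DataFrame, the original df is left unchanged.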