I have training data (df), as shown below:
from io import StringIO
import pandas as pd
myst="""india, 905034 , 19:44
USA, 905094 , 19:33
Russia, 905154 , 21:56
"""
u_cols=['country', 'index', 'current_tm']
# skipinitialspace=True strips the space that follows each comma
df = pd.read_csv(StringIO(myst), sep=',', names=u_cols, skipinitialspace=True)
I then received test data (df1), but its columns do not match those of the original training set:
myst1="""india, 123455 , 19:44
USA, 233455 , 19:33
Russia, 5666432 , 21:56
"""
u_cols1=['country', 'index', 'dummy_col']
df1 = pd.read_csv(StringIO(myst1), sep=',', names=u_cols1, skipinitialspace=True)
Is it possible to reindex the new data so that it matches the original structure, giving a final DataFrame that looks like this (df2)?
myst2="""india, 123455 , NULL
USA, 233455 , NULL
Russia, 5666432 , NULL
"""
u_cols2=['country', 'index', 'current_tm']
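# skipinitialspace=True makes the last field read as "NULL" rather than " NULL"
df2 = pd.read_csv(StringIO(myst2), sep=',', names=u_cols2, skipinitialspace=True)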
Answer (score: 1)
Use reindex with the columns of the training data:
df3 = df1.reindex(columns=df.columns)
print(df3)
country index current_tm
0 india 123455 NaN
1 USA 233455 NaN
2 Russia 5666432 NaN
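The following is a self-contained sketch for checking this behaviour. reindex(columns=...) keeps the listed columns in the listed order. Any column in df.columns that df1 lacks (current_tm) is added and filled with NaN. Any column in df1 that is not in the list (dummy_col) is dropped. The optional fill_value argument substitutes a placeholder of your choice for NaN. The variable names train_csv, test_csv and df4, along with the skipinitialspace=True argument, are illustrative additions and not part of the original question.

```python
from io import StringIO

import pandas as pd

# Training data: defines the target column layout.
train_csv = """india, 905034 , 19:44
USA, 905094 , 19:33
Russia, 905154 , 21:56
"""
# Test data: has 'dummy_col' where the training data has 'current_tm'.
test_csv = """india, 123455 , 19:44
USA, 233455 , 19:33
Russia, 5666432 , 21:56
"""

df = pd.read_csv(StringIO(train_csv), names=['country', 'index', 'current_tm'],
                 skipinitialspace=True)
df1 = pd.read_csv(StringIO(test_csv), names=['country', 'index', 'dummy_col'],
                  skipinitialspace=True)

# Columns of df missing from df1 are added as NaN;
# columns of df1 not in df ('dummy_col') are dropped.
df3 = df1.reindex(columns=df.columns)

# fill_value uses a chosen placeholder instead of NaN for the added column.
df4 = df1.reindex(columns=df.columns, fill_value='NULL')

print(df3)
print(df4)
```

If the test data should keep its extra columns, reindex is not the right tool, because it discards every column that is not in the target list.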