A B C D E
0 165349.20 136897.80 471784.10 New York 192261.83
1 162597.70 151377.59 443898.53 California 191792.06
2 153441.51 101145.55 407934.54 Florida 191050.39
3 144372.41 118671.85 383199.62 New York 182901.99
4 142107.34 91391.77 366168.42 Florida 166187.94
After using df = pd.get_dummies(df, columns=['D']):
A B C E D_California D_Florida D_New York
0 165349.20 136897.80 471784.10 192261.83 0 0 1
1 162597.70 151377.59 443898.53 191792.06 1 0 0
2 153441.51 101145.55 407934.54 191050.39 0 1 0
3 144372.41 118671.85 383199.62 182901.99 0 0 1
4 142107.34 91391.77 366168.42 166187.94 0 1 0
Is there a way to get output that looks like the following without using df[['A','B','C','D_California','D_New York','D_Florida','E']]?
A B C D_California D_Florida D_New York E
0 165349.20 136897.80 471784.10 0 0 1 192261.83
1 162597.70 151377.59 443898.53 1 0 0 191792.06
2 153441.51 101145.55 407934.54 0 1 0 191050.39
3 144372.41 118671.85 383199.62 0 0 1 182901.99
4 142107.34 91391.77 366168.42 0 1 0 166187.94
Answer 0: (score: 2)
Use sort_index:
df.sort_index(axis=1)
Out[813]:
A B C D_California D_Florida D_NewYork \
0 165349.20 136897.80 471784.10 0 0 1
1 162597.70 151377.59 443898.53 1 0 0
2 153441.51 101145.55 407934.54 0 1 0
3 144372.41 118671.85 383199.62 0 0 1
4 142107.34 91391.77 366168.42 0 1 0
E
0 192261.83
1 191792.06
2 191050.39
3 182901.99
4 166187.94
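A minimal sketch of why sort_index(axis=1) works here, using a small made-up frame with the same column layout as the question (the numeric values are placeholders, not the original data). It relies on the labels already being in alphabetical order, so the 'D_*' columns land between 'C' and 'E'; with other column names the result would differ.

```python
import pandas as pd

# Small frame mirroring the question's layout (placeholder values).
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [4.0, 5.0, 6.0],
    "C": [7.0, 8.0, 9.0],
    "D": ["New York", "California", "Florida"],
    "E": [10.0, 11.0, 12.0],
})

# get_dummies drops 'D' and appends the indicator columns at the end.
# dtype=int keeps 0/1 output on newer pandas versions, which default to bool.
encoded = pd.get_dummies(df, columns=["D"], dtype=int)

# Sorting the column labels moves the 'D_*' columns back between 'C' and 'E',
# but only because the labels happen to sort alphabetically in that order.
sorted_cols = encoded.sort_index(axis=1)
print(list(sorted_cols.columns))
```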
Edit: using list sort, a dict, and a lambda
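# Build a dict A that maps each original column name to its position in df
A = dict(zip(df.columns, range(df.shape[1])))
# One-hot encode the 'State' column; the dummy columns are appended at the end
df1 = pd.get_dummies(df, columns=['State'])
# Column names of the encoded frame, currently out of order
yourorder = list(df1)
# Sort them by the position of the original column each name is derived from
yourorder.sort(key=lambda val: A[val.split(sep='_')[0]])
df1[yourorder]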
Out[842]:
R&D Spend Administration Marketing Spend State_California \
0 165349.20 136897.80 471784.10 0
1 162597.70 151377.59 443898.53 1
2 153441.51 101145.55 407934.54 0
3 144372.41 118671.85 383199.62 0
4 142107.34 91391.77 366168.42 0
State_Florida State_NewYork Profit(E)
0 0 1 192261.83
1 0 0 191792.06
2 1 0 191050.39
3 0 1 182901.99
4 1 0 166187.94
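The same idea as a self-contained sketch. The column names follow the output above, except that "Profit" is an assumed name for the last column, and only the first three rows of the question's data are used. Note that splitting on '_' assumes none of the original column names contains an underscore.

```python
import pandas as pd

df = pd.DataFrame({
    "R&D Spend": [165349.20, 162597.70, 153441.51],
    "Administration": [136897.80, 151377.59, 101145.55],
    "Marketing Spend": [471784.10, 443898.53, 407934.54],
    "State": ["New York", "California", "Florida"],
    "Profit": [192261.83, 191792.06, 191050.39],  # assumed column name
})

# Remember where each original column sat.
position = {name: i for i, name in enumerate(df.columns)}

# dtype=int keeps 0/1 output on newer pandas versions, which default to bool.
df1 = pd.get_dummies(df, columns=["State"], dtype=int)

# 'State_California' -> 'State', so every dummy column inherits the position
# of the column it came from. sorted() is stable, so the dummy columns keep
# their relative order.
ordered = sorted(df1.columns, key=lambda name: position[name.split("_")[0]])
result = df1[ordered]
print(list(result.columns))
```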
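Answer 1: (score: 2)
A general solution for columns that may not be in sorted order:
Find the position of the column, dummify it, and concat the pieces accordingly.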
j = df.columns.get_loc('D')
left = df.iloc[:, :j]
dumb = pd.get_dummies(df[['D']])
rite = df.iloc[:, j+1:]
pd.concat([left, dumb, rite], axis=1)
A B C D_California D_Florida D_New York E
0 165349.20 136897.80 471784.10 0 0 1 192261.83
1 162597.70 151377.59 443898.53 1 0 0 191792.06
2 153441.51 101145.55 407934.54 0 1 0 191050.39
3 144372.41 118671.85 383199.62 0 0 1 182901.99
4 142107.34 91391.77 366168.42 0 1 0 166187.94
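The steps above could be wrapped in a small helper. This is only a sketch: the function name dummify_in_place is made up for illustration, and it assumes the column label is unique so that get_loc returns a single integer. The example frame deliberately uses columns that are not in alphabetical order.

```python
import pandas as pd

def dummify_in_place(df, column):
    """Replace `column` with its dummy columns at the same position (hypothetical helper)."""
    j = df.columns.get_loc(column)      # integer position, assuming a unique label
    left = df.iloc[:, :j]               # everything before the column
    # dtype=int keeps 0/1 output on newer pandas versions, which default to bool.
    dummies = pd.get_dummies(df[[column]], dtype=int)
    right = df.iloc[:, j + 1:]          # everything after the column
    return pd.concat([left, dummies, right], axis=1)

# Placeholder data with columns that are not alphabetically ordered.
df = pd.DataFrame({
    "E": [10.0, 11.0, 12.0],
    "D": ["New York", "California", "Florida"],
    "A": [1.0, 2.0, 3.0],
})
result = dummify_in_place(df, "D")
print(list(result.columns))
```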
Answer 2: (score: 0)
Not sure if there is a better way, but this will work:
col = ['R&D Spend', 'Administration', 'Marketing Spend', 'State_California', 'State_New York', 'State_Florida', 'Profit(E)']
df=df.loc[:, col]