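I created a new column for each state (there are only 3 distinct states), with null values in every row. I use a for loop to iterate over the original "State" column and, when a row matches the state I want, put the value 1 in the corresponding row of that state's column, for example the "New York" column: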
    ifstream pathfile(p.string());
    cout << "Path file opened successfully.\n\n";
    string line;
    stringstream ss;
    int x, y;
    char comma, direction;
    // Iterate over all lines in the file
    while (getline(pathfile, line)) {
        // Remove all spaces from the line
        line.erase(remove(line.begin(), line.end(), ' '), line.end());
        // Skip comments and blank lines
        if (line.empty() || line[0] == '#') continue;
        // Parse the remaining lines
        ss.str(string()); // discard the previous contents
        ss.clear();       // reset eof/fail flags left by the previous parse
        cout << "LINE: " << line << endl;
        ss << line;
        cout << "SS: " << ss.str() << endl;
        if (ss >> x >> comma >> y >> comma >> direction)
            cout << "X: " << x << " Y: " << y << " D: " << direction;
        else {
            cout << "Ill-formatted line: ";
        }
        printf(" | %s\n\n", line.c_str());
    }
    pathfile.close();
I expected to see 1 in the "New York" column wherever the "State" column has the value "New York", but every result comes back as 0.
Answer 0 (score: 0)
OK, this may not be the best solution in terms of computational cost, but you can simply use the iterrows function:
    import pandas as pd

    df1 = pd.DataFrame(columns=["OrginalState", "State1", "State2", "State3"])
    df1.loc[0] = ["State1", None, None, None]
    df1.loc[1] = ["State2", None, None, None]
    df1.loc[2] = ["State3", None, None, None]

    for index, row in df1.iterrows():
        # Use a single .loc[row, column] call; chained indexing such as
        # df1.loc[index]["State1"] = 1 may write to a temporary copy.
        if row["OrginalState"] == "State1":
            df1.loc[index, "State1"] = 1
        if row["OrginalState"] == "State2":
            df1.loc[index, "State2"] = 1
        if row["OrginalState"] == "State3":
            df1.loc[index, "State3"] = 1

    print(df1)
Output:
      OrginalState State1 State2 State3
    0       State1      1   None   None
    1       State2   None      1   None
    2       State3   None   None      1
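Since the answer concedes that iterrows may not be the cheapest option, here is a minimal sketch of a vectorized alternative. It assumes the same illustrative column names as above and fills non-matching cells with 0 rather than None; it is one possible approach, not necessarily what the answerer had in mind.

```python
import pandas as pd

# Illustrative data mirroring the frame built above.
df1 = pd.DataFrame({"OrginalState": ["State1", "State2", "State3"]})

# Compare the whole column against each state at once and store 1/0,
# instead of visiting the rows one by one.
for state in ["State1", "State2", "State3"]:
    df1[state] = (df1["OrginalState"] == state).astype(int)

print(df1)
```

Each comparison yields a boolean Series for the entire column, so no explicit row loop is needed.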
Answer 1 (score: 0)
It looks like you want one-hot encoding. There are several ways to do this.
Using pd.get_dummies:
    one_hot_df = pd.get_dummies(orig_df['States'])
To combine the result with the original DataFrame:
    orig_df.join(one_hot_df)
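To make the two snippets above runnable end to end, here is a small sketch. The contents of orig_df are made up for illustration ("New York" comes from the question; the other state names are assumptions), and dtype=int is passed only so the columns show 1/0 instead of booleans.

```python
import pandas as pd

# Hypothetical data: only "New York" comes from the question.
orig_df = pd.DataFrame({"States": ["New York", "California", "Texas", "New York"]})

# One indicator column per distinct value in "States".
one_hot_df = pd.get_dummies(orig_df["States"], dtype=int)

# join returns a new DataFrame, so keep the result.
result = orig_df.join(one_hot_df)
print(result)
```

Note that join does not modify orig_df in place, which is why the result is assigned to a new name.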
Using OneHotEncoder from sklearn:
sklearn.preprocessing.OneHotEncoder also comes in handy if you may have new data to encode later (for example, when you want to encode a test dataset).