Question

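I created a new column for each state (there are only 3 distinct states), with null values in every row. I use a for loop to iterate over the original "State" column and, whenever the row matches the state I want, I put the value 1 in the corresponding row of the "New York" column, for example: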

ifstream pathfile(p.string());
cout << "Path file opened successfully.\n\n";

string line;
stringstream ss;
int x, y;
char comma,direction;

//Iterate all lines in the file
while(getline(pathfile,line)){
  //remove all spaces from line
  line.erase(remove(line.begin(), line.end(), ' '), line.end());
  //skip comments and blank lines
  if(line.c_str()[0] == '#' || line.empty()) continue;
  //parse remaining lines
  ss.str(string()); //clear stringstream contents
  ss.clear();       //also reset fail/eof flags left over from the previous line
  cout <<"LINE: "<<line<<endl;
  ss << line;
  cout <<"SS: "<<ss.str()<<endl;

  if(ss >> x >> comma >> y >> comma >> direction)
    cout << "X: "<<x<<"  Y: "<<y<<"  D: "<<direction;
  else{
    cout << "Ill-formatted line: ";
  }
  printf(" |  %s\n\n", line.c_str());
}
pathfile.close();

I expect to see 1 in the "New York" column wherever the value in the "State" column is "New York", but every result comes back as 0.

Answer 1

OK, this may not be the best solution in terms of computational cost, but you can simply use the iterrows function:

import pandas as pd


df1 = pd.DataFrame(columns=["OrginalState","State1","State2", "State3"])

df1.loc[0] = ["State1",None,None,None]
df1.loc[1] = ["State2",None,None,None]
df1.loc[2] = ["State3",None,None,None]

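# iterrows yields (index, row) pairs; write back with a single .loc[row, column]
# call, because chained indexing (df1.loc[index]["State1"] = 1) may assign to a copy
for index, row in df1.iterrows():
    if row["OrginalState"] == "State1":
        df1.loc[index, "State1"] = 1
    if row["OrginalState"] == "State2":
        df1.loc[index, "State2"] = 1
    if row["OrginalState"] == "State3":
        df1.loc[index, "State3"] = 1

print(df1)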

Output:

  OrginalState State1 State2 State3
0       State1      1   None   None
1       State2   None      1   None
2       State3   None   None      1
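
If the row-by-row loop turns out to be too slow, the same flags can probably be filled in with one vectorized comparison per state. This is only a sketch built on the sample frame above; note that it writes 0 instead of None in the non-matching rows, which is an assumption about what you want there.

```python
import pandas as pd

# Same sample data as above (the column name "OrginalState" is kept as spelled there).
df1 = pd.DataFrame({"OrginalState": ["State1", "State2", "State3"]})

for state in df1["OrginalState"].unique():
    # The comparison gives True/False for every row at once;
    # astype(int) converts that to 1/0.
    df1[state] = (df1["OrginalState"] == state).astype(int)

print(df1)
```

This avoids iterrows entirely, so the cost grows with the number of distinct states rather than the number of rows.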

Answer 2

It looks like you want one-hot encoding. There are several ways to do this:

Using pd.get_dummies:

one_hot_df = pd.get_dummies(orig_df['States'])

To combine it with the original dataframe:

orig_df.join(one_hot_df)
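
Put together, the two steps might look like the sketch below. The sample rows are made up for illustration, and dtype=int is added on the assumption that you want 0/1 values (recent pandas versions return booleans by default).

```python
import pandas as pd

# Hypothetical sample data; the column name 'States' follows the snippet above.
orig_df = pd.DataFrame({"States": ["New York", "Texas", "New York", "Ohio"]})

# One indicator column per distinct state, holding 0 or 1.
one_hot_df = pd.get_dummies(orig_df["States"], dtype=int)

# join returns a new dataframe; orig_df itself is left unchanged,
# so the result has to be assigned to something.
result = orig_df.join(one_hot_df)

print(result)
```

Keep in mind that join does not modify orig_df in place, so calling it without assigning the result has no visible effect.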

Using OneHotEncoder from sklearn:

sklearn.preprocessing.OneHotEncoder also comes in handy if you may have new data to encode later (for example, when you want to encode a test dataset).
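
A rough sketch of that workflow, assuming scikit-learn is installed: the encoder is fitted once on the training data and then reused on new data. The train_df and test_df frames and the 'Florida' value are invented for the example, and handle_unknown="ignore" is one possible choice for categories that were not seen during fitting.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: 'Florida' appears only in the test set.
train_df = pd.DataFrame({"States": ["New York", "Texas", "New York", "Ohio"]})
test_df = pd.DataFrame({"States": ["Texas", "Florida"]})

# handle_unknown="ignore" encodes unseen categories as an all-zero row
# instead of raising an error at transform time.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train_df[["States"]])  # expects 2D input, hence the double brackets

# Categories learned during fit, in the order of the output columns.
columns = list(encoder.categories_[0])

# transform returns a sparse matrix by default; toarray() makes it dense.
test_encoded = pd.DataFrame(
    encoder.transform(test_df[["States"]]).toarray(),
    columns=columns,
    index=test_df.index,
)

print(test_encoded)
```

Unlike pd.get_dummies, the fitted encoder always produces the same set of columns, which is what makes it safe to apply to data that arrives later.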

How to append numbers to an empty dataframe column from a conditional for loop
