How to append numbers to an empty dataframe column from a conditional for loop

Time: 2019-02-15 23:24:39

Tags: python pandas dataframe dummy-variable

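I created a new column for each state (there are only 3 distinct states), with null values in every row. I use a for loop to iterate over the original "State" column, and whenever the condition for the state I want is met, I put the value "1" in the corresponding row of the "New York" column, for example: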

ifstream pathfile(p.string());
cout << "Path file opened successfully.\n\n";

string line;
stringstream ss;
int x, y;
char comma,direction;

//Iterate all lines in the file
while(getline(pathfile,line)){
  //remove all spaces from line
  line.erase(remove(line.begin(), line.end(), ' '), line.end());
  //skip comments and blank lines
  if(line.c_str()[0] == '#' || line.empty()) continue;
  //parse remaining lines
  ss.str(string()); //clear stringstream
  cout <<"LINE: "<<line<<endl;
  ss << line;
  cout <<"SS: "<<ss.str()<<endl;

  if(ss >> x >> comma >> y >> comma >> direction)
    cout << "X: "<<x<<"  Y: "<<y<<"  D: "<<direction;
  else{
    cout << "Ill-formatted line: ";
  }
  printf(" |  %s\n\n", line.c_str());
}
pathfile.close();

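I expect to see a 1 in the "New York" column at every position where the "State" column has the value "New York", but all the results returned are 0s.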

2 Answers:

Answer 0: (score: 0)

OK, this may not be the best solution in terms of computational cost, but you can simply use the iterrows function:

import pandas as pd


df1 = pd.DataFrame(columns=["OrginalState","State1","State2", "State3"])

df1.loc[0] = ["State1",None,None,None]
df1.loc[1] = ["State2",None,None,None]
df1.loc[2] = ["State3",None,None,None]

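for index, row in df1.iterrows():
    # Assign with a single .loc[row, column] call. Chained indexing such as
    # df1.loc[index]["State1"] = 1 may write to a temporary copy instead.
    if row["OrginalState"] == "State1":
        df1.loc[index, "State1"] = 1
    if row["OrginalState"] == "State2":
        df1.loc[index, "State2"] = 1
    if row["OrginalState"] == "State3":
        df1.loc[index, "State3"] = 1

print(df1)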

Output:

  OrginalState State1 State2 State3
0       State1      1   None   None
1       State2   None      1   None
2       State3   None   None      1
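
If the row-by-row loop turns out to be too slow, the same indicator columns can probably be built by comparing the whole column at once. The following is only a minimal sketch: the sample data and the column name "State" are assumptions made for illustration, not taken from the question's actual dataframe.

```python
import pandas as pd

# Hypothetical sample data; the real dataframe and its state names may differ.
df = pd.DataFrame({"State": ["New York", "Texas", "Ohio", "New York"]})

# Compare the entire column to each state in one step, then cast the
# resulting booleans to 0/1 integers.
for state in ["New York", "Texas", "Ohio"]:
    df[state] = (df["State"] == state).astype(int)

print(df)
```

Unlike the loop above, this fills non-matching rows with 0 rather than None.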

Answer 1: (score: 0)

It looks like you want one-hot encoding. There are several ways to do this:

  1. Use pd.get_dummies

    one_hot_df = pd.get_dummies(orig_df['States'])
    

    To combine it with the original dataframe:

    orig_df.join(one_hot_df)
    
  2. Use OneHotEncoder from sklearn

    sklearn.preprocessing.OneHotEncoder also comes in handy if you may have new data to encode in the future (for example, when you want to encode a test dataset).
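
To make the two options concrete, here is a minimal sketch of both side by side. The sample data, the unseen value "Florida", and the choice of handle_unknown="ignore" are assumptions for illustration; only pd.get_dummies, join, and OneHotEncoder come from the answer itself.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data standing in for orig_df.
orig_df = pd.DataFrame({"States": ["New York", "Texas", "Ohio", "New York"]})

# Option 1: pd.get_dummies builds one 0/1 column per distinct value.
one_hot_df = pd.get_dummies(orig_df["States"], dtype=int)
combined = orig_df.join(one_hot_df)
print(combined)

# Option 2: fit an encoder once, then reuse it on data that arrives later.
# handle_unknown="ignore" (an assumed setting) encodes unseen categories
# as all zeros instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(orig_df[["States"]])

new_df = pd.DataFrame({"States": ["Texas", "Florida"]})  # "Florida" was not seen during fit
encoded = encoder.transform(new_df[["States"]]).toarray()
encoded_df = pd.DataFrame(encoded, columns=encoder.categories_[0], index=new_df.index)
print(encoded_df)
```

The practical difference is that the fitted encoder always produces the same set of columns, whereas get_dummies derives its columns from whatever values appear in the data it is given.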