Match column names stored in another DataFrame and replace them with their IDs

Time: 2020-08-24 05:50:46

Tags: python pandas data-manipulation

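I have a master DataFrame called Master that contains all the question IDs. I have several datasets that use these questions as column headers, and I want to replace those headers with their IDs.

The master table looks like this: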

Question               ID

gender                 1
sex                    1
what is your gender    1
sexual orientation     1
marital status         2
occupation             3
whats you job          3

df1 looks like this:

gender         marital status  occupation

Male           Single          Doctor
Male           Divorced        Engineer

Desired output:

   1           2                 3

   Male        Single            Doctor
   Male        Divorced          Engineer

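If df1 contains a new variable that has no ID in the master table, it should be assigned a new ID, and the variable name and its ID should be added to the master table.

For example:

df2 looks like this: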

gender         marital status  country

Male           Single          India
Male           Divorced        UK

Desired df2:

1              2               4

Male           Single          India
Male           Divorced        UK

The updated master table would be:

Question               ID

gender                 1
sex                    1
what is your gender    1
sexual orientation     1
marital status         2
occupation             3
whats you job          3
country                4

2 Answers:

Answer 0: (score: 2)

Use DataFrame.rename with a Series to set the new column names from the other DataFrame:

df2 = df1.rename(columns=df.set_index('Question')['ID'])
print (df2)
      1         2         3
0  Male    Single    Doctor
1  Male  Divorced  Engineer

EDIT:

The problem is that the Question values contain duplicates, so a Series with unique values has to be created first. One possible solution is to remove the duplicates with DataFrame.drop_duplicates. Here is sample data to show how it works:

print (df)
              Question  ID
0               gender  10 <-duplicates, change ID for test
1               gender  15 <-duplicates, change ID for test
2  what is your gender   1
3   sexual orientation   1
4       marital status   2
5           occupation   3
6        whats you job   3

You can test for duplicates in your real data:

print (df[df.duplicated('Question', keep=False)])
  Question  ID
0   gender  10
1   gender  15
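Before dropping duplicates, it may help to check whether the duplicated questions actually disagree on their ID, since only those make the mapping ambiguous. The following is a minimal sketch under that assumption; the names `master`, `ids_per_question` and `conflicts` are hypothetical and not part of the answer (which calls the master table `df`).

```python
import pandas as pd

# Small master table with one question listed under two different IDs.
master = pd.DataFrame({
    "Question": ["gender", "gender", "what is your gender", "marital status"],
    "ID": [10, 15, 1, 2],
})

# Count distinct IDs per question; more than one means the mapping is ambiguous.
ids_per_question = master.groupby("Question")["ID"].nunique()
conflicts = ids_per_question[ids_per_question > 1].index.tolist()
print(conflicts)
```

A question that is repeated with the same ID every time would not show up in `conflicts`, so dropping those repeats loses no information.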

Remove the duplicates, keeping the first duplicated row, here ID=10:

print (df.drop_duplicates('Question').set_index('Question')['ID'])
Question
gender                 10
what is your gender     1
sexual orientation      1
marital status          2
occupation              3
whats you job           3
Name: ID, dtype: int64

df21 = df1.rename(columns=df.drop_duplicates('Question').set_index('Question')['ID'])
print (df21)
     10         2         3
0  Male    Single    Doctor
1  Male  Divorced  Engineer

Remove the duplicates, keeping the last duplicated row, here ID=15:

print (df.drop_duplicates('Question', keep='last').set_index('Question')['ID'])
Question
gender                 15
what is your gender     1
sexual orientation      1
marital status          2
occupation              3
whats you job           3
Name: ID, dtype: int64

df22 = df1.rename(columns=df.drop_duplicates('Question', keep='last').set_index('Question')['ID'])
print (df22)
     15         2         3
0  Male    Single    Doctor
1  Male  Divorced  Engineer

Or convert the Series to a dictionary, in which case the last duplicate wins:

print (df.set_index('Question')['ID'].to_dict())
{'gender': 15, 'what is your gender': 1, 'sexual orientation': 1, 'marital status': 2, 'occupation': 3, 'whats you job': 3}

df22 = df1.rename(columns=df.set_index('Question')['ID'].to_dict())
print (df22)
     15         2         3
0  Male    Single    Doctor
1  Male  Divorced  Engineer

EDIT1: If some values do not exist in the master DataFrame and have to be appended first, use:

print (df)
              Question  ID
0               gender   1
1                  sex   1
2  what is your gender   1
3   sexual orientation   1
4       marital status   2
5           occupation   3
6        whats you job   3

print (df1) 
  gender marital status country  code1  code2
0   Male         Single   India      4      7
1   Male       Divorced      UK      3      5

Get all columns that do not exist in df['Question']:

cols = df1.columns.difference(df['Question'].tolist(), sort=False)
print (cols)
Index(['country', 'code1', 'code2'], dtype='object')

Then add the next values after the maximum ID and append them to the original df.

Finally, use the original solution:

df2 = df1.rename(columns=df.set_index('Question')['ID'])
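The step of assigning the next IDs after the current maximum and appending them to the master table could be sketched as follows. This is a minimal sketch and not necessarily the answer's original code: the names `master`, `new_rows` and `renamed` are hypothetical, the master table is assumed to have unique questions, and `pd.concat` is used because `DataFrame.append` was removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

# Master table from the question (Question -> ID), assumed to have unique questions.
master = pd.DataFrame({
    "Question": ["gender", "sex", "what is your gender", "sexual orientation",
                 "marital status", "occupation", "whats you job"],
    "ID": [1, 1, 1, 1, 2, 3, 3],
})

df1 = pd.DataFrame({
    "gender": ["Male", "Male"],
    "marital status": ["Single", "Divorced"],
    "country": ["India", "UK"],
})

# Columns of df1 that are not yet listed in the master table.
cols = df1.columns.difference(master["Question"].tolist(), sort=False)

# Give each new column the next free ID after the current maximum.
new_rows = pd.DataFrame({
    "Question": list(cols),
    "ID": np.arange(len(cols)) + master["ID"].max() + 1,
})

# DataFrame.append was removed in pandas 2.0, so concatenate instead.
master = pd.concat([master, new_rows], ignore_index=True)

# Rename with the updated mapping.
renamed = df1.rename(columns=master.set_index("Question")["ID"])
print(renamed.columns.tolist())
print(master.tail(1))
```

Because the new rows are appended before renaming, running the same steps on a later dataset reuses the ID that `country` received here instead of allocating another one.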

Answer 1: (score: 0)

You can rename the columns using the IDs of the matching questions.

This should work for multiple possible names for a given column.
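A minimal sketch of how such a rename could look, not necessarily the answerer's exact code. It assumes the master table has unique questions; the names `survey` and `question_to_id` are hypothetical, and the dataset deliberately uses alternative wordings to show that several names can map to the same ID.

```python
import pandas as pd

master = pd.DataFrame({
    "Question": ["gender", "sex", "what is your gender", "sexual orientation",
                 "marital status", "occupation", "whats you job"],
    "ID": [1, 1, 1, 1, 2, 3, 3],
})

# Hypothetical dataset that uses alternative wordings for the same questions.
survey = pd.DataFrame({
    "sex": ["Male", "Male"],
    "marital status": ["Single", "Divorced"],
    "whats you job": ["Doctor", "Engineer"],
})

# Plain dict: every known wording points to its question ID.
question_to_id = dict(zip(master["Question"], master["ID"]))

# Columns without an entry in the dict keep their original name.
renamed = survey.rename(columns=question_to_id)
print(renamed.columns.tolist())
```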
