How do I match the keys of two DataFrames and create a new df from the matching keys?

Time: 2019-04-29 07:32:26

Tags: python-3.x pandas

I have two DataFrames whose keys (column labels) look like this:

df:

Index(['artistName', 'artForm/nameOfArt/practicedSkill', 'state', 'district',
       'village', 'pinCode', 'dob/yearOfBirth/date', 'gender', 'phone',
       'email', 'differentlyAbled', 'languages', 'exp', 'artAcademy',
       'category/SC/ST/OBC/General', 'scheme'],
      dtype='object')

df1:

Index(['Name of Practiced art or skill', 'Name Of Artist', ' gender',
       'District', 'Phoe No', 'Artisan card-(Y/N)', 'Card No',
       'Date/Year Of Birth/age', 'Age', 'Year of Birth', 'Level of Education',
       'Children go to School', 'Languages Known', 'SC/ST/OBC/General',
       'Sub Caste', 'Religion', 'Indigenous Religion (I)',
       'Any Migration For work', 'Received training?Y/N',
       'Details Of Training?', 'Name and details of Guru/trainer/Agency',
       'Specialised in/What part of the total process is done',
       'Are your Instrument/costume in good shape?/Are  tools in good shape?',
       'How long do you Practice per day/How long  work is done (Hrs.)',
       'What training do you need?/Training Requirements/any diversification needed ',
       'Award /Recognition', 'Is art Primary or Secondary Livelihood?',
       'If art form is secondary what is primary livelihood?Income from Pr.Livelihood',
       'Income from Performance\n(Monthly/Yearly)',
       'Bank Account (Yes/NO) details if yes',
       'Whether any bank loan received for art',
       'Name of group if any/Member  Society/Group/Cooperative', 'APL/BPL',
       'Card No.1', 'Health insurance card (has= 1 or not =2)', 'hiCard No',
       'Mgnrega/Job card (has= 1 or not =2)', 'jCard No',
       'Toilet: (has= 1 or not =2)', 'Electriciti:(has= 1 or not =2)',
       'House:Kuchcha/Pucca', 'Roof', 'Radio', 'TV', 'Cycle', 'Bike',
       'Do you have land for sabai Cultivation/ Madur kathi Cultivation',
       'Do you/family have any land:', 'Unit', 'Qty', 'Observation'],
      dtype='object')

I want to match the keys and select the required columns from df1 to create a new df.

My code so far (not working):

import pandas as pd
from difflib import get_close_matches
df = pd.DataFrame(columns = ['artistName', 'artForm/nameOfArt/practicedSkill', 'state', 'district', 'village', 'pinCode', 'dob/yearOfBirth/date', 'gender', 'phone', 'email', 'differentlyAbled', 'languages', 'exp', 'artAcademy', 'category/SC/ST/OBC/General', 'scheme'])
df1 = pd.read_excel("C:\\Users\\Desktop\\Culture\\Madur.xlsx")
df.apply(lambda x: x.astype(str).str.lower())
df1.apply(lambda x: x.astype(str).str.lower())
df2 = pd.DataFrame()
finalList = []
for r in df.keys():
    seq = get_close_matches(r, df1.keys(), n=1, cutoff = .50)
    if len(seq) != 0:
        finalList.append(seq) #to get the final list of columns
print(finalList)

Printing r together with seq = get_close_matches(r, df1.keys(), n=1, cutoff = .50) inside the loop gives:

artistName ['Name Of Artist']
artForm/nameOfArt/practicedSkill ['Name of Practiced art or skill']
state []
district ['District']
village []
pinCode []
dob/yearOfBirth/date ['Date/Year Of Birth/age']
gender [' gender']
phone ['Phoe No']
email []
differentlyAbled []
languages ['Languages Known']
exp []
artAcademy []
category/SC/ST/OBC/General ['SC/ST/OBC/General']
scheme []

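I want to select the columns held in the seq variable and build a df from them. There are several files like this one that need to be compared with df.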

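I am able to work out which columns need to be selected from df1, but how do I actually select them? "finalList" contains the list of columns that need to be picked from df1.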

Any help?

2 answers:

Answer 0: (score: 0)

If I understand the question correctly, this should work:

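  df_cols = df.columns
  df1_cols = df1.columns

  # Collect the column names that appear in both DataFrames
  new_col = []
  for col in df_cols:
      if col in df1_cols:
          new_col.append(col)

  # Select those columns from df1, which is the frame that holds the data
  df_new = df1[new_col]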
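
With the column labels shown in the question, an exact comparison like the one above may find very little, because the labels differ in case and whitespace (for example 'district' vs 'District', or 'gender' vs ' gender'). A minimal sketch of a more tolerant exact match, assuming that stripping whitespace and lowercasing is an acceptable normalization; the helper name `normalized_matches` is hypothetical and the short lists are a subset of the question's labels:

```python
def normalized_matches(target_cols, source_cols):
    """Map each target label to the source label that is equal to it
    after stripping whitespace and lowercasing (hypothetical helper)."""
    lookup = {}
    for col in source_cols:
        # Keep the first source label seen for each normalized form
        lookup.setdefault(str(col).strip().lower(), col)
    return {
        t: lookup[str(t).strip().lower()]
        for t in target_cols
        if str(t).strip().lower() in lookup
    }


# A few labels taken from the question's df (target) and df1 (source)
target = ['artistName', 'state', 'district', 'gender', 'phone']
source = ['Name Of Artist', ' gender', 'District', 'Phoe No']

matches = normalized_matches(target, source)
print(matches)
```

The values of `matches` are the original df1 labels, so they could be passed to `df1[list(matches.values())]`. Pairs such as 'artistName' and 'Name Of Artist' are not caught this way and still need the fuzzy matching used in the question.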

Answer 1: (score: 0)

You're almost there: a pandas DataFrame accepts a list of column names without any problem. Note that finalList, as built in the question, is a list of one-element lists (each seq is itself a list), so it has to be flattened into a plain list of labels before it is used to index df1.
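
A minimal sketch of selecting the matched columns from df1 with a flat list of labels, reusing the `get_close_matches` loop from the question. The small `df1` below is made-up sample data standing in for the Excel file, and the `col_map` dictionary and the renaming step are illustrative assumptions rather than part of the original answer:

```python
import pandas as pd
from difflib import get_close_matches

# A few of the target column names from the question's df
target_cols = ['artistName', 'state', 'district', 'gender', 'phone']

# Made-up sample rows standing in for the Excel file from the question
df1 = pd.DataFrame({
    'Name Of Artist': ['A', 'B'],
    ' gender': ['F', 'M'],
    'District': ['X', 'Y'],
    'Phoe No': ['111', '222'],
    'Religion': ['r1', 'r2'],
})

# Map each matched df1 label to the target name it corresponds to
col_map = {}
for col in target_cols:
    seq = get_close_matches(col, list(df1.columns), n=1, cutoff=0.5)
    if seq:
        # seq is a one-element list, so take the label itself
        col_map[seq[0]] = col

# Index df1 with a flat list of labels, then rename to the target names
df2 = df1[list(col_map)].rename(columns=col_map)
print(df2.columns.tolist())
```

Targets with no close match, such as 'state', are simply absent from `df2`. For the several files mentioned in the question, the same loop could be wrapped in a function and applied to each file's DataFrame in turn.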