I have two DataFrames whose keys are as follows:
df:
Index(['artistName', 'artForm/nameOfArt/practicedSkill', 'state', 'district',
'village', 'pinCode', 'dob/yearOfBirth/date', 'gender', 'phone',
'email', 'differentlyAbled', 'languages', 'exp', 'artAcademy',
'category/SC/ST/OBC/General', 'scheme'],
dtype='object')
df1:
Index(['Name of Practiced art or skill', 'Name Of Artist', ' gender',
'District', 'Phoe No', 'Artisan card-(Y/N)', 'Card No',
'Date/Year Of Birth/age', 'Age', 'Year of Birth', 'Level of Education',
'Children go to School', 'Languages Known', 'SC/ST/OBC/General',
'Sub Caste', 'Religion', 'Indigenous Religion (I)',
'Any Migration For work', 'Received training?Y/N',
'Details Of Training?', 'Name and details of Guru/trainer/Agency',
'Specialised in/What part of the total process is done',
'Are your Instrument/costume in good shape?/Are tools in good shape?',
'How long do you Practice per day/How long work is done (Hrs.)',
'What training do you need?/Training Requirements/any diversification needed ',
'Award /Recognition', 'Is art Primary or Secondary Livelihood?',
'If art form is secondary what is primary livelihood?Income from Pr.Livelihood',
'Income from Performance\n(Monthly/Yearly)',
'Bank Account (Yes/NO) details if yes',
'Whether any bank loan received for art',
'Name of group if any/Member Society/Group/Cooperative', 'APL/BPL',
'Card No.1', 'Health insurance card (has= 1 or not =2)', 'hiCard No',
'Mgnrega/Job card (has= 1 or not =2)', 'jCard No',
'Toilet: (has= 1 or not =2)', 'Electriciti:(has= 1 or not =2)',
'House:Kuchcha/Pucca', 'Roof', 'Radio', 'TV', 'Cycle', 'Bike',
'Do you have land for sabai Cultivation/ Madur kathi Cultivation',
'Do you/family have any land:', 'Unit', 'Qty', 'Observation'],
dtype='object')
I want to match the keys and select the required columns from df1 to create a new df.
My code so far (not working properly):
import pandas as pd
from difflib import get_close_matches
df = pd.DataFrame(columns = ['artistName', 'artForm/nameOfArt/practicedSkill', 'state', 'district', 'village', 'pinCode', 'dob/yearOfBirth/date', 'gender', 'phone', 'email', 'differentlyAbled', 'languages', 'exp', 'artAcademy', 'category/SC/ST/OBC/General', 'scheme'])
df1 = pd.read_excel("C:\\Users\\Desktop\\Culture\\Madur.xlsx")
df.apply(lambda x: x.astype(str).str.lower())
df1.apply(lambda x: x.astype(str).str.lower())
df2 = pd.DataFrame()
finalList = []
for r in df.keys():
    seq = get_close_matches(r, df1.keys(), n=1, cutoff = .50)
    if len(seq) != 0:
        finalList.append(seq) #to get the final list of columns
print(finalList)
The output of seq = get_close_matches(r, df1.keys(), n=1, cutoff = .50) for each r is:
artistName ['Name Of Artist']
artForm/nameOfArt/practicedSkill ['Name of Practiced art or skill']
state []
district ['District']
village []
pinCode []
dob/yearOfBirth/date ['Date/Year Of Birth/age']
gender [' gender']
phone ['Phoe No']
email []
differentlyAbled []
languages ['Languages Known']
exp []
artAcademy []
category/SC/ST/OBC/General ['SC/ST/OBC/General']
scheme []
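I want to select the columns held in the seq variable and build the new df from them. There are several files like this one that need to be compared with df.
I am able to work out which columns need to be selected from df1, but how do I actually select them? "finalList" contains the list of columns that need to be picked from df1.
Any help?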
Answer 0 (score: 0)
If I understand the question correctly, this should work:
df_cols = df.columns
df1_cols = df1.columns
new_col = []
# keep only the names that appear in both frames with identical spelling
for col in df_cols:
    if col in df1_cols:
        new_col.append(col)
df_new = df[new_col]
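The membership test above only finds names that are spelled identically in both frames, so pairs such as district / District or gender / ' gender' from the question would be missed. A minimal sketch of a variation follows, assuming the names differ only in letter case and surrounding whitespace. The normalize helper and the toy frames are hypothetical and not part of the original answer.

```python
import pandas as pd


def normalize(name):
    """Hypothetical helper: lower-case a column name and trim whitespace."""
    return str(name).strip().lower()


# Toy stand-ins for df and df1 with made-up values; only the names matter.
df = pd.DataFrame(columns=["district", "gender", "state"])
df1 = pd.DataFrame({"District": ["X"], " gender": ["F"], "Religion": ["r"]})

# Normalized template names we are looking for.
wanted = {normalize(c) for c in df.columns}

# Original df1 column names whose normalized form is in the template.
shared = [c for c in df1.columns if normalize(c) in wanted]

# Select those columns from df1, which is the frame that holds the data.
df_new = df1[shared]
print(shared)
```

This still will not pair names such as artistName and 'Name Of Artist'; those need the fuzzy matching the question already uses.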
Answer 1 (score: 0)
You are almost there: a pandas DataFrame accepts a list of column names for selection without any problem.
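As a rough sketch of that idea (not the answerer's original code), note that finalList in the question is a list of one-element lists, so it has to be flattened before it can be used as df1[...]. The helper below, select_matching_columns, is a hypothetical name. It repeats the question's get_close_matches call, selects the matched columns from the source frame, and renames them to the template names. The toy frames use made-up values in place of Madur.xlsx.

```python
from difflib import get_close_matches

import pandas as pd


def select_matching_columns(template_cols, source, cutoff=0.5):
    """Hypothetical helper: pick the columns of `source` that fuzzily match
    `template_cols` and rename them to the template names."""
    mapping = {}  # source column name -> template column name
    candidates = list(source.columns)
    for name in template_cols:
        match = get_close_matches(name, candidates, n=1, cutoff=cutoff)
        # Skip template names with no match, and do not reuse a source column.
        if match and match[0] not in mapping:
            mapping[match[0]] = name
    # Indexing with a flat list of names selects those columns.
    return source[list(mapping)].rename(columns=mapping)


# Toy stand-ins for df and df1 (made-up values, not the real spreadsheet).
df = pd.DataFrame(columns=["artistName", "district", "gender", "email"])
df1 = pd.DataFrame(
    {
        "Name Of Artist": ["A", "B"],
        "District": ["X", "Y"],
        " gender": ["F", "M"],
        "Religion": ["r1", "r2"],
    }
)

df2 = select_matching_columns(df.columns, df1)
print(list(df2.columns))
```

With the question's own variables, the equivalent one-off selection would be df2 = df1[[m[0] for m in finalList]]. For several files, the same helper could be called once per file after reading each one with pd.read_excel.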