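I need to insert data from an Excel file, match it against the data in another Excel file using the vc_no column, and use Type (Secondary, Primary) as my key to place the data in the corresponding columns. Here is an example: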
                 secondary pairs      primary pair
vc_no    stat    vc_no1  c_result1    vc_no2  c_result2
472594   NULL
264781   NULL
974621   NULL
231412   NULL
314283   NULL
NULL     NULL
NULL     NULL
What I want is to take the values from sourcefile and insert them into their respective columns according to Block Type and vc_no.
Block Type   vc_no    c_result
Primary      n/a      not match
Primary      n/a      match
Primary      472594   match
Primary      974621   match
Primary      231412   not match
Secondary    314283   match
Secondary    264781   match
Secondary    974621   match
secondary-pairs primary-pairs
vc_no stat vc_no1 c_result1 vc_no2 c_result2
472594 NULL NULL NULL 472594 match
264781 NULL 264781 match
974621 NULL 974621 match
231412 NULL 231412 not match
314283 NULL 314283 match
n/a not match
NULL NULL n/a match
NULL NULL n/a not match
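I tried using pandas to match my data on vc_no with isin(), to pick out the block type values with .str.contains(), and to put them into the columns using .columns[], and that works fine.
I need to get each row's vc_no, block type and comp_res, and then match them against the existing vc_no, block type and comp_res from the other data frame. But all I get is the vc_no value in the designated match column. Note: I am writing the result to a new file.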
import pandas as pd

# Target workbook and source workbook (sheet "v0.02")
df_1 = pd.read_excel("firstexcelfile.xlsx")
df_2 = pd.read_excel("sourcefile.xlsx", "v0.02")

# Names of the target columns, picked by position
vc_Secondary = df_1.columns[16]
adsl_old = df_1.columns[36]

# read_excel already returns DataFrames, so these calls change nothing
df_1 = pd.DataFrame(df_1)
df_2 = pd.DataFrame(df_2)

# Split the source rows by block type and write each part to its own file
Primary = df_2['Block Type'].str.contains('Primary')
Secondary = df_2['Block Type'].str.contains('Secondary')
df_2[Primary].to_excel("Primary.xlsx")
df_2[Secondary].to_excel("Secondary.xlsx")

File = pd.read_excel("firstexcelfile.xlsx")
secFile = pd.read_excel("Primary.xlsx")
secID = secFile.columns[13]
ads = File.columns[39]

# Keep only the rows whose vc_no also appears in the Primary file
df_1 = df_1[df_1['vc_no'].isin(secFile[secID])]
df_1[vc_Secondary] = df_1['vc_no']
df_1[ads] = df_2[['Block Name', 'Pair']].apply(lambda x: '/'.join(x.astype(str)), axis=1)

# The original used the undefined names adsl and ads_old here
df_1 = df_1[df_1['vc_no'].isin(File[ads])]
df_1[adsl_old] = df_1[ads]

# Write the result to a new file
df_1.to_excel('util_CAB_sample.xlsx')
Answer 0 (score: 1)
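Several approaches are possible; here is one example, if I understand your question correctly. I created my inputs df_first and df_source with only the necessary columns, but having additional columns after reading the Excel files should generally not be a problem.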
import pandas as pd

# Build both DataFrames with only the columns used here
df_first = pd.DataFrame({'vc_no': [472594, 264781, 974621, 231412, 314283]})
df_source = pd.DataFrame({'Block Type': ['Primary', 'Primary', 'Primary', 'Secondary', 'Secondary', 'Secondary'],
                          'vc_no': [472594, 974621, 231412, 314283, 264781, 974621],
                          'c_result': ['match', 'match', 'not match', 'match', 'match', 'match']})

# Fallback row for a vc_no with no match. It carries the same labels, in the
# same order, as the rows selected below, so the output columns always line up.
no_match = pd.Series(index=['c_result', 'vc_no'], dtype=object)

# Select the data to add to df_first from df_source where Block Type is Primary
df_prim = df_source.loc[df_source['Block Type'] == 'Primary', ['c_result', 'vc_no']]
# Then use apply() to create the two columns for Block Type = Primary
df_first[['c_result_primary', 'vc_no_primary']] = df_first['vc_no'].apply(
    lambda x: df_prim[df_prim['vc_no'] == x].iloc[0] if x in list(df_prim['vc_no']) else no_match)

# Same for Block Type = Secondary
df_sec = df_source.loc[df_source['Block Type'] == 'Secondary', ['c_result', 'vc_no']]
df_first[['c_result_secondary', 'vc_no_secondary']] = df_first['vc_no'].apply(
    lambda x: df_sec[df_sec['vc_no'] == x].iloc[0] if x in list(df_sec['vc_no']) else no_match)

# Fill NaN with empty strings
df_first = df_first.fillna('')
The result looks like this:
    vc_no  c_result_primary  vc_no_primary  c_result_secondary  vc_no_secondary
0  472594             match         472594
1  264781                                                match           264781
2  974621             match         974621               match           974621
3  231412         not match         231412
4  314283                                                match           314283
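
For larger files, the same lookup could probably also be written with DataFrame.merge instead of a row-by-row apply(). The following is only a minimal sketch of that alternative, not the method used above. The helper name add_block_columns is hypothetical, and the sketch assumes each vc_no should contribute at most one row per Block Type (later duplicates are dropped). Unmatched cells are left as missing values rather than empty strings.

```python
import pandas as pd

# Same sample data as in the answer above
df_first = pd.DataFrame({'vc_no': [472594, 264781, 974621, 231412, 314283]})
df_source = pd.DataFrame({
    'Block Type': ['Primary', 'Primary', 'Primary', 'Secondary', 'Secondary', 'Secondary'],
    'vc_no': [472594, 974621, 231412, 314283, 264781, 974621],
    'c_result': ['match', 'match', 'not match', 'match', 'match', 'match'],
})


def add_block_columns(left, source, block_type, suffix):
    """Hypothetical helper: left-merge the rows of one Block Type onto left, keyed on vc_no."""
    subset = source.loc[source['Block Type'] == block_type, ['vc_no', 'c_result']]
    # Assumption: keep one row per vc_no so the merge cannot duplicate rows of left
    subset = subset.drop_duplicates('vc_no')
    subset = subset.rename(columns={'c_result': 'c_result_' + suffix})
    # Copy of the key as a nullable integer, so unmatched rows show a missing
    # value instead of turning the whole column into floats
    subset = subset.assign(**{'vc_no_' + suffix: subset['vc_no'].astype('Int64')})
    return left.merge(subset, on='vc_no', how='left')


result = add_block_columns(df_first, df_source, 'Primary', 'primary')
result = add_block_columns(result, df_source, 'Secondary', 'secondary')
print(result)
```

A left merge keeps every row of df_first in its original order, so the result has the same five rows and the same column names as the apply() version.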