I need to compare codes stored in two dataframes. I am using Python 3 and pandas.
In the first dataset, the codes always have 18 characters:
dividas_dep = pd.read_csv("dividas_deputados_ajustado_csv.csv",sep=';',encoding = 'latin_1')
dividas_dep.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 106 entries, 0 to 105
Data columns (total 10 columns):
CPF_Deputado 106 non-null object
CPF_limpo 106 non-null int64
Nome_Deputado 106 non-null object
Vinculo 106 non-null object
CNPJ_Devedor 106 non-null object
CNPJ_limpo 106 non-null int64
Nome_Devedor 106 non-null object
Valores_situacao_Irregular 65 non-null object
Valores_situacao_Regular 52 non-null object
Total_Devido 106 non-null object
dtypes: int64(2), object(8)
memory usage: 8.4+ KB
The column to compare in this first dataset ("CNPJ_Devedor") has values like: 17.080.201/0001-49, 76.205.723/0001-99, 04.885.828/0001-25 ...
In the second dataset, the codes always have 10 characters:
funrural = pd.read_excel('DEVEDORES FUNRURAL ATUALIZADO PGFN.xlsx')
funrural.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8130 entries, 0 to 8129
Data columns (total 14 columns):
PSFN_PGFN 8129 non-null object
Regiao 8129 non-null object
CNPJ_CEI_Tipo 8129 non-null object
CNPJ_Raiz 8129 non-null object
Razao_Social 8130 non-null object
Valor_principal 8130 non-null float64
Valor_TR_IPC_Poup 8130 non-null float64
Valor_Juros 8130 non-null float64
Valor_SELIC 8130 non-null float64
Valor_Encargo 8130 non-null float64
Valor_Multa_Oficio 8130 non-null float64
Valor_Selic_M_Oficio 8130 non-null float64
Vl_Multa_Mora 8130 non-null float64
Vl_Tot_Credito 8130 non-null float64
dtypes: float64(9), object(5)
memory usage: 889.3+ KB
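The column to compare in this second dataset ("CNPJ_Raiz") has values like: 04.244.173, 05.006.407, 03.632.132 ...
The codes in "CNPJ_Devedor" and "CNPJ_Raiz" are related under tax law, but I cannot do a simple merge like this: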
compara1 = pd.merge(dividas_dep, funrural, left_on='CNPJ_Devedor', right_on='CNPJ_Raiz')
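What I need is to compare only the first 10 characters of "CNPJ_Devedor" with the code in "CNPJ_Raiz" (for example, using only "17.080.201" from "17.080.201/0001-49").
Is there a way to do this in Python? Or should I edit the original file behind dividas_dep (dividas_deputados_ajustado_csv.csv) to create a new column holding only the first 10 characters?
Answer 0: (score: 0)
You can compare a slice of the first 10 characters of each string, taken with .str.slice(None, 10), against the other column. Note that == compares the two Series row by row, so both must have the same index: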
dividas_dep["CNPJ_Devedor"].str.slice(None, 10) == funrural["CNPJ_Raiz"]
Example:
>>> dividas_dep = pd.DataFrame({"CNPJ_Devedor": ['17.080.201/0001-49', '76.205.723/0001-99', '04.885.828/0001-25']})
>>> funrural = pd.DataFrame({"CNPJ_Raiz": ['17.080.201', '04.244.173', '05.006.407']})
>>> dividas_dep["CNPJ_Devedor"].str.slice(None, 10) == funrural["CNPJ_Raiz"]
0 True
1 False
2 False
dtype: bool
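Because the comparison above is positional, it only reports a match when the related codes happen to sit on the same row. When the two dataframes have different lengths or orderings, as the real ones in the question do (106 and 8130 rows), a membership test may be closer to what is wanted. The following is a minimal sketch using Series.isin on made-up sample data. The names raiz, mask and matches are illustrative and do not come from the question.

```python
import pandas as pd

# Small made-up samples in the same format as the question's columns.
dividas_dep = pd.DataFrame({
    "CNPJ_Devedor": ["17.080.201/0001-49", "76.205.723/0001-99", "04.885.828/0001-25"]
})
funrural = pd.DataFrame({
    "CNPJ_Raiz": ["04.244.173", "17.080.201", "05.006.407", "03.632.132"]
})

# First 10 characters of each full CNPJ, e.g. "17.080.201".
raiz = dividas_dep["CNPJ_Devedor"].str.slice(None, 10)

# True where the root appears anywhere in funrural["CNPJ_Raiz"],
# regardless of row position or of the two frames' lengths.
mask = raiz.isin(funrural["CNPJ_Raiz"])

# Rows of dividas_dep whose root is present in funrural.
matches = dividas_dep[mask]
print(matches)
```

Here the first row matches even though "17.080.201" sits on a different row in funrural.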
You can use the result to build a new dataframe:
res = dividas_dep["CNPJ_Devedor"].str.slice(None, 10) == funrural["CNPJ_Raiz"]
funrural[res]
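To get the merge the question originally attempted, one option is to add a helper column with the 10-character root in memory and merge on it, which avoids editing the CSV file. This is a sketch under assumptions: the helper column name CNPJ_Raiz_Devedor is hypothetical, the sample values for Nome_Devedor and Vl_Tot_Credito are invented, and the .str.strip() call is only a precaution in case the real data has stray whitespace.

```python
import pandas as pd

# Made-up samples; only the column names come from the question.
dividas_dep = pd.DataFrame({
    "Nome_Devedor": ["Empresa A", "Empresa B", "Empresa C"],
    "CNPJ_Devedor": ["17.080.201/0001-49", "76.205.723/0001-99", "04.885.828/0001-25"],
})
funrural = pd.DataFrame({
    "CNPJ_Raiz": ["17.080.201", "04.244.173", "04.885.828"],
    "Vl_Tot_Credito": [100.0, 200.0, 300.0],
})

# Hypothetical helper column: the first 10 characters of the full CNPJ.
dividas_dep["CNPJ_Raiz_Devedor"] = dividas_dep["CNPJ_Devedor"].str.strip().str[:10]

# Inner merge keeps only debtors whose root also appears in funrural.
compara1 = pd.merge(
    dividas_dep,
    funrural,
    left_on="CNPJ_Raiz_Devedor",
    right_on="CNPJ_Raiz",
    how="inner",
)
print(compara1)
```

Unlike the boolean mask, the merged frame carries the columns of both sources side by side, so the amounts from funrural can be read next to each debtor.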