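I am struggling to find the simplest way to do a case-insensitive merge in pandas. Is there a way to do it directly in the merge call? Do I need to use (?i) or a regular expression with ignorecase? In my code snippet below, I am joining on countries where the value may be "United States" in one file and "UNITED STATES" in the other, and I just want to take case out of the equation. Thanks!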
import pandas as pd
import csv
import sys
env_path = sys.argv[1]
map_path = sys.argv[2]
df_address = pd.read_csv(env_path + "\\address.csv")
df_CountryMapping = pd.read_csv(map_path + "\\CountryMapping.csv")  # escape the backslash, as on the line above
df_merged = df_address.merge(df_CountryMapping, left_on="Country", right_on="NAME", how="left")
....
Answer 0 (score: 7)
Lowercase the values in the two columns used for the merge, then merge on the lowercased columns:
df_address['country_lower'] = df_address['Country'].str.lower()
df_CountryMapping['name_lower'] = df_CountryMapping['NAME'].str.lower()
df_merged = df_address.merge(df_CountryMapping, left_on="country_lower", right_on="name_lower", how="left")
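The helper columns above stay in the merged result. If that is not wanted, one possible variation is to build them on copies and drop them after the merge. This is only a sketch: the sample rows and the `Street` and `CODE` columns are made up for illustration and are not from the original files.

```python
import pandas as pd

# Hypothetical sample data standing in for address.csv and CountryMapping.csv
df_address = pd.DataFrame({
    "Street": ["1 Main St", "2 Rue A", "3 Elm St"],
    "Country": ["United States", "FRANCE", "Narnia"],
})
df_CountryMapping = pd.DataFrame({
    "NAME": ["UNITED STATES", "France"],
    "CODE": ["US", "FR"],
})

# assign() returns new frames, so the originals are left unchanged
left = df_address.assign(country_lower=df_address["Country"].str.lower())
right = df_CountryMapping.assign(name_lower=df_CountryMapping["NAME"].str.lower())

# Merge on the lowercased helper columns, then remove them from the result
df_merged = (
    left.merge(right, left_on="country_lower", right_on="name_lower", how="left")
        .drop(columns=["country_lower", "name_lower"])
)
print(df_merged)
```

The original spelling in `Country` and `NAME` is preserved, and rows without a match (here "Narnia") keep missing values in the mapping columns because of `how="left"`.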
Answer 1 (score: 2)
df_merged = pd.merge(df_address, df_CountryMapping, left_on=df_address["Country"].str.lower(), right_on=df_CountryMapping["NAME"].str.lower(), how="left")
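A small sketch of the one-liner above on made-up data (the rows and the `CODE` column are hypothetical). When the keys are passed as Series rather than column names, pandas usually adds an extra key column to the result (named `key_0` in the versions I am aware of), so the sketch drops it if it is present.

```python
import pandas as pd

# Hypothetical sample data; not taken from the original files
df_address = pd.DataFrame({"Country": ["United States", "france"]})
df_CountryMapping = pd.DataFrame({
    "NAME": ["UNITED STATES", "France"],
    "CODE": ["US", "FR"],
})

# Pass lowercased Series as the join keys instead of column names
df_merged = pd.merge(
    df_address,
    df_CountryMapping,
    left_on=df_address["Country"].str.lower(),
    right_on=df_CountryMapping["NAME"].str.lower(),
    how="left",
)

# Remove the generated key column if pandas added one
df_merged = df_merged.drop(columns=["key_0"], errors="ignore")
print(df_merged)
```

Neither input frame is modified, which makes this form convenient when the lowercased values are needed only for the join.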
Answer 2 (score: 1)
I suggest lowercasing the column names after reading the files:
df_address.columns=[c.lower() for c in df_address.columns]
df_CountryMapping.columns=[c.lower() for c in df_CountryMapping.columns]
Then lowercase the values as well:
df_address['country']=df_address['country'].str.lower()
df_CountryMapping['name']=df_CountryMapping['name'].str.lower()
Only then do the merge:
df_merged = df_address.merge(df_CountryMapping, left_on="country", right_on="name", how="left")
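Answer 3 (score: 1)
One solution is to convert the column names of both data frames to lowercase. Note that this normalizes only the column labels, not the values stored in them. Something like this: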
df_address = pd.read_csv(env_path + "\\address.csv")
df_CountryMapping = pd.read_csv(map_path + "\\CountryMapping.csv")
df_address.rename(columns=lambda x: x.lower(), inplace=True)
df_CountryMapping.rename(columns=lambda x: x.lower(), inplace=True)
# The columns were renamed to lowercase above, so the keys must be lowercase too
df_merged = df_address.merge(df_CountryMapping, left_on="country", right_on="name", how="left")
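Answer 4 (score: 0)
Another option is to use ".str.casefold()", which handles case-insensitive matching more thoroughly for both ASCII and non-ASCII characters. If you are only working with English letters, it should behave the same as ".str.lower()".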
df_address['country_casefolded'] = df_address['Country'].str.casefold()
df_CountryMapping['name_casefolded'] = df_CountryMapping['NAME'].str.casefold()
df_merged = df_address.merge(df_CountryMapping, left_on="country_casefolded", right_on="name_casefolded", how="left")
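To illustrate where the two methods differ, here is a minimal sketch using the German "ß", which `lower()` leaves unchanged but `casefold()` expands to "ss". The sample rows, the `CODE` column, and the `merge_with` helper are all hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sample data: one plain ASCII name and one containing "ß"
left = pd.DataFrame({"Country": ["United States", "Weißrussland"]})
right = pd.DataFrame({
    "NAME": ["UNITED STATES", "WEISSRUSSLAND"],
    "CODE": ["US", "BY"],
})

def merge_with(method):
    """Left-merge on a temporary key built with the given .str method name."""
    l = left.assign(_key=getattr(left["Country"].str, method)())
    r = right.assign(_key=getattr(right["NAME"].str, method)())
    return l.merge(r, on="_key", how="left").drop(columns="_key")

merged_lower = merge_with("lower")      # "weißrussland" != "weissrussland"
merged_fold = merge_with("casefold")    # both keys become "weissrussland"

print(merged_lower)
print(merged_fold)
```

With `lower()` only the ASCII row finds its match, while with `casefold()` both rows do.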