How to apply set and ignore case on a single data column in pandas

Posted: 2017-11-04 15:11:51

Tags: python pandas dataframe data-analysis

I have a df:

 Keys        
 one, ONE    
 ram, Ram
 kumar
 Raj,rAj
 cricket
 level,LeVel
 kum,num
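To follow along, the sample frame can be rebuilt as below. This is a minimal sketch that assumes each cell of the Keys column is one comma-separated string, exactly as listed above (some cells have a space after the comma, some do not).

```python
import pandas as pd

# Sample data copied from the question: one comma-separated string per row.
df = pd.DataFrame({'Keys': ['one, ONE', 'ram, Ram', 'kumar', 'Raj,rAj',
                            'cricket', 'level,LeVel', 'kum,num']})
print(df)
```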

First, I want to apply set and ignore case on df["Keys"], so that keys which differ only in case collapse into a single value, and get:

 df
Name
one
ram
kumar
raj
cricket
level
kum,num

Second operation:

I have a list, plus the DataFrame above with its df["Name"] column:

 my_list=["ONE","Ram","CRICKEt","KUm"]

I need to compare the tokens of df["Name"].str.lower().str.split(",") with the lowercased entries of my_list.

If a token exists in my_list (ignoring case), I need to change it in df["Name"] to the spelling used in my_list.

My desired output is:

 df
 Name
 ONE
 Ram
 kumar
 raj
 CRICKEt
 level
 KUm,num
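The rule described above can be written for a single cell in plain Python. This is a sketch under stated assumptions: `normalize_cell` and `preferred` are hypothetical names, only whole tokens are matched (so `kumar` stays `kumar`), and first-seen token order is kept because the desired output shows `KUm,num`.

```python
my_list = ["ONE", "Ram", "CRICKEt", "KUm"]

# Hypothetical lookup: lowercase form -> spelling taken from my_list.
preferred = {word.lower(): word for word in my_list}

def normalize_cell(cell):
    """Hypothetical helper: dedupe tokens case-insensitively, keep first-seen
    order, then swap in the my_list spelling for whole-token matches."""
    tokens = [tok.strip().lower() for tok in cell.split(',')]
    unique = dict.fromkeys(tokens)  # dicts keep insertion order (Python 3.7+)
    return ','.join(preferred.get(tok, tok) for tok in unique)

for cell in ['one, ONE', 'kumar', 'Raj,rAj', 'kum,num']:
    print(cell, '->', normalize_cell(cell))
```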

Thanks in advance.

1 answer:

Answer (score: 1)

Use str.lower + str.split + apply(set) + str.join:

df['Name'] = df['Keys'].str.lower().str.split(',').apply(set).str.join(',')
print(df)
          Keys     Name
0      one,ONE      one
1      ram,Ram      ram
2        kumar    kumar
3      Raj,rAj      raj
4      cricket  cricket
5  level,LeVel    level
6      kum,num  num,kum
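A Python set has no guaranteed iteration order, and string hashing is randomized per process by default, so a row like `kum,num` may come back as `num,kum` on one run and `kum,num` on another. If a stable order matters, one option (a sketch, not part of the original answer) is to dedupe with `dict.fromkeys`, which keeps first-seen order:

```python
import pandas as pd

df = pd.DataFrame({'Keys': ['one,ONE', 'ram,Ram', 'kumar', 'Raj,rAj',
                            'cricket', 'level,LeVel', 'kum,num']})

# dict.fromkeys removes duplicates like set does, but preserves the order
# in which tokens first appear, so the joined result is deterministic.
df['Name'] = (df['Keys'].str.lower()
                        .str.split(',')
                        .apply(lambda toks: list(dict.fromkeys(toks)))
                        .str.join(','))
print(df)
```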

If there can be whitespace after the comma, use ,\s* as the separator (a comma followed by zero or more whitespace characters):

df['Name'] = df['Keys'].str.lower().str.split(r',\s*').apply(set).str.join(',')
print(df)
          Keys     Name
0     one, ONE      one
1     ram, Ram      ram
2        kumar    kumar
3      Raj,rAj      raj
4      cricket  cricket
5  level,LeVel    level
6      kum,num  num,kum
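To see what the `,\s*` separator does on its own, here is a quick check with the standard `re` module. Writing the pattern as a raw string keeps Python from treating `\s` as a string escape. pandas treats a `pat` longer than one character as a regular expression, and pandas 1.4+ also accepts `regex=True` on `str.split` to make that explicit.

```python
import re

# A comma followed by zero or more whitespace characters.
pattern = r',\s*'
print(re.split(pattern, 'one, ONE'))     # ['one', 'ONE']
print(re.split(pattern, 'level,LeVel'))  # ['level', 'LeVel']
print(re.split(pattern, 'kumar'))        # ['kumar']
```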

Edit:

Finally, build a dictionary that maps each lowercase word to its spelling in my_list, then replace:

import re

my_list = ["ONE", "Ram", "CRICKEt", "KUm"]
d = dict(zip([x.lower() for x in my_list], my_list))
print(d)
{'one': 'ONE', 'ram': 'Ram', 'cricket': 'CRICKEt', 'kum': 'KUm'}

splitted = df['Keys'].str.lower().str.split(',').apply(set)
# \b word boundaries so 'kum' matches only a whole token, not the start of 'kumar'
pattern_map = {r'\b' + re.escape(k) + r'\b': v for k, v in d.items()}
df['Name'] = splitted.str.join(',').replace(pattern_map, regex=True)
df['Count'] = splitted.str.len()
print(df)
          Keys     Name  Count
0      one,ONE      ONE      1
1      ram,Ram      Ram      1
2        kumar    kumar      1
3      Raj,rAj      raj      1
4      cricket  CRICKEt      1
5  level,LeVel    level      1
6      kum,num  num,KUm      2
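Because `replace(..., regex=True)` substitutes inside strings, a bare `kum` pattern can also hit the start of `kumar`. A sketch of an alternative that skips regex replacement entirely: split into tokens, dedupe in first-seen order, and look up each whole token in `d`. This combines the `,\s*` separator with an order-preserving dedupe; it is one possible approach, not the answer's original code.

```python
import pandas as pd

my_list = ["ONE", "Ram", "CRICKEt", "KUm"]
d = {x.lower(): x for x in my_list}

df = pd.DataFrame({'Keys': ['one, ONE', 'ram, Ram', 'kumar', 'Raj,rAj',
                            'cricket', 'level,LeVel', 'kum,num']})

# Lowercase, split on comma plus optional whitespace, dedupe in order.
tokens = (df['Keys'].str.lower()
                    .str.split(r',\s*')
                    .apply(lambda toks: list(dict.fromkeys(toks))))

# Whole-token lookup: tokens missing from d are kept unchanged,
# so 'kumar' is never affected by the 'kum' entry.
df['Name'] = tokens.apply(lambda toks: ','.join(d.get(t, t) for t in toks))
df['Count'] = tokens.str.len()
print(df)
```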