I have a DataFrame, df:
Keys
one, ONE
ram, Ram
kumar
Raj,rAj
cricket
level,LeVel
kum,num
First, I want to apply set to df["Keys"] while ignoring case, so that values differing only in case collapse into a single value, giving:
df
Name
one
ram
kumar
raj
cricket
level
kum,num
Second operation:
I have a list and the DataFrame above, with the column df["Name"]:
my_list=["ONE","Ram","CRICKEt","KUm"]
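I need to compare df["Name"].str.lower().str.split(",") with the lowercased values of my_list.
If a value is present in my_list, it should be replaced in df["Name"] with the spelling from my_list.
My desired output is:
df
Name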
ONE
Ram
kumar
raj
CRICKEt
level
KUm,num
Thanks in advance.
Answer 0 (score: 1)
Use str.lower + split + apply + join:
df['Name'] = df['Keys'].str.lower().str.split(',').apply(set).str.join(',')
print (df)
          Keys     Name
0      one,ONE      one
1      ram,Ram      ram
2        kumar    kumar
3      Raj,rAj      raj
4      cricket  cricket
5  level,LeVel    level
6      kum,num  num,kum
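Note that set is unordered, so row 6 can come out as num,kum rather than the kum,num shown in the question. A minimal sketch of an order-preserving variant follows, using plain Python; the helper name dedupe_keep_order is hypothetical and not part of the original answer:

```python
def dedupe_keep_order(keys: str) -> str:
    """Lowercase comma-separated values and drop duplicates in first-seen order."""
    tokens = keys.lower().split(',')
    # dict.fromkeys removes duplicates and keeps insertion order (Python 3.7+),
    # whereas set gives no ordering guarantee.
    return ','.join(dict.fromkeys(tokens))


for keys in ['one,ONE', 'kumar', 'level,LeVel', 'kum,num']:
    print(dedupe_keep_order(keys))
```

Assuming the same df as above, this could be applied with df['Keys'].apply(dedupe_keep_order).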
If whitespace can follow the comma, use ,\s* as the separator (a comma followed by zero or more whitespace characters):
df['Name'] = df['Keys'].str.lower().str.split(r',\s*').apply(set).str.join(',')
print (df)
          Keys     Name
0     one, ONE      one
1     ram, Ram      ram
2        kumar    kumar
3      Raj,rAj      raj
4      cricket  cricket
5  level,LeVel    level
6      kum,num  num,kum
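To see what the ,\s* pattern does on its own, here is a small sketch using the standard re module instead of pandas; the helper name split_keys is hypothetical:

```python
import re


def split_keys(keys: str) -> list:
    """Lowercase, then split on a comma followed by zero or more whitespace characters."""
    return re.split(r',\s*', keys.lower())


for keys in ['one, ONE', 'Raj,rAj', 'kum,   num', 'kumar']:
    print(split_keys(keys))
```

The raw string (r'...') keeps Python from interpreting \s as a string escape before the regex engine sees it.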
Edit:
Finally, create a dictionary and then replace:
my_list=["ONE","Ram","CRICKEt","KUm"]
d = dict(zip([x.lower() for x in my_list],my_list))
print (d)
{'cricket': 'CRICKEt', 'one': 'ONE', 'ram': 'Ram', 'kum': 'KUm'}
splitted = df['Keys'].str.lower().str.split(',').apply(set)
df['Name'] = splitted.str.join(',').replace(d, regex=True)
df['Count'] = splitted.str.len()
print (df)
          Keys     Name  Count
0      one,ONE      ONE      1
1      ram,Ram      Ram      1
2        kumar    KUmar      1
3      Raj,rAj      raj      1
4      cricket  CRICKEt      1
5  level,LeVel    level      1
6      kum,num  num,KUm      2
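Note that replace with regex=True also matches substrings, which is why row 2 becomes KUmar instead of staying kumar as in the desired output. If only whole values should be replaced, one possible approach is to look up each value separately. This is a sketch in plain Python; the names lookup and restore_case are hypothetical and not part of the original answer:

```python
my_list = ["ONE", "Ram", "CRICKEt", "KUm"]
# Map each lowercased value to its spelling in my_list.
lookup = {x.lower(): x for x in my_list}


def restore_case(keys: str) -> str:
    """Deduplicate ignoring case, then swap in the my_list spelling for exact matches only."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    tokens = dict.fromkeys(t.strip().lower() for t in keys.split(','))
    # lookup.get falls back to the lowercased token when it is not in my_list.
    return ','.join(lookup.get(t, t) for t in tokens)


for keys in ['one,ONE', 'kumar', 'Raj,rAj', 'cricket', 'kum,num']:
    print(restore_case(keys))
```

Assuming the same df as above, df['Name'] = df['Keys'].apply(restore_case) would leave kumar untouched while still turning kum,num into KUm,num.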