Is there an alternative way to implement this solution? Chaining str.contains() calls is not very elegant when there are many keys to match.
import pandas as pd

df = pd.DataFrame({'A':['Cat had a nap','Dog had puppies','Did you see a Donkey','kitten got angry','puppy was cute']})
dic = {'Cat':'Cat','kitten':'Cat','Dog':'Dog','puppy':'Dog'}
                      A
0         Cat had a nap
1       Dog had puppies
2  Did you see a Donkey
3      kitten got angry
4        puppy was cute
df['Cat'] = (df['A'].astype(str).str.contains('Cat')|df['A'].astype(str).str.contains('kitten')).replace({False:0, True:1})
df['Dog'] = (df['A'].astype(str).str.contains('Dog')|df['A'].astype(str).str.contains('puppy')).replace({False:0, True:1})
df
                      A  Cat  Dog
0         Cat had a nap    1    0
1       Dog had puppies    0    1
2  Did you see a Donkey    0    0
3      kitten got angry    1    0
4        puppy was cute    0    1
Answer 0 (score: 3)
Use | inside the pattern passed to str.contains as the regex OR, then cast the resulting booleans to integers with astype:
df['Cat'] = df['A'].astype(str).str.contains('Cat|kitten').astype(int)
df['Dog'] = df['A'].astype(str).str.contains('Dog|puppy').astype(int)
Similarly, converting the column to strings only once:
a = df['A'].astype(str)
df['Cat'] = a.str.contains('Cat|kitten').astype(int)
df['Dog'] = a.str.contains('Dog|puppy').astype(int)
print(df)
                      A  Cat  Dog
0         Cat had a nap    1    0
1       Dog had puppies    0    1
2  Did you see a Donkey    0    0
3      kitten got angry    1    0
4        puppy was cute    0    1
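As a quick sanity check, the sketch below compares the two approaches on the sample strings from the question. It is only an illustration: the variable names separate and combined are made up here, and the data is rebuilt as a plain Series so the snippet runs on its own.

```python
import pandas as pd

a = pd.Series(['Cat had a nap', 'Dog had puppies', 'Did you see a Donkey',
               'kitten got angry', 'puppy was cute'])

# Two separate substring tests combined with the element-wise boolean OR
separate = a.str.contains('Cat') | a.str.contains('kitten')

# One regex test: '|' inside the pattern means "Cat or kitten"
combined = a.str.contains('Cat|kitten')

# Series.equals reports whether both results hold the same values
print(separate.equals(combined))
```

Both expressions flag the same rows, so the single-pattern form is a drop-in replacement for the chained one.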
A more dynamic solution, using a dictionary of lists:
dic = {'Cat':['Cat','kitten'],'Dog':['Dog','puppy']}
for k, v in dic.items():
    # join the keywords for this column into a single 'a|b' regex pattern
    df[k] = df['A'].astype(str).str.contains('|'.join(v)).astype(int)
print(df)
                      A  Cat  Dog
0         Cat had a nap    1    0
1       Dog had puppies    0    1
2  Did you see a Donkey    0    0
3      kitten got angry    1    0
4        puppy was cute    0    1
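If the mapping is already available in the flat keyword-to-category form from the question, one possible way to reuse it is to group the keywords by category first. This is a minimal sketch, not part of the original answer: the names groups, keyword, category and pattern are chosen here for illustration, and re.escape is added on the assumption that some keywords might contain regex metacharacters.

```python
import re
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'A': ['Cat had a nap', 'Dog had puppies', 'Did you see a Donkey',
                         'kitten got angry', 'puppy was cute']})

# Flat keyword -> category mapping, as given in the question
dic = {'Cat': 'Cat', 'kitten': 'Cat', 'Dog': 'Dog', 'puppy': 'Dog'}

# Group the keywords by category:
# {'Cat': ['Cat', 'kitten'], 'Dog': ['Dog', 'puppy']}
groups = defaultdict(list)
for keyword, category in dic.items():
    groups[category].append(keyword)

a = df['A'].astype(str)
for category, keywords in groups.items():
    # re.escape keeps characters such as '.' or '+' in a keyword literal
    pattern = '|'.join(re.escape(k) for k in keywords)
    df[category] = a.str.contains(pattern).astype(int)

print(df)
```

On the sample data this produces the same Cat and Dog columns as the loop above. Note that the match is a case-sensitive substring test, so 'puppies' does not match 'puppy'.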