Say I have a DataFrame of names, some of which are two words long and some of which are one word long:
Team A
1 Zeus Odin John Wick Jason Bourne Loki
2
I want to get this result:
Team A Hero 1  Team A Hero 2  Team A Hero 3  Team A Hero 4  Team A Hero 5
         Zeus           Odin      John Wick   Jason Bourne           Loki
How can I use the pandas str.split() function with a regular expression to do this?
Answer 0 (score: 1)
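One approach is to temporarily replace each hero name that contains a space with the same name without the space, apply str.split(), and then reverse the substitution afterwards: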
import re
import pandas as pd
# map each hero name containing a space to the same name without the space
dict_hero = {hero: hero.replace(' ', '') for hero in HeroList if ' ' in hero}
# build the inverse mapping (there are several ways; this is one of them)
dict_hero_rev = {hero.replace(' ', ''): hero for hero in HeroList if ' ' in hero}
# build the pattern and the replacement function to use in str.replace
pat = re.compile('|'.join(re.escape(hero) for hero in dict_hero))  # match any key of dict_hero
repl = lambda m: dict_hero[m.group()]  # replace a match with its space-free name from dict_hero
# work on the column Team A
(df['Team A'].str.replace(pat, repl, regex=True)  # names with a space -> names without
    .str.split(' ', expand=True)  # split on single spaces and expand to columns
    .replace(dict_hero_rev)  # restore the space in the two-word names
    .rename(columns={nb: 'Team A Hero {}'.format(nb + 1) for nb in range(5)}))
With a DataFrame such as
df = pd.DataFrame({'Team A': ['Zeus Odin John Wick Jason Bourne Loki',
                              'Hulk Thor Green Lantern Batman Captain America']})
Team A
0 Zeus Odin John Wick Jason Bourne Loki
1 Hulk Thor Green Lantern Batman Captain America
and the list of heroes
HeroList = ['Green Lantern', 'Thor', 'Hulk', 'Odin', 'Batman',
            'Jason Bourne', 'Loki', 'John Wick', 'Zeus', 'Captain America']
the approach above gives you
   Team A Hero 1  Team A Hero 2  Team A Hero 3  Team A Hero 4    Team A Hero 5
0           Zeus           Odin      John Wick   Jason Bourne             Loki
1           Hulk           Thor  Green Lantern         Batman  Captain America
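
As a possible alternative (not part of the approach above), the known hero names can be matched directly instead of splitting on spaces. The following is a minimal sketch that assumes every name in the column appears in HeroList; the variable names `names`, `matches` and `result` are only illustrative. It builds one alternation pattern from the list and uses str.findall to pull out the names in order:

```python
import re
import pandas as pd

df = pd.DataFrame({'Team A': ['Zeus Odin John Wick Jason Bourne Loki',
                              'Hulk Thor Green Lantern Batman Captain America']})
HeroList = ['Green Lantern', 'Thor', 'Hulk', 'Odin', 'Batman',
            'Jason Bourne', 'Loki', 'John Wick', 'Zeus', 'Captain America']

# Try longer names first, so a long name is preferred over a shorter name
# that happens to be its prefix (assumption: names come only from HeroList)
names = sorted(HeroList, key=len, reverse=True)
pat = '|'.join(re.escape(name) for name in names)

# str.findall returns, for each row, the list of matched names in order
matches = df['Team A'].str.findall(pat)

# Turn the lists into columns and label them like the desired output
result = pd.DataFrame(matches.tolist(), index=df.index)
result.columns = ['Team A Hero {}'.format(i + 1) for i in range(result.shape[1])]
print(result)
```

This avoids the temporary replacement and the reverse mapping, but any text in the column that is not in HeroList is silently skipped, so it is worth checking that each row yields the expected number of names.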