Split a DataFrame column into multiple columns based on a regular expression

Posted: 2018-10-14 12:03:44

Tags: python-3.x pandas web-scraping

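Pandas newbie here. What is the best way to strip each team's record out of the Team column and put it into a new column? Thanks in advance!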

    Rank    Team    
0   1       LA Rams (5-0)   
1   2       New Orleans (4-1)   
2   3       New England (3-2)   
3   4       Kansas City (5-0)   
4   5       Pittsburgh (2-2-1)  
5   6       Baltimore (3-2) 

2 answers:

Answer 0 (score: 0)

Interesting question.

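Unfortunately, Series.str.extract gets the record easily but does not remove it from Team (the regex is deliberately naive and matches anything inside (...), so longer records such as (2-2-1) are caught too):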

df['Record'] = df['Team'].str.extract(r'(\(.*?\))', expand=False)
print(df)
#    Rank                Team   Record
#  0    1       LA Rams (5-0)    (5-0)
#  1    2   New Orleans (4-1)    (4-1)
#  2    3   New England (3-2)    (3-2)
#  3    4   Kansas City (5-0)    (5-0)
#  4    5  Pittsburgh (2-2-1)  (2-2-1)
#  5    6     Baltimore (3-2)    (3-2)
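If you want to stay with vectorized string methods, the removal step that str.extract lacks can be done with Series.str.replace. A minimal sketch, assuming the sample data from the question (rebuilt here by hand) and a pandas version whose str.replace accepts the regex= keyword:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    'Rank': [1, 2, 3, 4, 5, 6],
    'Team': ['LA Rams (5-0)', 'New Orleans (4-1)', 'New England (3-2)',
             'Kansas City (5-0)', 'Pittsburgh (2-2-1)', 'Baltimore (3-2)'],
})

# Copy the first parenthesised group into its own column
df['Record'] = df['Team'].str.extract(r'(\(.*?\))', expand=False)
# Delete every parenthesised group (and the whitespace before it) from Team
df['Team'] = df['Team'].str.replace(r'\s*\(.*?\)', '', regex=True)

print(df)
```

Note that str.replace removes every (...) group while str.extract copies only the first one; for strings with a single record, as here, the two agree.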

Removing it requires writing our own function:

import re

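record_regex = re.compile(r'(\(.*?\))')

records = []  # filled as a side effect of apply()

def extract_and_remove_record(x):
    # Keep the first parenthesised group, e.g. '(5-0)'
    record = record_regex.findall(x)[0]
    records.append(record)
    # Remove every (...) group, then trim the leftover whitespace
    return record_regex.sub('', x).strip()

df['Team'] = df['Team'].apply(extract_and_remove_record)
df['Record'] = records

print(df)
#    Rank          Team   Record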
#  0    1      LA Rams     (5-0)
#  1    2  New Orleans     (4-1)
#  2    3  New England     (3-2)
#  3    4  Kansas City     (5-0)
#  4    5   Pittsburgh   (2-2-1)
#  5    6    Baltimore     (3-2)
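The function above collects records in a module-level list as a side effect of apply, so running the cell twice appends stale values, and a team with no record raises IndexError. A sketch of the same idea with a side-effect-free helper that returns both pieces at once; split_record is a hypothetical name, and the sample data is rebuilt from the question:

```python
import re

import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    'Rank': [1, 2, 3, 4, 5, 6],
    'Team': ['LA Rams (5-0)', 'New Orleans (4-1)', 'New England (3-2)',
             'Kansas City (5-0)', 'Pittsburgh (2-2-1)', 'Baltimore (3-2)'],
})

record_regex = re.compile(r'\(.*?\)')

def split_record(x):
    """Return (team name, record); record is None if x has no (...) group."""
    match = record_regex.search(x)
    if match is None:
        return x.strip(), None
    # Team name is everything outside the first matched record
    name = (x[:match.start()] + x[match.end():]).strip()
    return name, match.group(0)

# map() yields one tuple per row; zip(*...) transposes them into two sequences
names, records = zip(*df['Team'].map(split_record))
df['Team'] = list(names)
df['Record'] = list(records)

print(df)
```

Because the helper is pure, it can be re-run safely, and rows without parentheses keep their full name with None as the record.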

Answer 1 (score: 0)

Another approach that doesn't involve regex tricks:

df[['Team Name', 'Team Records']] = df.Team.apply(lambda x: pd.Series(x.rstrip(')').split(' (')))
df.drop('Team', axis=1, inplace=True)
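The same split can also be done without apply, using the vectorized Series.str.rsplit with expand=True. A sketch assuming the question's sample data; as in the answer above, the resulting records lose their parentheses:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    'Rank': [1, 2, 3, 4, 5, 6],
    'Team': ['LA Rams (5-0)', 'New Orleans (4-1)', 'New England (3-2)',
             'Kansas City (5-0)', 'Pittsburgh (2-2-1)', 'Baltimore (3-2)'],
})

# Split at most once, from the right, so every row yields exactly two parts
parts = df['Team'].str.rsplit(' (', n=1, expand=True)
df['Team Name'] = parts[0]
# Drop the closing parenthesis left on the record
df['Team Records'] = parts[1].str.rstrip(')')
df = df.drop(columns='Team')

print(df)
```

Splitting from the right with n=1 keeps the two-column shape even if a team name itself contains ' ('.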