How to filter and extract data from strings in pandas (Data Munging)

Date: 2017-04-24 15:29:56

Tags: python pandas dataframe data-munging

I have the following data in a pandas Series:

data = ["1. stock1 (1991)",  
"3. stock13 (1993)",  
"5. stock19 (1999)",  
"89. stock105 (2001)"] # pandas Series

I need to parse each string and save the result as:

s.no    sdata       year  
1       stock1      1991  
3       stock13     1993  
5       stock19     1999  
89      stock105    2001 

I tried using:

data = stock["Rank & Title"].str.split(".")

1 Answer:

Answer 0 (score: 1)

You can try the str.extract method with a regex:

import pandas as pd

data = ["1. stock1 (1991)",  
"3. stock13 (1993)",  
"5. stock19 (1999)",  
"89. stock105 (2001)"]

s = pd.Series(data)

# raw string, so the regex backslashes are not treated as string escapes
s.str.extract(r"(?P<sno>\d+)\.\s(?P<sdata>\w+)\s\((?P<year>\d+)\)", expand=True)

# sno      sdata    year
#0  1     stock1    1991
#1  3    stock13    1993
#2  5    stock19    1999
#3  89  stock105    2001

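Breaking down the regex: (?P<sno>\d+)\.\s(?P<sdata>\w+)\s\((?P<year>\d+)\) can be simplified to (\d+)\.\s(\w+)\s\((\d+)\) if you do not need named capture groups (the names are set with ?P<name>). The groups (\d+), (\w+) and (\d+) capture the s.no, the stock name and the year, respectively.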
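As a minimal sketch of the simplified pattern: with unnamed groups the resulting columns are labelled 0, 1, 2, so they have to be named afterwards. The extracted values are strings, so the example also converts the numeric columns. The variable name parts is only illustrative.

```python
import pandas as pd

s = pd.Series([
    "1. stock1 (1991)",
    "3. stock13 (1993)",
    "5. stock19 (1999)",
    "89. stock105 (2001)",
])

# Unnamed groups give integer column labels 0, 1, 2.
parts = s.str.extract(r"(\d+)\.\s(\w+)\s\((\d+)\)", expand=True)
parts.columns = ["s.no", "sdata", "year"]

# str.extract returns strings; convert the numeric columns explicitly.
parts["s.no"] = parts["s.no"].astype(int)
parts["year"] = parts["year"].astype(int)

print(parts)
```

Rows that do not match the pattern would come back as NaN, in which case astype(int) would fail; the sample data here matches on every row.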

Alternatively, you can simply split on whitespace and then clean up the columns, depending on what your actual data looks like:

(s.str.split(" ", expand=True)
  # strip period and parenthesis
 .apply(lambda col: col.str.strip(".()"))
  # rename columns
 .rename(columns={0: "s.no", 1: "sdata", 2: "year"}))

# s.no     sdata    year
#0   1    stock1    1991
#1   3   stock13    1993
#2   5   stock19    1999
#3  89  stock105    2001
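One caveat with splitting on every space: if a title itself contains spaces, the split yields extra columns. The sketch below shows one possible workaround, splitting once from the left and once from the right. The second row is hypothetical data made up for illustration, and head, tail and result are illustrative names.

```python
import pandas as pd

# The second entry is a hypothetical title containing a space.
s = pd.Series(["1. stock1 (1991)", "7. big stock (1995)"])

# Split off the leading number at the first space only.
head = s.str.split(" ", n=1, expand=True)         # 0 = "1.", 1 = "stock1 (1991)"
# Split off the year at the last space only.
tail = head[1].str.rsplit(" ", n=1, expand=True)  # 0 = title, 1 = "(1991)"

result = pd.DataFrame({
    "s.no": head[0].str.strip("."),
    "sdata": tail[0],
    "year": tail[1].str.strip("()"),
})

print(result)
```

Whether this is preferable to the regex depends on how regular the real data is; the regex approach is stricter about the expected format.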