Python - Pandas: extract a number from a column into a new column

Date: 2017-06-09 17:58:44

Tags: python pandas

I use pandas in Python a lot to extract information. One column of my DataFrame holds the following titles:

   0
In & Out (1997)
Simple Plan, A (1998)
Retro Puppetmaster (1999)
Paralyzing Fear: The Story of Polio in America, A (1998)
Old Man and the Sea, The (1958)
Body Shots (1999)
Coogan's Bluff (1968)
Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)
Search for One-eye Jimmy, The (1996)
Funhouse, The (1981)

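I want to pull the years out into a new column. The problem is that if I split on '(' as the delimiter, the string is also split at the earlier parentheses, as you can see in the eighth row. So how can I split off only the (yyyy) part so that the year forms a new column, like this?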

     0                 1
In & Out              1997
Simple Plan, A        1998
Retro Puppetmaster    1999 
Paralyzing Fear:...   1998
Old Man and the S...  1958
Body Shots            1999
Coogan's Bluff        1968 
Seven Samurai (T...   1954
Search for One-ey...  1996
Funhouse, The         1981

2 answers:

Answer 0: (score: 1)

You can use str.extract with expand=False, which returns the captured year as a Series:

df['year'] = df.iloc[:, 0].str.extract(r'\((\d{4})\)', expand=False)

df
Out[381]: 
                                                   0  year
0                                    In & Out (1997)  1997
1                              Simple Plan, A (1998)  1998
2                          Retro Puppetmaster (1999)  1999
3  Paralyzing Fear: The Story of Polio in America...  1998
4                    Old Man and the Sea, The (1958)  1958
5                                  Body Shots (1999)  1999
6                              Coogan's Bluff (1968)  1968
7  Seven Samurai (The Magnificent Seven) (Shichin...  1954
8               Search for One-eye Jimmy, The (1996)  1996
9                               Funhouse, The (1981)  1981
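
The question also asks for the title without the year. As a minimal sketch (not part of the original answer), the same idea can be extended with two capture groups. The group names `title` and `year`, the variable `parts`, and the shortened sample data are assumptions made for illustration. The pattern is anchored to the end of the string, so earlier parenthesised text stays in the title.

```python
import pandas as pd

# A few sample titles copied from the question; the column label 0 mirrors the original frame.
df = pd.DataFrame({0: [
    "In & Out (1997)",
    "Simple Plan, A (1998)",
    "Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)",
    "Funhouse, The (1981)",
]})

# Two named groups: the text before the final "(yyyy)" and the four digits themselves.
# The trailing \s*$ ties the year to the end of the string, so
# "(The Magnificent Seven)" is left inside the title.
parts = df[0].str.extract(r'^(?P<title>.*?)\s*\((?P<year>\d{4})\)\s*$')

# str.extract yields strings; convert the year for numeric use.
# This assumes every row matched (an unmatched row would be NaN and fail here).
parts['year'] = parts['year'].astype(int)

print(parts)
```

If some titles might lack a trailing year, leaving the column as strings or using `pd.to_numeric(parts['year'], errors='coerce')` would probably be a safer choice than `astype(int)`.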

Answer 1: (score: 0)

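You can try string slicing. The rindex() method of the string type returns the index of the last occurrence of a substring (in this case, the rightmost '('). With that index, we can slice the string as needed.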

For example:

>>> a = "Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)"
>>> # Slice at the last '(': the text before it is the title, the rest (minus ')') is the year.
>>> print(a[:a.rindex('(')], a[a.rindex('(')+1:-1])
Seven Samurai (The Magnificent Seven) (Shichinin no samurai)  1954
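
To use the slicing idea on many titles, it could be wrapped in a small helper. The sketch below is one possible way to do it; the function name `split_title_year` and the sample list are hypothetical, and it assumes every title ends with a closing parenthesis right after the year.

```python
def split_title_year(text):
    """Split 'Title (yyyy)' at the last '(' into (title, year)."""
    text = text.strip()
    idx = text.rindex('(')        # raises ValueError if there is no '('
    title = text[:idx].rstrip()   # drop the space before the parenthesis
    year = text[idx + 1:-1]       # skip '(' and the closing ')'
    return title, int(year)

# Sample titles taken from the question.
titles = [
    "In & Out (1997)",
    "Old Man and the Sea, The (1958)",
    "Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)",
]

pairs = [split_title_year(t) for t in titles]
for title, year in pairs:
    print(title, year)
```

On a pandas column, something like `df[0].apply(split_title_year)` should give a Series of such tuples, though the vectorised `str.extract` approach is likely the more idiomatic option there.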