I often use pandas in Python to extract information. One column of my DataFrame contains the following titles:
0
In & Out (1997)
Simple Plan, A (1998)
Retro Puppetmaster (1999)
Paralyzing Fear: The Story of Polio in America, A (1998)
Old Man and the Sea, The (1958)
Body Shots (1999)
Coogan's Bluff (1968)
Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)
Search for One-eye Jimmy, The (1996)
Funhouse, The (1981)
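I want to extract the years into a new column. The problem is that if I split on '(' as the delimiter, it also splits at the other parentheses, as you can see in row 8 (Seven Samurai). So how can I split off the (yyyy) part so that the year forms a new column, like this?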
0 1
In & Out 1997
Simple Plan, A 1998
Retro Puppetmaster 1999
Paralyzing Fear:... 1998
Old Man and the S... 1958
Body Shots 1999
Coogan's Bluff 1968
Seven Samurai (T... 1954
Search for One-ey... 1996
Funhouse, The 1981
Answer 0 (score: 1)
You can use str.extract:
df['year'] = df.iloc[:, 0].str.extract(r'\((\d{4})\)', expand=False)
df
Out[381]:
0 year
0 In & Out (1997) 1997
1 Simple Plan, A (1998) 1998
2 Retro Puppetmaster (1999) 1999
3 Paralyzing Fear: The Story of Polio in America... 1998
4 Old Man and the Sea, The (1958) 1958
5 Body Shots (1999) 1999
6 Coogan's Bluff (1968) 1968
7 Seven Samurai (The Magnificent Seven) (Shichin... 1954
8 Search for One-eye Jimmy, The (1996) 1996
9 Funhouse, The (1981) 1981
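A self-contained sketch of the same str.extract idea, assuming you also want the title with the year removed (the two-column layout shown in the question). It rebuilds a small frame from some of the titles above; the group names title and year are illustrative choices, not part of the original data. Named groups in the pattern become the column names of the result.

```python
import pandas as pd

# A few of the titles from the question, in a column named 0 as in the original frame
df = pd.DataFrame({0: [
    "In & Out (1997)",
    "Simple Plan, A (1998)",
    "Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)",
    "Funhouse, The (1981)",
]})

# "title": everything before the final "(yyyy)"; "year": the four digits.
# The "$" anchor ties the match to the end of the string, so earlier
# parenthesised text such as "(The Magnificent Seven)" stays in the title.
parts = df[0].str.extract(r'^(?P<title>.*?)\s*\((?P<year>\d{4})\)$')

# Keep the original column and add the two new ones alongside it
df = df.join(parts)
print(df)
```

The year comes back as a string; wrap it in pd.to_numeric if you need integers. Titles that do not end in "(yyyy)" get NaN in both new columns rather than raising an error.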
Answer 1 (score: 0)
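You can try string slicing. The str method rindex() returns the index of the last occurrence of the given substring (here '(', i.e. the one closest to the right end). With that index you can slice the string as needed.
For example: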
>>> a = "Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)"
>>> print(a[:a.rindex('(')].rstrip(), a[a.rindex('(')+1:-1])
Seven Samurai (The Magnificent Seven) (Shichinin no samurai) 1954
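A sketch of the same right-to-left idea applied to a whole column instead of a single string, assuming a pandas Series of titles like the ones above. Series.str.rsplit with n=1 splits only at the last '(', which is the split the question was trying to do with a plain '(' delimiter. The column names title and year are illustrative.

```python
import pandas as pd

titles = pd.Series([
    "Coogan's Bluff (1968)",
    "Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)",
    "Search for One-eye Jimmy, The (1996)",
])

# n=1 limits rsplit to a single split at the LAST "(", so expand=True
# yields exactly two columns: the title text and "yyyy)".
split = titles.str.rsplit('(', n=1, expand=True)

result = pd.DataFrame({
    'title': split[0].str.rstrip(),    # drop the space left before "("
    'year': split[1].str.rstrip(')'),  # drop the closing parenthesis
})
print(result)
```

Unlike the regex approach, this does not check that the last parenthesised part is actually a four-digit year; it simply takes whatever follows the last '('. A title with no '(' at all ends up with a missing year.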