Question

I have this CSV file:

movieId;title;genres
1;Toy Story (1995);Adventure|Animation|Children|Comedy|Fantasy
2;Jumanji (1995);Adventure|Children|Fantasy
3;Grumpier Old Men (1995);Comedy|Romance
4;Waiting to Exhale (1995);Comedy|Drama|Romance
5;Father of the Bride Part II (1995);Comedy
6;Heat (1995);Action|Crime|Thriller
7;Sabrina (1995);Comedy|Romance
8;Tom and Huck (1995);Adventure|Children
9;Hate (Haine, La) (1995);Crime|Drama
10;Seven (a.k.a. Se7en) (1995);Mystery|Thriller

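I want to create a new field named year from the title field, since the title also contains the movie's release year. I tried it this way, but it doesn't work: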

import pandas
df=pandas.read_csv("/Users/Desktop/IMDB.csv")
str=df
str1="(19"
str2="(20"
str3="(21"
str.find(str1, beg=0, end=len(string))
str.find(str1, beg=0, end=len(string)) 
str.find(str1, beg=0, end=len(string))

Answer 1

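If the title contains a four-digit number in parentheses, use str.extract with a regular expression to pull out the value between the parentheses: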

# Raw string so the regex backslashes are not treated as string escapes
df['year'] = df['title'].str.extract(r'\((\d{4})\)', expand=False).astype(int)
print(df)
   movieId                               title  \
0        1                    Toy Story (1995)   
1        2                      Jumanji (1995)   
2        3             Grumpier Old Men (1995)   
3        4            Waiting to Exhale (1995)   
4        5  Father of the Bride Part II (1995)   
5        6                         Heat (1995)   
6        7                      Sabrina (1995)   
7        8                 Tom and Huck (1995)   
8        9             Hate (Haine, La) (1995)   
9       10         Seven (a.k.a. Se7en) (1995)   

                                        genres  year  
0  Adventure|Animation|Children|Comedy|Fantasy  1995  
1                   Adventure|Children|Fantasy  1995  
2                               Comedy|Romance  1995  
3                         Comedy|Drama|Romance  1995  
4                                       Comedy  1995  
5                        Action|Crime|Thriller  1995  
6                               Comedy|Romance  1995  
7                           Adventure|Children  1995  
8                                  Crime|Drama  1995  
9                             Mystery|Thriller  1995
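
For completeness, here is a minimal, self-contained sketch of the same idea, under a few assumptions of mine. The file in the question is semicolon-delimited, so it is read with sep=";" (the default separator is a comma, which would leave everything in a single column). The sample rows are copied from the question, except the last one ("Untitled Project"), which is a hypothetical row added only to show what happens when a title has no year. In that case astype(int) would fail on the missing value, so this version converts to the nullable Int64 dtype instead. The pattern is also anchored to the end of the title, which is my choice and not part of the original answer.

```python
import io

import pandas as pd

# Sample rows from the question, plus one hypothetical row without a year.
csv_text = """movieId;title;genres
1;Toy Story (1995);Adventure|Animation|Children|Comedy|Fantasy
9;Hate (Haine, La) (1995);Crime|Drama
10;Seven (a.k.a. Se7en) (1995);Mystery|Thriller
11;Untitled Project;Drama
"""

# The file uses ";" as the delimiter, so it has to be passed explicitly.
df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Match a four-digit number in parentheses at the end of the title only,
# so "(Haine, La)" and "(a.k.a. Se7en)" are ignored.
year = df["title"].str.extract(r"\((\d{4})\)\s*$", expand=False)

# Titles without a match yield a missing value; the nullable Int64 dtype
# keeps the column integer-typed while allowing <NA>.
df["year"] = pd.to_numeric(year).astype("Int64")

print(df[["title", "year"]])
```

With a real file, replace io.StringIO(csv_text) with the path to the CSV and keep sep=";".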
