I have this CSV file:
movieId;title;genres
1;Toy Story (1995);Adventure|Animation|Children|Comedy|Fantasy
2;Jumanji (1995);Adventure|Children|Fantasy
3;Grumpier Old Men (1995);Comedy|Romance
4;Waiting to Exhale (1995);Comedy|Drama|Romance
5;Father of the Bride Part II (1995);Comedy
6;Heat (1995);Action|Crime|Thriller
7;Sabrina (1995);Comedy|Romance
8;Tom and Huck (1995);Adventure|Children
9;Hate (Haine, La) (1995);Crime|Drama
10;Seven (a.k.a. Se7en) (1995);Mystery|Thriller
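I want to create a new field named year from the title field, since the title also contains the movie's release year. I tried the following, but it doesn't work: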
import pandas
df=pandas.read_csv("/Users/Desktop/IMDB.csv")
str=df
str1="(19"
str2="(20"
str3="(21"
str.find(str1, beg=0, end=len(string))
str.find(str1, beg=0, end=len(string))
str.find(str1, beg=0, end=len(string))
Answer 0 (score: 3)
Use str.extract with a regular expression to capture the value in parentheses when it is a 4-digit number:
df['year'] = df['title'].str.extract(r'\((\d{4})\)', expand=False).astype(int)
print(df)
   movieId                               title  \
0        1                    Toy Story (1995)
1        2                      Jumanji (1995)
2        3             Grumpier Old Men (1995)
3        4            Waiting to Exhale (1995)
4        5  Father of the Bride Part II (1995)
5        6                         Heat (1995)
6        7                      Sabrina (1995)
7        8                 Tom and Huck (1995)
8        9             Hate (Haine, La) (1995)
9       10         Seven (a.k.a. Se7en) (1995)

                                        genres  year
0  Adventure|Animation|Children|Comedy|Fantasy  1995
1                   Adventure|Children|Fantasy  1995
2                               Comedy|Romance  1995
3                         Comedy|Drama|Romance  1995
4                                       Comedy  1995
5                        Action|Crime|Thriller  1995
6                               Comedy|Romance  1995
7                           Adventure|Children  1995
8                                  Crime|Drama  1995
9                             Mystery|Thriller  1995
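
Two things may be worth checking if this does not work on the real file. The sample is semicolon-separated, so read_csv probably needs sep=';', otherwise each row is read as a single column and there is no title column to extract from. Also, astype(int) raises an error if any title has no 4-digit year in parentheses. Below is a minimal sketch of a more defensive variant. It assumes the year is always the last thing in the title (hence the added `\s*$` anchor), reads the sample from an in-memory string instead of the real path, and includes one hypothetical row, "Untitled Project", that is not in the original data, to show the missing-year case:

```python
import io
import pandas as pd

# Sample rows from the question. The last row is hypothetical and only
# illustrates a title that carries no year.
csv_text = """movieId;title;genres
1;Toy Story (1995);Adventure|Animation|Children|Comedy|Fantasy
9;Hate (Haine, La) (1995);Crime|Drama
10;Seven (a.k.a. Se7en) (1995);Mystery|Thriller
11;Untitled Project;Drama
"""

# The file is semicolon-separated, so the separator is passed explicitly.
df = pd.read_csv(io.StringIO(csv_text), sep=";")

# Capture four digits inside parentheses at the end of the title.
# Anchoring at the end is an assumption about the data, not a requirement.
year = df["title"].str.extract(r"\((\d{4})\)\s*$", expand=False)

# Nullable Int64 keeps integer years and stores <NA> where no year was found.
df["year"] = pd.to_numeric(year).astype("Int64")

print(df[["title", "year"]])
```

With the real file, the io.StringIO(...) argument would be replaced by the path to the CSV.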