My dataset looks like this:
film_title writers actors
0 Leonardo Dicaprio, Jason Statham, Dwayne Johnson...
1 Jack Nicholson, Robert De Niro, Denzel Washington...
2 Jack Nicholson, Jason Statham, Dwayne Johnson...
The '...' means there are more people in that cell. I am trying to put all the actors into a single list, without duplicates. So far I have the following code:
actorsList = df_final.actors.str.split(', ') #which splits the cells into multiple lists
#print(actorsList) will print this:
['Leonardo Dicaprio', 'Jason Statham', 'Dwayne Johnson'...]
['Jack Nicholson', 'Robert De Niro', 'Denzel Washington'...]
['Jack Nicholson', 'Jason Statham', 'Dwayne Johnson'...]
so that
print(actorsList[0]) #will print the first list: ['Leonardo Dicaprio', 'Jason Statham', 'Dwayne Johnson'...]
Then I tried to iterate over this list again and store each actor's name (without duplicates, since an actor can appear in more than one film):
#ITERATE THROUGH ONE LIST
for i in range(len(actorsList[0])):
    txt = actorsList[0][i].split(', ')
    print(txt)
This prints something like:
['Leonardo Dicaprio']
['Jason Statham']
['Dwayne Johnson']
and so on
I am trying to do this for every list, but I end up with this error:
23 for i in range(len(actorsList)-1):
---> 24 for j in range(len(actorsList[i])):
25 txt = actorsList[i][j].split(', ')
26 print(txt)
TypeError: object of type 'float' has no len()
I should also mention that it does run and print results for a while, but then it stops and this error appears.
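One likely cause, assuming at least one cell in the actors column is empty: str.split leaves a missing cell as NaN, which is a float, so len() fails when the loop reaches that row. That would also explain why some rows print before the error appears. The sketch below reproduces the situation with a small made-up DataFrame (the sample rows and the name unique_actors are assumptions for illustration, not part of the original data) and collects the unique names without an explicit loop.

```python
import pandas as pd

# Hypothetical stand-in for df_final; the None cell mimics a missing value.
df_final = pd.DataFrame({
    "actors": [
        "Leonardo Dicaprio, Jason Statham, Dwayne Johnson",
        None,
        "Jack Nicholson, Jason Statham, Dwayne Johnson",
    ]
})

# Rows with a missing cell stay missing after the split (typically NaN, a float),
# which is what makes len() raise the TypeError in the nested loop.
actorsList = df_final.actors.str.split(', ')

# Drop the missing rows, flatten the lists into one Series,
# then keep only the first occurrence of each name.
unique_actors = actorsList.dropna().explode().drop_duplicates().tolist()
print(unique_actors)
```

Note also that range(len(actorsList)-1) in the traceback stops one row early, so the last film would be skipped even without the missing values.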