Question

I have a pandas DataFrame in which one column looks like this (it is filled with lists of various lengths):

In [10]:df.genres
Out[10]: 
0         [Action, Adventure, Science Fiction, Thriller]
1         [Action, Adventure, Science Fiction, Thriller]
2         [Adventure, Science Fiction, Thriller]
3         [Action, Adventure, Science Fiction, Fantasy]
4         [Drama, Science Fiction]
5         [Action]

I only need to keep the first item in each list and discard the rest.
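A minimal sketch of keeping only the first item, assuming the column already holds Python lists as shown above. The pandas .str accessor indexes into lists as well as strings, and it keeps the original index, so every value stays on its own row. The sample data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the lists shown above
df = pd.DataFrame({
    "genres": [
        ["Action", "Adventure", "Science Fiction", "Thriller"],
        ["Adventure", "Science Fiction", "Thriller"],
        ["Drama", "Science Fiction"],
        ["Action"],
    ]
})

# .str[0] takes element 0 of each list and returns a Series with the same
# index as df, so no re-alignment happens on assignment.
df["genres"] = df["genres"].str[0]
print(df["genres"].tolist())  # ['Action', 'Adventure', 'Drama', 'Action']
```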

I tried to do this by running the following code, after first dropping the rows with null values in the 'genres' column:

df['genres'] = pd.DataFrame(df['genres'].values.tolist())

df['genres'] = pd.DataFrame(df['genres'].tolist())

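Having tried each line separately, I am able to keep the first item in each list, but for some reason it ends up creating new rows with null values in the 'genres' column.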

Below is what I got before dropping the null rows and before running the code above:

In [11]:df.info()
Out[11]:

RangeIndex: 10866 entries, 0 to 10865
Data columns (total 7 columns):
original_title    10866 non-null object
runtime           10866 non-null int64
genres            10843 non-null object
vote_average      10866 non-null float64
release_year      10866 non-null int64
budget_adj        10866 non-null float64
revenue_adj       10866 non-null float64

And below is what I got after dropping the null rows and running the two lines above:

In [15]:df.info()
Out[15]:

RangeIndex: 10866 entries, 0 to 10865
Data columns (total 7 columns):
original_title    10843 non-null object
runtime           10843 non-null int64
genres            10820 non-null object
vote_average      10843 non-null float64
release_year      10843 non-null int64
budget_adj        10843 non-null float64
revenue_adj       10843 non-null float64

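As you can see, after running the code above, the 'genres' column suddenly has 23 new rows with null values. I also checked those 23 rows, and all of them originally had values.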
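One likely cause, offered as a hypothesis rather than a confirmed diagnosis: after dropna the index has gaps, but pd.DataFrame(df['genres'].tolist()) builds a frame with a fresh 0..n-1 index, and assigning it back to df['genres'] aligns by index label, not by position. That would also fit the numbers above, since the 23 new nulls equal the 23 rows that were dropped (10866 - 10843). The sketch reproduces this on a tiny made-up frame. It takes column 0 of the helper frame explicitly, because what pandas does when a whole multi-column DataFrame is assigned to a single column varies between versions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: label 1 has no genres, standing in for the 23 null rows
df = pd.DataFrame({
    "genres": [["Action", "Thriller"], np.nan, ["Drama"], ["Action", "Fantasy"]]
})
df = df.dropna(subset=["genres"])
print(df.index.tolist())  # [0, 2, 3]: label 1 is gone

# A DataFrame built from a plain list gets a fresh RangeIndex 0..n-1
firsts = pd.DataFrame(df["genres"].tolist())[0]
print(firsts.index.tolist())  # [0, 1, 2]

# Column assignment aligns on index labels, not positions: label 2 picks up
# the value that belonged to label 3, and label 3 has no match, so it is NaN.
misaligned = df.copy()
misaligned["genres"] = firsts
print(misaligned["genres"].tolist())  # ['Action', 'Action', nan]

# Reusing the original index keeps every value on its own row
aligned = df.copy()
aligned["genres"] = pd.DataFrame(df["genres"].tolist(), index=df.index)[0]
print(aligned["genres"].tolist())  # ['Action', 'Drama', 'Action']
```

If the DataFrame-constructor route is kept, passing index=df.index as above, or calling df.reset_index(drop=True) right after dropna, should avoid the shift.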

For more context, the 'genres' column originally looked like this:

In [3]:df.genres
Out[3]: 
0         Action|Adventure|Science Fiction|Thriller
1         Action|Adventure|Science Fiction|Thriller
2         Adventure|Science Fiction|Thriller
3         Action|Adventure|Science Fiction|Fantasy
4         Drama|Science Fiction
5         Action

I then converted it into lists with the following code:

df['genres'] = df['genres'].str.split('|')

Is there a better way to do this, or is something else creating these new null values?
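A possibly simpler route, sketched under the assumption that only the first genre is ever needed: take it straight from the original pipe-delimited strings. Because the result keeps the original index, missing values simply stay NaN on their own rows, so dropping the null rows first becomes optional. The sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the original pipe-delimited column, with one missing value
df = pd.DataFrame({
    "genres": [
        "Action|Adventure|Science Fiction|Thriller",
        np.nan,
        "Drama|Science Fiction",
        "Action",
    ]
})

# A single-character pattern such as "|" is split on literally, not as a regex.
# n=1 stops after the first split, since only the first genre is kept;
# missing entries yield NaN, and .str[0] passes that NaN through.
df["genres"] = df["genres"].str.split("|", n=1).str[0]
print(df["genres"].tolist())  # ['Action', nan, 'Drama', 'Action']
```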

How to keep only one value from lists of varying length in a DataFrame column

0 answers