Extracting genres from a dataframe column into a list

Time: 2019-04-01 12:58:48

Tags: python pandas

I have a dataframe that looks like this:

id  genres
1   [{'id': 35, 'name': 'Comedy'}]
2   [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}, {'id': 10749, 'name': 'Romance'}]
3   [{'id':31, 'name':'Romance'}]

I want to extract the genre names from each row and store them in a list, one list per row. For example:

id  genres
1   ['Comedy']
2   ['Comedy','Drama','Family','Romance']
3   ['Romance']

I tried [j['name'] for i in data['genres'] for j in i], but it puts the names from all rows into a single flat list.
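To make the difference concrete, here is a minimal sketch that rebuilds the sample data from the question (the variable names flat and per_row are only illustrative) and compares the attempted comprehension with a nested one. A single comprehension with two for clauses iterates over every dict in every row and collects into one list, while a comprehension inside a comprehension produces one inner list per row.

```python
import pandas as pd

# Sample data reconstructed from the question.
data = pd.DataFrame({
    "id": [1, 2, 3],
    "genres": [
        [{"id": 35, "name": "Comedy"}],
        [{"id": 35, "name": "Comedy"}, {"id": 18, "name": "Drama"},
         {"id": 10751, "name": "Family"}, {"id": 10749, "name": "Romance"}],
        [{"id": 31, "name": "Romance"}],
    ],
})

# One comprehension with two for clauses: a single flat list across all rows.
flat = [j["name"] for i in data["genres"] for j in i]

# A comprehension inside a comprehension: one inner list per row.
per_row = [[j["name"] for j in i] for i in data["genres"]]

print(flat)
print(per_row)
```

Because per_row has exactly one entry per row, it can be assigned back to a column of the dataframe, whereas flat generally has a different length.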

3 answers:

Answer 0 (score: 3)

Use apply.

For example:

import pandas as pd

df = pd.DataFrame({"genres": [[{'id': 35, 'name': 'Comedy'}],[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10751, 'name': 'Family'}, {'id': 10749, 'name': 'Romance'}],[{'id':31, 'name':'Comedy'}]]})
df["genres"] = df["genres"].apply(lambda x: [i["name"] for i in x])
print(df)

Output:

                             genres
0                          [Comedy]
1  [Comedy, Drama, Family, Romance]
2                          [Comedy]

Answer 1 (score: 1)

Use a nested list comprehension:

data['genres'] = [[j['name'] for j in i] for i in data['genres']]

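For a more general solution, dict.get is the better choice. If a dict has no name key, it does not raise a KeyError; it returns None, or a default value you specify: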

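# Missing 'name' keys become None:
data['genres'] = [[j.get('name') for j in i] for i in data['genres']]

# Alternatively, supply an explicit default (run one variant or the other, not both):
data['genres'] = [[j.get('name', 'missing') for j in i] for i in data['genres']]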

print(data)
   id                            genres
0   1                          [Comedy]
1   2  [Comedy, Drama, Family, Romance]
2   3                         [Romance]
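The sample data always contains a name key, so the benefit of get does not show in the output above. The sketch below uses hypothetical data, not taken from the question, in which one dict lacks the key, to illustrate how the three variants might behave.

```python
import pandas as pd

# Hypothetical data: the second dict in the last row has no 'name' key.
data = pd.DataFrame({
    "id": [1, 2],
    "genres": [
        [{"id": 35, "name": "Comedy"}],
        [{"id": 18, "name": "Drama"}, {"id": 10751}],
    ],
})

# get without a default returns None for the missing key.
with_none = [[j.get("name") for j in i] for i in data["genres"]]

# get with a default returns that value instead.
with_default = [[j.get("name", "missing") for j in i] for i in data["genres"]]

# Plain indexing raises KeyError as soon as it reaches the incomplete dict.
try:
    [[j["name"] for j in i] for i in data["genres"]]
    raised = False
except KeyError:
    raised = True

print(with_none)
print(with_default)
print(raised)
```

Whether None or a placeholder string is preferable depends on how the lists are used later; None is easier to filter out, while a string keeps every element the same type.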

Answer 2 (score: 0)

Another possible approach is to use apply() together with get:

df['genres'] = df['genres'].apply(lambda x: [d.get('name') for d in x])