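I have a column containing multiple genres, and I'm trying to split the genre lists so that I get each genre separately. No matter what I try, I get NaN for the entire column in the DataFrame.
This is what the data looks like: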
0 [Drama,, Romance]
1 [Animation,, Comedy,, Kids, &, Family]
2 [Drama,, Mystery, &, Suspense]
3 [Drama]
4 NaN
5 [Art, House, &, International,, Drama]
6 [Art, House, &, International,, Drama,, Romance]
7 [Documentary]
8 [Action, &, Adventure,, Animation,, Art, House...
9 [Action, &, Adventure,, Drama,, Western]
10 [Comedy,, Horror]
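What I want: ["Drama", "Romance"] ["Animation", "Comedy", "Kids & Family"] ...
I'm doing this because I want to see how many unique genres there are. Right now I can only see the unique lists, but I want each unique genre. I'm not even sure I'm approaching this the right way, so any help is much appreciated.
Here is my most recent attempt (x is the data shown above plus more rows):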
x = pd.Series(x)
x = x.str.split()
[i.str.split() for i in x]
Thanks a lot for your help!
Answer (score: 0)
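Your data seems to be inconsistent, with some extraneous commas. Assuming your data is actually a string, you need to eval the string representation of the list into a list.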
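A quick illustration of why the quoting step matters: ast.literal_eval only accepts valid Python literals, so the bare words in a string like [Drama, Romance] are rejected, while the quoted version parses into a real list. The two sample strings below are made up for illustration.

```python
import ast

unquoted = "[Drama, Romance]"
quoted = '["Drama", "Romance"]'

# Bare words such as Drama are names, not literals, so literal_eval raises ValueError
try:
    ast.literal_eval(unquoted)
except ValueError as exc:
    print("unquoted string rejected:", exc)

# Once every item is quoted, the string is a valid list literal
parsed = ast.literal_eval(quoted)
print(parsed, type(parsed))
```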
A few steps:
# First, import ast to use literal_eval(), and pandas for pd.notnull()
import ast
import pandas as pd
# Then, remove the extraneous commas: ",, " becomes ", " and ", &, " becomes " & "
new_df = df[0].str.replace(', ', ' ', regex=False)
# Then, add quotes around each listed item to prep for eval.
# regex=True is required on pandas >= 2.0, where str.replace defaults to regex=False
new_df = new_df.str.replace(r'(?P<item>\b[\w &]+)', r'"\1"', regex=True)
# Then, eval each string representation, skipping missing rows (NaN)
lst = [ast.literal_eval(i) for i in new_df if pd.notnull(i)]
# Or, you can put all of this together:
lst = [ast.literal_eval(i) for i in df[0].str.replace(', ', ' ', regex=False).str.replace(r'(?P<item>\b[\w &]+)', r'"\1"', regex=True) if pd.notnull(i)]
Output:
[['Drama', 'Romance'],
['Animation', 'Comedy', 'Kids & Family'],
['Drama', 'Mystery & Suspense'],
['Drama'],
['Art House & International', 'Drama'],
['Art House & International', 'Drama', 'Romance'],
['Documentary'],
['Action & Adventure', 'Animation', 'Art House'],
['Action & Adventure', 'Drama', 'Western'],
['Comedy', 'Horror']]
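Since the question's goal is to count distinct genres, here is a self-contained sketch that runs the steps above on a few rows copied from the question and then flattens the nested lists. It assumes, as the answer does, that the column holds the strings exactly as displayed (doubled commas included) with NaN for missing rows; the small df is built only for illustration, and itertools.chain plus collections.Counter are just one choice for the flattening and counting.

```python
import ast
from collections import Counter
from itertools import chain

import numpy as np
import pandas as pd

# A few rows copied from the question, stored as plain strings (assumption)
df = pd.DataFrame({0: [
    "[Drama,, Romance]",
    "[Animation,, Comedy,, Kids, &, Family]",
    "[Drama,, Mystery, &, Suspense]",
    np.nan,
    "[Comedy,, Horror]",
]})

# Same three steps as above: drop extra commas, quote items, literal_eval
cleaned = df[0].str.replace(", ", " ", regex=False)
quoted = cleaned.str.replace(r"(?P<item>\b[\w &]+)", r'"\1"', regex=True)
lst = [ast.literal_eval(s) for s in quoted if pd.notnull(s)]

# Flatten the list of lists into a single sequence of genre names
all_genres = list(chain.from_iterable(lst))

unique_genres = sorted(set(all_genres))  # each distinct genre once
counts = Counter(all_genres)             # occurrences per genre

print(len(unique_genres), unique_genres)
print(counts.most_common(3))
```

The NaN row is skipped by the pd.notnull check, so only four lists are produced; len(unique_genres) is the number the question asks for.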
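If you want to keep the index, you can represent the result as a dictionary:
# new_df here is the quoted Series from the step-by-step version above
d = {i: ast.literal_eval(j) for i, j in new_df.items() if pd.notnull(j)}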
Output:
{0: ['Drama', 'Romance'],
1: ['Animation', 'Comedy', 'Kids & Family'],
2: ['Drama', 'Mystery & Suspense'],
3: ['Drama'],
5: ['Art House & International', 'Drama'],
6: ['Art House & International', 'Drama', 'Romance'],
7: ['Documentary'],
8: ['Action & Adventure', 'Animation', 'Art House'],
9: ['Action & Adventure', 'Drama', 'Western'],
10: ['Comedy', 'Horror']}
If you want this in a DataFrame, I'm not sure exactly what layout you're after, but once you have a dict or a list, getting it back into one is trivial.
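As one example of that conversion, assuming a dict d shaped like the one above: wrapping it in a pd.Series keeps the original row labels, and Series.explode (available since pandas 0.25) puts one genre per row, after which nunique and value_counts answer the "how many unique genres" question. The short d below is only a stand-in for the real one.

```python
import pandas as pd

# Stand-in for the dict built above (row index -> list of genres)
d = {
    0: ["Drama", "Romance"],
    3: ["Drama"],
    10: ["Comedy", "Horror"],
}

# One row per original index, each holding a list of genres
genres = pd.Series(d, name="genres")

# One row per (index, genre) pair; the original index label is repeated
exploded = genres.explode()

print(exploded.nunique())       # number of distinct genres
print(exploded.value_counts())  # how often each genre occurs

# Back to a DataFrame, e.g. to join with other columns on the same index
genres_df = genres.to_frame()
```

Because the index is preserved, both genres_df and exploded can be joined back to the original frame on its row labels.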