I have a DataFrame like this:
df['likes']
0 {'data': [{'id': '651703178310339', 'name': 'A...
1 {'data': [{'id': '798659570200808', 'name': 'B...
2 {'data': [{'id': '10200132902001105', 'name': ...
3 {'data': [{'id': '10151983313320836', 'name': ...
4 NaN
5 {'data': [{'id': '1551927888235503', 'name': '...
6 {'data': [{'id': '10204089171847031', 'name': ...
7 {'data': [{'id': '399992547089295', 'name': 'В...
8 {'data': [{'id': '10201813292573808', 'name': ...
9 NaN
Some cells contain several 'id' elements:
df['likes'][0]
{'data': [{'id': '651703178310339', 'name': 'A'},
{'id': '10204089171847031', 'name': 'B'}],
'paging': {'cursors': {'after': 'MTAyMDQwODkxNzE4NDcwMzEZD',
'before': 'NjUxNzAzMTc4MzEwMzM5'}}}
Some cells are empty (NaN). I want to get a new variable
df['number']
0 2
1 4
2 3
4 0
that holds the number of 'id' elements in each cell of df['likes'], where each non-empty cell is a dict. I tried counting 'id':
df['likes'].apply(lambda x: x.count('id'))
AttributeError: 'dict' object has no attribute 'count'
So I tried this:
df['likes'].apply(lambda x: len(x.keys()))
AttributeError: 'float' object has no attribute 'keys'
How can I solve this?
I was asked to post a complete data set; I am posting three rows so as not to take up too much space:
`df['likes']`
`0 {'data': [{'id': '651703178310339', 'name': 'A'},
{'id': '10204089171847031', 'name': 'B'}],
'paging': {'cursors': {'after': 'MTAyMDQwODkxNzE4NDcwMzEZD',
'before': 'NjUxNzAzMTc4MzEwMzM5'}}}
1 {'data': [{'id': '798659570200808', 'name': 'C'},
{'id': '574668895969867', 'name': 'D'},
{'id': '651703178310339', 'name': 'A'},
{'id': '1365088683555195', 'name': 'G'}],
'paging': {'cursors': {'after': 'MTM2NTA4ODY4MzU1NTE5NQZDZD',
'before': 'Nzk4NjU5NTcwMjAwODA4'}}}
2 NaN`
Answer 0 (score: 1)
This almost works:
df['likes'].apply(lambda x: len(x['data']))
Note the error:
> AttributeError: 'float' object has no attribute 'keys'
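That is because some of your values are NaN, which pandas represents as a float. So apply the function only to the non-null rows: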
df['likes'][df['likes'].notnull()].apply(lambda x: len(x['data']))
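The filtered expression returns counts only for the rows that hold a dict, so the NaN rows are missing from the result, while the question asks for 0 there. One way to get that, sketched here on the question's sample rows (with the 'paging' part left out for brevity), is to reindex the counts back onto the full index. The column name `number` is taken from the question; everything else is an illustrative assumption.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, without the 'paging' key (not needed for counting).
df = pd.DataFrame({
    "likes": [
        {"data": [{"id": "651703178310339", "name": "A"},
                  {"id": "10204089171847031", "name": "B"}]},
        {"data": [{"id": "798659570200808", "name": "C"},
                  {"id": "574668895969867", "name": "D"},
                  {"id": "651703178310339", "name": "A"},
                  {"id": "1365088683555195", "name": "G"}]},
        np.nan,
    ]
})

# Count only on the non-null rows, as in the answer.
counts = df["likes"][df["likes"].notnull()].apply(lambda x: len(x["data"]))

# Align back to the full index; rows that were NaN get 0.
df["number"] = counts.reindex(df.index, fill_value=0).astype(int)
print(df["number"].tolist())  # [2, 4, 0]
```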
Answer 1 (score: 1)
Option 1:
In [120]: df.likes.apply(pd.Series)['data'].apply(lambda x: pd.Series(x).notnull()).sum(1)
Out[120]:
0 2.0
1 4.0
2 0.0
dtype: float64
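Option 1 expands every dict into a temporary DataFrame, which can be slow on large data, and it returns floats. A lighter alternative, which is not what this answer uses, is the `.str` accessor: `Series.str.get` also accepts dict keys and `Series.str.len` works on lists. The sketch below assumes every non-null cell is a dict whose 'data' value is a list, as in the question's sample.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, without the 'paging' key.
df = pd.DataFrame({
    "likes": [
        {"data": [{"id": "651703178310339", "name": "A"},
                  {"id": "10204089171847031", "name": "B"}]},
        {"data": [{"id": "798659570200808", "name": "C"},
                  {"id": "574668895969867", "name": "D"},
                  {"id": "651703178310339", "name": "A"},
                  {"id": "1365088683555195", "name": "G"}]},
        np.nan,
    ]
})

# Pull out the 'data' list of each dict; NaN cells stay NaN.
data_lists = df["likes"].str.get("data")

# Length of each list, with NaN replaced by 0 and cast to integers.
df["number"] = data_lists.str.len().fillna(0).astype(int)
print(df["number"].tolist())  # [2, 4, 0]
```

This counts the entries of 'data' rather than checking each entry for an 'id' key, so it matches Option 1 only when every entry carries an 'id'.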
Option 2:
In [146]: df['count'] = [sum('id' in d for d in x.get('data',[]))
if pd.notna(x) else 0
for x in df['likes']]
In [147]: df
Out[147]:
likes count
0 {'data': [{'id': '651703178310339', 'name': 'A... 2
1 {'data': [{'id': '798659570200808', 'name': 'C... 4
2 NaN 0
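The list comprehension in Option 2 can also be written as a small named function, which may be easier to read and to test. The helper name `count_ids` is made up for this sketch, and the last row of the sample is a hypothetical entry without an 'id' key, added only to show that it is not counted.

```python
import numpy as np
import pandas as pd


def count_ids(cell):
    """Return how many entries of cell['data'] contain an 'id' key.

    Anything that is not a dict (for example NaN) counts as 0.
    """
    if not isinstance(cell, dict):
        return 0
    return sum("id" in entry for entry in cell.get("data", []))


likes = pd.Series([
    # First sample row from the question, without 'paging'.
    {"data": [{"id": "651703178310339", "name": "A"},
              {"id": "10204089171847031", "name": "B"}]},
    np.nan,
    # Hypothetical row: one entry has no 'id' key.
    {"data": [{"id": "798659570200808", "name": "C"}, {"name": "no-id"}]},
])

number = likes.apply(count_ids)
print(number.tolist())  # [2, 0, 1]
```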
Data set:
In [137]: df.to_dict('records')
Out[137]:
[{'likes': {'data': [{'id': '651703178310339', 'name': 'A'},
{'id': '10204089171847031', 'name': 'B'}],
'paging': {'cursors': {'after': 'MTAyMDQwODkxNzE4NDcwMzEZD',
'before': 'NjUxNzAzMTc4MzEwMzM5'}}}},
{'likes': {'data': [{'id': '798659570200808', 'name': 'C'},
{'id': '574668895969867', 'name': 'D'},
{'id': '651703178310339', 'name': 'A'},
{'id': '1365088683555195', 'name': 'G'}],
'paging': {'cursors': {'after': 'MTM2NTA4ODY4MzU1NTE5NQZDZD',
'before': 'Nzk4NjU5NTcwMjAwODA4'}}}},
{'likes': nan}]
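To reproduce the examples, the record list above can be passed back to `pd.DataFrame`. This is a minimal sketch: `nan` is written as `float("nan")` so the snippet needs no extra import, and the variable name `records` is an assumption.

```python
import pandas as pd

records = [
    {"likes": {"data": [{"id": "651703178310339", "name": "A"},
                        {"id": "10204089171847031", "name": "B"}],
               "paging": {"cursors": {"after": "MTAyMDQwODkxNzE4NDcwMzEZD",
                                      "before": "NjUxNzAzMTc4MzEwMzM5"}}}},
    {"likes": {"data": [{"id": "798659570200808", "name": "C"},
                        {"id": "574668895969867", "name": "D"},
                        {"id": "651703178310339", "name": "A"},
                        {"id": "1365088683555195", "name": "G"}],
               "paging": {"cursors": {"after": "MTM2NTA4ODY4MzU1NTE5NQZDZD",
                                      "before": "Nzk4NjU5NTcwMjAwODA4"}}}},
    {"likes": float("nan")},
]

# Each record becomes one row; the nested dicts are kept as plain objects.
df = pd.DataFrame(records)
print(df["likes"].isna().tolist())  # [False, False, True]
```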