Get the first value from a list of dictionaries in a pandas DataFrame column

Posted: 2018-09-18 14:15:33

Tags: python json pandas dataframe nested

I have a pandas DataFrame:

id       photos
001      [{'medium':'https:blablabla1',
           'xl':'something1',
           's':'anotherthing1'},
         {'medium':'https:blablabla2',
           'xl':'something2',
           's':'anotherthing2'},
         {'medium':'https:blablabla3',
           'xl':'something3',
           's':'anotherthing3'}]
002      [{'medium':'https:blablabla4',
           'xl':'something4',
           's':'anotherthing4'},
         {'medium':'https:blablabla5',
           'xl':'something5',
           's':'anotherthing5'},
         {'medium':'https:blablabla6',
           'xl':'something6',
           's':'anotherthing6'}]
003      [{'medium':'https:blablabla7',
           'xl':'something7',
           's':'anotherthing7'},
         {'medium':'https:blablabla8',
           'xl':'something8',
           's':'anotherthing8'},
         {'medium':'https:blablabla9',
           'xl':'something9',
           's':'anotherthing9'}]

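The second column, photos, contains lists of dictionaries. What I want is the value of the first key:value pair of the first dictionary in each list (here, the 'medium' URL).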

The desired output should look like this:

id       image_url
001      https:blablabla1
002      https:blablabla4
003      https:blablabla7

I figured out how to do it if it were just a single list of dictionaries, like this:

dicts_list = [{'medium': 'https:blablabla1',
               'xl': 'something1',
               's': 'anotherthing1'},
              {'medium': 'https:blablabla2',
               'xl': 'something2',
               's': 'anotherthing2'},
              {'medium': 'https:blablabla3',
               'xl': 'something3',
               's': 'anotherthing3'}]

# Access the first value of the first dict in a list
list(dicts_list[0].values())[0]

# output
'https:blablabla1'
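A slightly lighter variant of the same lookup, assuming Python 3.7+ (where dicts preserve insertion order, so "first value" means the value of the first key written, here 'medium'). `first_value` is a hypothetical helper name used only for illustration; unlike `list(...)[0]`, it does not build a list of all the values first.

```python
# Sample data mirroring one cell of the photos column
dicts_list = [
    {'medium': 'https:blablabla1', 'xl': 'something1', 's': 'anotherthing1'},
    {'medium': 'https:blablabla2', 'xl': 'something2', 's': 'anotherthing2'},
    {'medium': 'https:blablabla3', 'xl': 'something3', 's': 'anotherthing3'},
]


def first_value(dicts):
    """Return the first value of the first dict in a list (hypothetical helper).

    Relies on dicts keeping insertion order, which Python guarantees since 3.7.
    """
    return next(iter(dicts[0].values()))


print(first_value(dicts_list))  # https:blablabla1
```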

So far, I have come up with this (which is obviously wrong):

v = list()
for index, rows in df.iterrows():
    photo = rows['photos']
    v.append(photo[0])

# output
['[', '[']

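The idea is to collect the first values in a list and then add that list back to the original DataFrame as a new column. I don't know how to extend this to a pandas DataFrame.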
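The loop itself needs only one more index: photo[0] is the whole first dict, so the URL is photo[0]['medium']. A minimal sketch, assuming the photos cells already hold real lists of dicts (not their string representations, as the note that follows explains) and using an abbreviated copy of the sample data:

```python
import pandas as pd

# Abbreviated stand-in for the question's DataFrame (cells are real lists of dicts)
df = pd.DataFrame({
    'id': ['001', '002', '003'],
    'photos': [
        [{'medium': 'https:blablabla1', 'xl': 'something1', 's': 'anotherthing1'},
         {'medium': 'https:blablabla2', 'xl': 'something2', 's': 'anotherthing2'}],
        [{'medium': 'https:blablabla4', 'xl': 'something4', 's': 'anotherthing4'}],
        [{'medium': 'https:blablabla7', 'xl': 'something7', 's': 'anotherthing7'}],
    ],
})

v = []
for index, row in df.iterrows():
    photos = row['photos']
    # photos[0] is the first dict in the list; ['medium'] picks its URL
    v.append(photos[0]['medium'])

# One value per row, in row order, so the list can be assigned back as a column
df['image_url'] = v
print(df[['id', 'image_url']])
```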

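Note: Based on @daren-thomas's answer, I found that the dictionaries in my data were actually string representations of dictionaries. To convert this column into real lists of dictionaries, use the following code: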

import ast
df['photos'] = df['photos'].apply(ast.literal_eval)
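A small demonstration of why the loop above printed '[' and how literal_eval fixes it. This sketch assumes a one-row column of strings; one way such a column can arise (an assumption, not stated in the question) is reading the data back from a CSV file.

```python
import ast

import pandas as pd

# A column that *looks* like lists of dicts but actually holds strings
df = pd.DataFrame({
    'id': ['001'],
    'photos': ["[{'medium': 'https:blablabla1', 'xl': 'something1', 's': 'anotherthing1'}]"],
})

print(df['photos'][0][0])  # prints "[": indexing a string returns its first character

# literal_eval safely parses Python literals (lists, dicts, strings, numbers)
df['photos'] = df['photos'].apply(ast.literal_eval)

print(df['photos'][0][0]['medium'])  # https:blablabla1
```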

2 answers:

Answer 0 (score: 1)

Here is one way to do it, if your column or Series is a list of dictionaries like the following:

>>> import pandas as pd
>>> s = pd.Series([[{'medium':'https:blablabla1',
...                  'xl':'something1',
...                  's':'anotherthing1'},
...                 {'medium':'https:blablabla2',
...                  'xl':'something2',
...                  's':'anotherthing2'},
...                 {'medium':'https:blablabla3',
...                  'xl':'something3',
...                  's':'anotherthing3'}],
...                [{'medium':'https:blablabla4',
...                  'xl':'something4',
...                  's':'anotherthing4'},
...                 {'medium':'https:blablabla5',
...                  'xl':'something5',
...                  's':'anotherthing5'},
...                 {'medium':'https:blablabla6',
...                  'xl':'something6',
...                  's':'anotherthing6'}],
...                [{'medium':'https:blablabla7',
...                  'xl':'something7',
...                  's':'anotherthing7'},
...                 {'medium':'https:blablabla8',
...                  'xl':'something8',
...                  's':'anotherthing8'},
...                 {'medium':'https:blablabla9',
...                  'xl':'something9',
...                  's':'anotherthing9'}]])
>>> s
0    [{'medium': 'https:blablabla1', 'xl': 'somethi...
1    [{'medium': 'https:blablabla4', 'xl': 'somethi...
2    [{'medium': 'https:blablabla7', 'xl': 'somethi...
dtype: object
>>> s.apply(pd.Series)[0].apply(pd.Series).medium
0    https:blablabla1
1    https:blablabla4
2    https:blablabla7
Name: medium, dtype: object

Not sure if there is a more elegant solution, but hopefully this helps!

-- Edit --

As a side note, I know that heavy use of apply is generally frowned upon in the community. Especially with a very large pandas DataFrame, you will run into some performance issues.

I couldn't really think of a better solution, though. But if your dataset isn't too big, I think this should do the trick.
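If apply does become a bottleneck, one common alternative (not part of this answer) is a plain list comprehension over the column. It skips the intermediate pd.Series objects that s.apply(pd.Series) builds for every row. A sketch, reusing an abbreviated version of the s above:

```python
import pandas as pd

# Abbreviated version of the Series s from the answer
s = pd.Series([
    [{'medium': 'https:blablabla1', 'xl': 'something1', 's': 'anotherthing1'},
     {'medium': 'https:blablabla2', 'xl': 'something2', 's': 'anotherthing2'}],
    [{'medium': 'https:blablabla4', 'xl': 'something4', 's': 'anotherthing4'}],
    [{'medium': 'https:blablabla7', 'xl': 'something7', 's': 'anotherthing7'}],
])

# Plain Python iteration over the cells; reuse the original index so the
# result aligns with the source rows if it is assigned back to a DataFrame
medium = pd.Series([photos[0]['medium'] for photos in s], index=s.index, name='medium')
print(medium)
```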

Answer 1 (score: 1)

You can use the apply function on each row, like this:

df['image_url'] = df.apply(lambda row: row.photos[0]['medium'], axis=1)
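Since only the photos column is involved, the same lookup can also be written with Series.apply on that single column instead of DataFrame.apply(..., axis=1). This passes each cell directly and avoids constructing a row Series for every row. A sketch with abbreviated sample data:

```python
import pandas as pd

# Abbreviated sample data (cells are real lists of dicts)
df = pd.DataFrame({
    'id': ['001', '002', '003'],
    'photos': [
        [{'medium': 'https:blablabla1', 'xl': 'something1', 's': 'anotherthing1'}],
        [{'medium': 'https:blablabla4', 'xl': 'something4', 's': 'anotherthing4'}],
        [{'medium': 'https:blablabla7', 'xl': 'something7', 's': 'anotherthing7'}],
    ],
})

# Series.apply receives each cell of 'photos' (a list of dicts) as its argument
df['image_url'] = df['photos'].apply(lambda photos: photos[0]['medium'])
print(df[['id', 'image_url']])
```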

Output:

In [23]: df
Out[23]:
   id                                         photos         image_url
0  001  [{u's': u'anotherthing1', u'medium': u'https:b...  https:blablabla1
1  002  [{u's': u'anotherthing4', u'medium': u'https:b...  https:blablabla4
2  003  [{u's': u'anotherthing7', u'medium': u'https:b...  https:blablabla7

Now, if you don't need the photos column anymore, you can drop it...
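Dropping the column is a one-liner with DataFrame.drop. A sketch, assuming image_url has already been added as above:

```python
import pandas as pd

# Abbreviated DataFrame after image_url has been added
df = pd.DataFrame({
    'id': ['001', '002'],
    'photos': [
        [{'medium': 'https:blablabla1', 'xl': 'something1', 's': 'anotherthing1'}],
        [{'medium': 'https:blablabla4', 'xl': 'something4', 's': 'anotherthing4'}],
    ],
    'image_url': ['https:blablabla1', 'https:blablabla4'],
})

# drop() returns a new DataFrame by default; reassign to keep the result
df = df.drop(columns='photos')
print(df)
```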