I have a pandas DataFrame:
id photos
001 [{'medium':'https:blablabla1',
'xl':'something1',
's':'anotherthing1'},
{'medium':'https:blablabla2',
'xl':'something2',
's':'anotherthing2'},
{'medium':'https:blablabla3',
'xl':'something3',
's':'anotherthing3'}]
002 [{'medium':'https:blablabla4',
'xl':'something4',
's':'anotherthing4'},
{'medium':'https:blablabla5',
'xl':'something5',
's':'anotherthing5'},
{'medium':'https:blablabla6',
'xl':'something6',
's':'anotherthing6'}]
003 [{'medium':'https:blablabla7',
'xl':'something7',
's':'anotherthing7'},
{'medium':'https:blablabla8',
'xl':'something8',
's':'anotherthing8'},
{'medium':'https:blablabla9',
'xl':'something9',
's':'anotherthing9'}]
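The second column, photos, contains a list of dictionaries. What I want is the value of the first key:value pair of the first dictionary in each list.
The desired output should look like this: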
id image_url
001 https:blablabla1
002 https:blablabla4
003 https:blablabla7
I figured out how to do this for a single list of dictionaries, like this:
dicts_list = [{'medium':'https:blablabla1',
'xl':'something1',
's':'anotherthing1'},
{'medium':'https:blablabla2',
'xl':'something2',
's':'anotherthing2'},
{'medium':'https:blablabla3',
'xl':'something3',
's':'anotherthing3'}]
# Access the first value of the first dict in a list
list(dicts_list[0].values())[0]
#output
'https:blablabla1'
This is what I have so far (it is obviously wrong):
v = list()
for index, rows in df.iterrows():
photo = rows['photos']
v.append(photo[0])
# output
['[', '[']
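The idea is to collect the first value of each row in a list and then add it back to the original DataFrame. I don't know how to extend this to a pandas DataFrame.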
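Note: based on @daren-thomas's answer, I found that the dictionaries in my data were actually string representations of dictionaries. To convert the column into real lists of dictionaries, use the following code: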
import ast
df.photos = df.photos.apply(lambda x: ast.literal_eval(x))
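The note above also explains the '[' results from the loop: indexing a string returns its first character. A minimal sketch of that behaviour and of the fix, assuming a small made-up sample in which each photos cell is a string (the sample data below is hypothetical, shortened from the question):

```python
import ast
import pandas as pd

# Hypothetical sample: each photos cell is a *string* that only looks like a list of dicts.
df = pd.DataFrame({
    "id": ["001", "002"],
    "photos": [
        "[{'medium': 'https:blablabla1', 'xl': 'something1', 's': 'anotherthing1'}]",
        "[{'medium': 'https:blablabla4', 'xl': 'something4', 's': 'anotherthing4'}]",
    ],
})

# Indexing a string yields its first character, which is where '[' comes from.
first_chars = [cell[0] for cell in df["photos"]]

# Parse each string into real Python objects; literal_eval only accepts literals.
df["photos"] = df["photos"].apply(ast.literal_eval)

# After parsing, the first element of each cell is a dict.
first_dicts = [cell[0] for cell in df["photos"]]
```

After the conversion, the same indexing that previously returned '[' returns the first dictionary of each row.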
Answer 0 (score: 1)
Here is one way to do it, assuming your column or Series is a list of dicts like the one below:
>>> import pandas as pd
>>> s = pd.Series([[{'medium':'https:blablabla1',
... 'xl':'something1',
... 's':'anotherthing1'},
... {'medium':'https:blablabla2',
... 'xl':'something2',
... 's':'anotherthing2'},
... {'medium':'https:blablabla3',
... 'xl':'something3',
... 's':'anotherthing3'}],
... [{'medium':'https:blablabla4',
... 'xl':'something4',
... 's':'anotherthing4'},
... {'medium':'https:blablabla5',
... 'xl':'something5',
... 's':'anotherthing5'},
... {'medium':'https:blablabla6',
... 'xl':'something6',
... 's':'anotherthing6'}],
... [{'medium':'https:blablabla7',
... 'xl':'something7',
... 's':'anotherthing7'},
... {'medium':'https:blablabla8',
... 'xl':'something8',
... 's':'anotherthing8'},
... {'medium':'https:blablabla9',
... 'xl':'something9',
... 's':'anotherthing9'}]])
>>> s
0 [{'medium': 'https:blablabla1', 'xl': 'somethi...
1 [{'medium': 'https:blablabla4', 'xl': 'somethi...
2 [{'medium': 'https:blablabla7', 'xl': 'somethi...
dtype: object
>>> s.apply(pd.Series)[0].apply(pd.Series).medium
0 https:blablabla1
1 https:blablabla4
2 https:blablabla7
Name: medium, dtype: object
I am not sure whether there is a more elegant solution, but I hope this helps!
-EDIT-
As a side note, I know that heavy use of apply is generally discouraged in the pandas community. Especially if you have a very large DataFrame, you will see some performance issues.
I can't really think of a solution that avoids apply. But if your dataset is not too large, I think this should do the trick.
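As one possible alternative to the nested apply(pd.Series) calls, the same values can be collected with a plain list comprehension. This is only a sketch, not part of the original answer: it assumes every cell is a non-empty list of dicts that all contain a 'medium' key, and the name image_url is chosen here for illustration. No timing was measured.

```python
import pandas as pd

# Shortened version of the sample Series from the answer.
s = pd.Series([
    [{'medium': 'https:blablabla1', 'xl': 'something1', 's': 'anotherthing1'},
     {'medium': 'https:blablabla2', 'xl': 'something2', 's': 'anotherthing2'}],
    [{'medium': 'https:blablabla4', 'xl': 'something4', 's': 'anotherthing4'}],
])

# Take the 'medium' value of the first dict in each list,
# keeping the original index so the result aligns with s.
image_url = pd.Series(
    [photos[0]['medium'] for photos in s],
    index=s.index,
    name='image_url',
)
```

This avoids building the intermediate DataFrames that apply(pd.Series) creates, at the cost of raising an error on empty lists or missing keys.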
Answer 1 (score: 1)
You can use the apply function on each row, like this:
df['image_url'] = df.apply(lambda row: row.photos[0]['medium'], axis=1)
Output:
In [23]: df
Out[23]:
id photos image_url
0 001 [{u's': u'anotherthing1', u'medium': u'https:b... https:blablabla1
1 002 [{u's': u'anotherthing4', u'medium': u'https:b... https:blablabla4
2 003 [{u's': u'anotherthing7', u'medium': u'https:b... https:blablabla7
Now, if you don't want the photos column, you can drop it...
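A minimal sketch of that last step, assuming photos already holds real lists of dicts (not strings) and using a shortened, made-up sample. It applies the function to the photos column only rather than to whole rows, which is a small variation on the code above, and the name result is hypothetical:

```python
import pandas as pd

# Hypothetical, shortened sample with already-parsed lists of dicts.
df = pd.DataFrame({
    'id': ['001', '002'],
    'photos': [
        [{'medium': 'https:blablabla1', 'xl': 'something1', 's': 'anotherthing1'}],
        [{'medium': 'https:blablabla4', 'xl': 'something4', 's': 'anotherthing4'}],
    ],
})

# Extract the 'medium' URL of the first photo in each row.
df['image_url'] = df['photos'].apply(lambda photos: photos[0]['medium'])

# Drop the original column; drop() returns a new DataFrame by default.
result = df.drop(columns=['photos'])
```

The result contains only the id and image_url columns, matching the output requested in the question.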