I have the following structure:
[
{
"Name": "a-1",
"Tags": [
{
"Value": "a",
"Key": "Type"
}
],
"CreationDate": "2018-02-25T17:33:19.000Z"
},
{
"Name": "a-2",
"Tags": [
{
"Value": "a",
"Key": "Type"
}
],
"CreationDate": "2018-02-26T17:33:19.000Z"
},
{
"Name": "b-1",
"Tags": [
{
"Value": "b",
"Key": "Type"
}
],
"CreationDate": "2018-01-21T17:33:19.000Z"
},
{
"Name": "b-2",
"Tags": [
{
"Value": "b",
"Key": "Type"
}
],
"CreationDate": "2018-01-22T17:33:19.000Z"
},
{
"Name": "c-1",
"Tags": [
{
"Value": "c",
"Key": "Type"
}
],
"CreationDate": "2018-08-29T17:33:19.000Z"
}
]
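When a group has more than one member, I want to print the oldest Name for each Value (this should be configurable, e.g. the x oldest items when a group has y members). In this case there are two a's, two b's and one c, so the expected result would be: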
a-1
b-1
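The requirement above can be sketched in plain Python, without boto3 or pandas, as a grouping step followed by a per-group selection. This is a minimal sketch, not the asker's or answerer's code: the helper names `tag_value` and `oldest_names` and the parameters `n_oldest` (the "x") and `min_members` (the "y") are hypothetical, and it assumes every `CreationDate` uses the same ISO 8601 format and time zone, so the strings sort chronologically without parsing.

```python
import heapq
from collections import defaultdict
from operator import itemgetter

# Sample data copied from the question
IMAGES = [
    {"Name": "a-1", "Tags": [{"Value": "a", "Key": "Type"}], "CreationDate": "2018-02-25T17:33:19.000Z"},
    {"Name": "a-2", "Tags": [{"Value": "a", "Key": "Type"}], "CreationDate": "2018-02-26T17:33:19.000Z"},
    {"Name": "b-1", "Tags": [{"Value": "b", "Key": "Type"}], "CreationDate": "2018-01-21T17:33:19.000Z"},
    {"Name": "b-2", "Tags": [{"Value": "b", "Key": "Type"}], "CreationDate": "2018-01-22T17:33:19.000Z"},
    {"Name": "c-1", "Tags": [{"Value": "c", "Key": "Type"}], "CreationDate": "2018-08-29T17:33:19.000Z"},
]


def tag_value(image, key="Type"):
    """Hypothetical helper: return the Value of the tag whose Key matches, or None."""
    for tag in image.get("Tags", []):
        if tag["Key"] == key:
            return tag["Value"]
    return None


def oldest_names(images, n_oldest=1, min_members=2, tag_key="Type"):
    """Hypothetical helper: names of the n_oldest images in each group with >= min_members."""
    groups = defaultdict(list)
    for image in images:
        groups[tag_value(image, tag_key)].append(image)

    result = []
    for value in sorted(groups, key=str):  # deterministic group order
        members = groups[value]
        if len(members) < min_members:
            continue
        # Assumes identically formatted ISO 8601 strings, which sort chronologically
        for image in heapq.nsmallest(n_oldest, members, key=itemgetter("CreationDate")):
            result.append(image["Name"])
    return result


print("\n".join(oldest_names(IMAGES)))
```

With the defaults this prints `a-1` and `b-1`; `oldest_names(IMAGES, n_oldest=2)` would instead return both members of the a and b groups, oldest first. Grouping into a dict avoids having to pre-sort by the tag value.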
My Python code so far is:
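import boto3
from itertools import chain, groupby
from operator import itemgetter
ec2 = boto3.client('ec2')  # EC2 client used for describe_images
data = ec2.describe_images(Owners=['11111'])
images = data['Images']
grouper = groupby(map(itemgetter('Tags'), images))
groups = (list(vals) for _, vals in grouper)
res = list(chain.from_iterable(filter(None, groups)))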
Currently res contains only a flat list of the Key/Value tag dicts, with no grouping. Can anyone show me how to continue the code to get the expected result?
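One way to continue with `itertools.groupby` is sketched below, under two assumptions: every image has a tag with Key `"Type"`, and the `CreationDate` strings share one ISO 8601 format. Two things differ from the code in the question: the whole image dicts are kept (mapping to `Tags` throws away `Name` and `CreationDate`), and the list is sorted by the grouping key first, because `groupby` only merges *adjacent* items with equal keys. The key function `type_tag` is a hypothetical name.

```python
from itertools import groupby
from operator import itemgetter

# Stand-in for data['Images'] returned by ec2.describe_images(...)
images = [
    {"Name": "a-1", "Tags": [{"Value": "a", "Key": "Type"}], "CreationDate": "2018-02-25T17:33:19.000Z"},
    {"Name": "a-2", "Tags": [{"Value": "a", "Key": "Type"}], "CreationDate": "2018-02-26T17:33:19.000Z"},
    {"Name": "b-1", "Tags": [{"Value": "b", "Key": "Type"}], "CreationDate": "2018-01-21T17:33:19.000Z"},
    {"Name": "b-2", "Tags": [{"Value": "b", "Key": "Type"}], "CreationDate": "2018-01-22T17:33:19.000Z"},
    {"Name": "c-1", "Tags": [{"Value": "c", "Key": "Type"}], "CreationDate": "2018-08-29T17:33:19.000Z"},
]


def type_tag(image):
    """Hypothetical key function: the Value of the 'Type' tag (assumed to exist)."""
    return next(tag["Value"] for tag in image["Tags"] if tag["Key"] == "Type")


# groupby only merges consecutive items with equal keys, so sort by the same key first
images_sorted = sorted(images, key=type_tag)

res = []
for _, members in groupby(images_sorted, key=type_tag):
    members = list(members)  # materialise the group before the iterator advances
    if len(members) > 1:
        # oldest member of the group; string comparison works for uniform ISO 8601 dates
        res.append(min(members, key=itemgetter("CreationDate"))["Name"])

print("\n".join(res))
```

This prints `a-1` and `b-1`. To make it configurable, the `len(members) > 1` test and the single `min(...)` call are the two places to parameterise.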
Answer 0 (score: 0)
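Here is a solution using pandas that takes the JSON string as input (json_string).
Pandas is often overkill, but here I think it fits well, because you essentially want to group by Value and then drop some groups based on a criterion (such as how many members they have).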
import pandas as pd
from io import StringIO
# load the dataframe from the JSON string
# (wrapping it in StringIO avoids the deprecation of passing literal JSON to read_json in newer pandas)
df = pd.read_json(StringIO(json_string))
df['CreationDate'] = pd.to_datetime(df['CreationDate'])
# create a Value column from the nested Tags column (assumes the Type tag is the first tag)
df['Value'] = df['Tags'].apply(lambda x: x[0]['Value'])
# group by Value and iterate through the groups
groups = df.groupby('Value')
output = []
for name, group in groups:
    # skip groups with fewer than 2 members
    if group.shape[0] < 2:
        continue
    # sort rows by creation date, oldest first
    group = group.sort_values('CreationDate')
    # keep the row with the oldest date
    oldest_from_group = group.iloc[0]
    output.append(oldest_from_group['Name'])
print(output)
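The loop above hard-codes "at least 2 members" and "1 oldest row". A vectorised variant that exposes both as parameters is sketched below; `oldest_names_df`, `n_oldest` and `min_members` are hypothetical names, the DataFrame is built with `pd.DataFrame` from the already-parsed list rather than `read_json`, and, like the answer, it assumes the Type tag is the first entry in `Tags`. `GroupBy.filter` drops the small groups, and `GroupBy.head` keeps the first `n_oldest` rows of each group after sorting oldest first.

```python
import pandas as pd

records = [
    {"Name": "a-1", "Tags": [{"Value": "a", "Key": "Type"}], "CreationDate": "2018-02-25T17:33:19.000Z"},
    {"Name": "a-2", "Tags": [{"Value": "a", "Key": "Type"}], "CreationDate": "2018-02-26T17:33:19.000Z"},
    {"Name": "b-1", "Tags": [{"Value": "b", "Key": "Type"}], "CreationDate": "2018-01-21T17:33:19.000Z"},
    {"Name": "b-2", "Tags": [{"Value": "b", "Key": "Type"}], "CreationDate": "2018-01-22T17:33:19.000Z"},
    {"Name": "c-1", "Tags": [{"Value": "c", "Key": "Type"}], "CreationDate": "2018-08-29T17:33:19.000Z"},
]


def oldest_names_df(df, n_oldest=1, min_members=2):
    """Hypothetical helper: names of the n_oldest rows per Value group with >= min_members rows."""
    # drop whole groups that are too small
    kept = df.groupby("Value").filter(lambda g: len(g) >= min_members)
    # sort oldest first within each Value, then take the first n_oldest rows per group
    oldest = kept.sort_values(["Value", "CreationDate"]).groupby("Value").head(n_oldest)
    return oldest["Name"].tolist()


df = pd.DataFrame(records)
df["CreationDate"] = pd.to_datetime(df["CreationDate"])
# assumes the Type tag is the first tag, as in the answer above
df["Value"] = df["Tags"].apply(lambda tags: tags[0]["Value"])

print(oldest_names_df(df))
```

With the defaults this should print `['a-1', 'b-1']`; `oldest_names_df(df, n_oldest=2)` keeps both members of the a and b groups.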