Split one dict into many dicts by startswith condition

Time: 2016-02-12 20:05:23

Tags: python python-3.x

The basic case:

I have a dictionary:

kwargs = {'scale_a': 10, 'scale_b': 20,
          'shift_a': 15, 'shift_x_and_y': (1,5)}

and I want to add its items to another dictionary, but that dictionary has sub-dictionaries based on the prefix of each key in kwargs. So the result should be:

kwds = {'scale': {'a': 10, 'b': 20},
        'shift': {'a': 15, 'x_and_y': (1, 5)}}

I'm currently using a longish loop:

kwds = {'scale': {}, 'shift': {}}
for i in kwargs:
    splitted = i.split('_', 1)
    try:
        # This will fail if the prefix is not initialized in the dict with a KeyError.
        kwds[splitted[0]][splitted[1]] = kwargs[i]
    except KeyError:
        raise KeyError('Unknown prefix {0} for parameter {1}'
                       ''.format(splitted[0], i))

and it works, but I was wondering if one could rewrite it as a dict comprehension, or shorten it in some other way. I tried:

kwds = {key.split('_', 1)[1]: kwargs[key] for key in kwargs}

but that just gives me

{'a': 15, 'b': 20, 'x_and_y': (1, 5)}

which is not what I want. Is there any way to get the dictionaries inside this dictionary?
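For reference, a minimal sketch (assuming every key contains at least one underscore) of why the attempted comprehension collapses the result, and of a nested dict comprehension that keeps the prefix as the outer key. The names flat, prefixes and nested are illustrative only:

```python
kwargs = {'scale_a': 10, 'scale_b': 20,
          'shift_a': 15, 'shift_x_and_y': (1, 5)}

# The attempt from the question: the prefix is thrown away, so 'scale_a'
# and 'shift_a' both become 'a' and the later value (15) overwrites 10
flat = {key.split('_', 1)[1]: kwargs[key] for key in kwargs}

# Keep the prefix as the outer key: collect the prefixes first,
# then build one inner dict per prefix
prefixes = {key.split('_', 1)[0] for key in kwargs}
nested = {p: {key.split('_', 1)[1]: value
              for key, value in kwargs.items()
              if key.split('_', 1)[0] == p}
          for p in prefixes}

# nested == {'scale': {'a': 10, 'b': 20}, 'shift': {'a': 15, 'x_and_y': (1, 5)}}
```

Unlike the loop, this accepts any prefix instead of raising KeyError, and it rescans kwargs once per prefix, which is harmless for a handful of keywords.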


Some explanation of why I want to do this:

I'm using Python 3 and writing a function that combines several different functions. Since some of these functions come from libraries such as numpy and astropy, I'm using **kwargs as a parameter and want to distribute the values inside the function.

There are some complications: packages change their APIs, and depending on the functions, the same parameter name might even appear twice. So I don't want to hardcode the parameters.

I'll illustrate it with a simple example:

I have several images and I want to combine them. The steps are:

  • Scale the images: the function takes a callable as a parameter, and the callable needs some arguments
  • Shift the images: a callable with parameters
  • Stack the images: np.dstack, so no arguments are needed
  • Reject points: again the function takes an arbitrary callable, and this function wants some input
  • Combine images: again a callable, but this time it needs no arguments because I want to limit them to numpy functions along axis=2
  • Create a deviation image: to see what the variance of the points is. This is another callable that might need additional parameters; for example, np.std could use a ddof parameter.

So I thought the most straightforward way would be to expect the user to pass keywords that start with scale_ if they should go to scale_func, shift_ if they should go to shift_func, and so on. This avoids problems when two of these functions take a parameter with the same name, and lets me handle each function in the most appropriate way (meaning that if I only have numpy 1.7, I can specify only the parameters that the function accepts).
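A minimal sketch of this prefix convention, assuming two hypothetical callables scale_func and shift_func (plain Python stand-ins, not numpy or astropy functions) that both take a parameter called factor. The hypothetical combine_values is a cut-down combine that only shifts and then scales, and it assumes every keyword contains an underscore:

```python
def scale_func(value, factor=1):
    # Hypothetical stand-in for a scaling callable
    return value * factor

def shift_func(value, factor=0):
    # Hypothetical stand-in for a shifting callable; it reuses the name 'factor'
    return value + factor

def combine_values(values, scale_func, shift_func, **kwargs):
    # Route each keyword to the callable named by its prefix,
    # e.g. scale_factor -> scale_func(..., factor=...)
    kwds = {'scale': {}, 'shift': {}}
    for key, value in kwargs.items():
        prefix, name = key.split('_', 1)
        if prefix not in kwds:
            raise KeyError('Unknown prefix {0} for parameter {1}'.format(prefix, key))
        kwds[prefix][name] = value
    # Shift first, then scale, in the same order as the question's loop
    return [scale_func(shift_func(v, **kwds['shift']), **kwds['scale'])
            for v in values]

result = combine_values([1, 2, 3], scale_func, shift_func,
                        scale_factor=10, shift_factor=1)
print(result)  # [20, 30, 40]
```

Both callables receive a keyword named factor, but scale_factor and shift_factor never clash because the prefix decides where each value goes.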

I'd like this to be fast, but I can't figure out how to do it.

My solution is (shortened, just to show what I mean):

def combine(images, scale_func, shift_func, reject_func, comb_func, dev_func, **kwargs):
    # Prefix should be scale_ for parameters passed to scale_func, shift_ for shift_func, ...

    kwds = {j.split('_', 0)[1]: kwargs[j] for j in kwargs}

    for i in range(len(images)):
        images[i] = shift_func(images[i], **kwds['shift'])
        images[i] = scale_func(images[i], **kwds['scale'])
...

The line kwds[i] = {j.split('_', 1): kwargs[j] for j in kwargs} is what is not working. Until now I used

kwds = {'scale': {}, 'shift': {}, 'reject': {}, 'dev': {}}
for i in kwargs:
    splitted = i.split('_', 1)
    try:
        # This will fail if the prefix is not initialized in the dict with a KeyError.
        kwds[splitted[0]][splitted[1]] = kwargs[i]
    except KeyError:
        raise KeyError('Unknown prefix {0} for parameter {1}'
                       ''.format(splitted[0], i))

but that's in my opinion not very pythonic and not very fast. I wondered if one could solve that with a dict comprehension or in some other short and fast way.
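Whether the loop is actually slow can be measured rather than guessed. A sketch using the standard timeit module, with the question's loop wrapped in a hypothetical function loop_version; the number it prints depends on the machine and Python version, so compare alternatives the same way on the same setup:

```python
import timeit

kwargs = {'scale_a': 10, 'scale_b': 20,
          'shift_a': 15, 'shift_x_and_y': (1, 5)}

def loop_version(kwargs):
    # The loop from the question, unchanged apart from being wrapped in a function
    kwds = {'scale': {}, 'shift': {}, 'reject': {}, 'dev': {}}
    for i in kwargs:
        splitted = i.split('_', 1)
        try:
            kwds[splitted[0]][splitted[1]] = kwargs[i]
        except KeyError:
            raise KeyError('Unknown prefix {0} for parameter {1}'
                           ''.format(splitted[0], i))
    return kwds

# Total seconds for 10000 calls of the loop
elapsed = timeit.timeit(lambda: loop_version(kwargs), number=10000)
print(elapsed)
```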

Any help is appreciated.

2 answers:

Answer 0 (score: 1)

EDITED: corrected the original answer and added a shorter version.

Not sure what errors you are getting. Probably it's due to lsplit (there is only split and rsplit, no lsplit). But in any case I'd suggest factoring some logic out to a helper function:

# A helper function which validates and splits a keyword
def split_key(key):
    if not key.startswith(('scale_', 'shift_', 'reject_', 'dev_')):
        raise KeyError('unknown key', key)
    return key.split('_', maxsplit=1)

# Then call it from combine
def combine(..., **kwargs):
    kwds = {}
    for key, value in kwargs.items():
        major_key, minor_key = split_key(key)
        kwds.setdefault(major_key, {})[minor_key] = value
    ...

This may be a little cleaner (not much, I'm afraid) and easier to understand, with combine focusing on the main business and split_key taking care of the details (and raising KeyError for invalid keys).
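A self-contained run of the answer's helper on the question's example keywords. split_key is repeated from the answer; split_kwargs is a hypothetical wrapper around the setdefault loop from combine, used here so the example runs without the image functions:

```python
def split_key(key):
    # Validate the prefix, then split off the part before the first underscore
    if not key.startswith(('scale_', 'shift_', 'reject_', 'dev_')):
        raise KeyError('unknown key', key)
    return key.split('_', maxsplit=1)

def split_kwargs(**kwargs):
    # Hypothetical wrapper around the loop from combine
    kwds = {}
    for key, value in kwargs.items():
        major_key, minor_key = split_key(key)
        kwds.setdefault(major_key, {})[minor_key] = value
    return kwds

kwds = split_kwargs(scale_a=10, scale_b=20, shift_a=15, shift_x_and_y=(1, 5))
# kwds == {'scale': {'a': 10, 'b': 20}, 'shift': {'a': 15, 'x_and_y': (1, 5)}}

try:
    split_kwargs(zoom_a=2)
except KeyError as err:
    print(err.args)  # ('unknown key', 'zoom_a')
```

One consequence of building kwds with setdefault: a prefix only appears if at least one keyword uses it, so combine would need kwds.get('reject', {}) rather than kwds['reject'] when the caller passes no reject_ keywords.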

If you don't need validation, then the following two-liner will do, though I can't recommend using it if someone will read your code:
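Another short, validation-free option (a sketch, not necessarily the two-liner the answer has in mind) groups the keys by prefix with itertools.groupby. The helper names prefix_of and split_by_prefix are hypothetical, and every key is assumed to contain an underscore:

```python
from itertools import groupby

def prefix_of(key):
    # Part of the key before the first underscore, e.g. 'shift' for 'shift_x_and_y'
    return key.split('_', 1)[0]

def split_by_prefix(kwargs):
    # groupby only merges neighbouring items, so sort the keys by prefix first
    keys = sorted(kwargs, key=prefix_of)
    return {prefix: {key.split('_', 1)[1]: kwargs[key] for key in group}
            for prefix, group in groupby(keys, key=prefix_of)}

kwargs = {'scale_a': 10, 'scale_b': 20,
          'shift_a': 15, 'shift_x_and_y': (1, 5)}
kwds = split_by_prefix(kwargs)
# kwds == {'scale': {'a': 10, 'b': 20}, 'shift': {'a': 15, 'x_and_y': (1, 5)}}
```

Unknown prefixes are passed through as new outer keys rather than rejected, which is exactly the validation this shorter form gives up.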

Answer 1 (score: 0)

I broke the function down for you and made assumptions about what you tried to do:

Check whether the key starts with one of the special values

['scale_', 'shift_', 'reject_', 'dev_']

Take that special value alone and make it the key of the dict

[kwargs[j] for j in kwargs if j.startswith(i)]  # returns the values of all keys that start with i

Build a dictionary

j.split('_', maxsplit=1)[0] # Corrected
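The three steps above can be combined into a single nested comprehension. This is a sketch under assumptions: SPECIAL and build_kwds are hypothetical names, and the outer key is taken as the prefix without its trailing underscore (i[:-1]), which gives the same result as j.split('_', maxsplit=1)[0] for these prefixes:

```python
SPECIAL = ['scale_', 'shift_', 'reject_', 'dev_']

def build_kwds(kwargs):
    # For each special prefix i: keep the keys j that start with it (step 1),
    # use the prefix without its underscore as the outer key (step 2),
    # and strip the prefix from j to get the inner key (step 3)
    return {i[:-1]: {j[len(i):]: kwargs[j] for j in kwargs if j.startswith(i)}
            for i in SPECIAL}

kwargs = {'scale_a': 10, 'scale_b': 20,
          'shift_a': 15, 'shift_x_and_y': (1, 5)}
kwds = build_kwds(kwargs)
# kwds == {'scale': {'a': 10, 'b': 20}, 'shift': {'a': 15, 'x_and_y': (1, 5)},
#          'reject': {}, 'dev': {}}
```

Every special prefix gets an entry, possibly an empty dict, so combine can always unpack **kwds['reject'] and so on; keywords with any other prefix are silently dropped instead of raising KeyError.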