Question

I have to create a large dictionary for my measurement data. So far, my (simplified) code looks like this:

i = 0  

for i in range(len(station_data_files_pandas)):  # range(0, 299)
    station_data_f_pandas = station_data_files_pandas[i]
    station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))
    Y_RR = station_data_f_pandas["MO_RR"].resample("A").apply(very_sum)

    # creating the dictionary layer for the annual data in this dictionary
    anual_data = {
            "Y_RR" : Y_RR
            }
    # creating the dictionary layer for the monthly data in this dictionary
    montly_data = {
            "MO_RR"    
            }
    # creating the dictionary layer for every station. Every station has monthly and annual data
    station = {
            "montly_data" : montly_data,
            "anual_data" : anual_data
            }
    # creating the dictionary layer where the station data can be looked up by station id
    station_data_dic = {
            station_id : station
            }
    # creating the final layer of the dictionary
    station_data_dictionary = {
            "station_data": station_data_dic
            }

This is the output:

station_data_dictionary
Out[387]: 
{'station_data': {'4706': {'montly_data': {'MO_RR'},   # "4706" is the id from the last element in station_data_files_pandas
   'anual_data': {'Y_RR': YearMonth
           # YearMonth is the index...
           # I actually wanted the Index just to show yyyy-mm ...
    1981-12-31    1164.3
    1982-12-31     852.4
    1983-12-31     826.5
    1984-12-31     798.8
    1985-12-31       NaN
    1986-12-31       NaN
    1987-12-31       NaN
    1988-12-31       NaN
    1989-12-31       NaN
    1990-12-31    1101.1
    1991-12-31     892.4
    1992-12-31     802.1
    1993-12-31     873.5
    1994-12-31     842.7
    1995-12-31     962.0
    1996-12-31       NaN
    1997-12-31     927.9
    1998-12-31       NaN
    1999-12-31       NaN
    2000-12-31     997.8
    2001-12-31     986.3
    2002-12-31    1117.6
    2003-12-31     690.8
    2004-12-31       NaN
    2005-12-31       NaN
    2006-12-31       NaN
    2007-12-31       NaN
    2008-12-31       NaN
    2009-12-31       NaN
    2010-12-31       NaN
    Freq: A-DEC, Name: MO_RR, dtype: float64}}}}

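As you can see, my output contains only one "sheet". I expected 300.

I assume my code overwrites the data on each pass through the loop, so in the end my output is a single sheet built from the last element of station_data_files_pandas. How can I fix this? Is my approach perhaps completely wrong?

When it is finished, it has to be accessible like this: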

station_data_dictionary["station_data"]["403"]["anual_data"]["Y_RR"]
station_data_dictionary["station_data"]["573"]["anual_data"]["Y_RR"]
station_data_dictionary["station_data"]["96"]["anual_data"]["Y_RR"]

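...and so on.

As you can see, since I look up different things in the dictionary, the only part that should change is the station_id.

Note: there is one other question with exactly the same title, but it did not help me at all...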

Answer 1

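I have not tested this because I don't have your data, but it should produce the dictionary you need. The only changes are at the top and the bottom: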

station_data_dictionary = {
    "station_data": {}
}

for i in range(len(station_data_files_pandas)):  # range(0, 299)

    station_data_f_pandas = station_data_files_pandas[i]

    station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))

    Y_RR = station_data_f_pandas["MO_RR"].resample("A").apply(very_sum)

    # creating the dictionary layer for the annual data in this dictionary
    anual_data = {
            "Y_RR" : Y_RR
            }

    # creating the dictionary layer for the monthly data in this dictionary
    montly_data = {
            "MO_RR"    
            }

    # creating the dictionary layer for every station. Every station has monthly and annual data
    station = {
            "montly_data" : montly_data,
            "anual_data" : anual_data
            }

    station_data_dictionary["station_data"][station_id] = station

Note that a statement such as i = 0 is not needed before the for loop, because the loop initializes the variable for you.

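Also, the "station_data" layer of the dictionary looks redundant, since it is the only key at that level, but you included it in your desired output, so I kept it.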
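The difference between the two versions can be reproduced without pandas. The following is only a minimal sketch: the `readings` list and its values are made-up stand-ins for the real station data, not part of the original code.

```python
# Hypothetical stand-in data: (station_id, yearly_value) pairs.
readings = [("403", 1164.3), ("573", 852.4), ("96", 826.5)]

# Rebinding the name inside the loop throws away the previous dictionary,
# so only the last station survives.
for station_id, value in readings:
    overwritten = {"station_data": {station_id: {"anual_data": {"Y_RR": value}}}}

# Creating the container once and adding one key per iteration keeps every station.
accumulated = {"station_data": {}}
for station_id, value in readings:
    accumulated["station_data"][station_id] = {"anual_data": {"Y_RR": value}}

print(list(overwritten["station_data"]))   # only the last station id
print(list(accumulated["station_data"]))   # all three station ids
```

The second loop is the same pattern as the code above: the outer dictionary exists before the loop starts, and each pass only assigns a new key.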

Answer 2

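Try the code below. Also, if you need the dictionary to keep the order in which entries were added, you can use OrderedDict from the collections package (since Python 3.7, a plain dict also preserves insertion order).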

That way, when you print the dictionary or iterate over its data, the entries come out in the order in which the code below adds them.
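A small sketch of the ordering behaviour, assuming Python 3.7 or later, where a plain dict keeps insertion order as well. The station ids are made up for illustration. One difference that remains between the two types is that OrderedDict equality takes order into account.

```python
from collections import OrderedDict

station_ids = ["403", "573", "96"]  # hypothetical ids, in insertion order

plain = {}
ordered = OrderedDict()
for sid in station_ids:
    plain[sid] = None
    ordered[sid] = None

# Both containers iterate in the order the keys were added.
print(list(plain))
print(list(ordered))

# Equality differs: OrderedDict compares order, a plain dict does not.
same_when_ordered = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)
same_when_plain = {"a": 1, "b": 2} == {"b": 2, "a": 1}
print(same_when_ordered, same_when_plain)
```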

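Obs: I am assuming that station_data_files_pandas is a list rather than a dictionary, which is why I changed the for loop "signature" to iterate over the elements directly. If I am wrong, and the variable is actually a dictionary whose keys are the integers from your for loop, you can iterate over its items instead: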

for k, v in station_data_files_pandas.items():
    # now k carries the integer you were using before.
    # and v carries station_data_f_pandas
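If it is unclear which of the two container types is in use, both cases can be handled the same way. This is only a sketch: `iter_station_frames` is a hypothetical helper name, and the strings stand in for the real DataFrames.

```python
def iter_station_frames(container):
    """Yield (key, frame) pairs for either a dict or a list-like container."""
    if isinstance(container, dict):
        # dict: the keys are whatever the dictionary uses
        yield from container.items()
    else:
        # list: enumerate() supplies the integer position
        yield from enumerate(container)


as_list = ["frame A", "frame B"]          # stand-ins for DataFrames
as_dict = {0: "frame A", 1: "frame B"}

for k, v in iter_station_frames(as_list):
    print(k, v)
```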

Corrected code

import collections

station_data_dictionary=collections.OrderedDict()

# for i in range(len(station_data_files_pandas)):  # range(0, 299)
# iterate over the elements directly instead of over indices
for station_data_f_pandas in station_data_files_pandas:

    # This is not needed anymore    
    # station_data_f_pandas = station_data_files_pandas[i]

    # station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))
    # You could directly convert to string
    station_id = str(int(station_data_f_pandas["STATIONS_ID"][0]))

    Y_RR = station_data_f_pandas["MO_RR"].resample("A").apply(very_sum)
    MO_RR = None  # placeholder: assign the monthly data for this station here


    # creating the dictionary layer for the annual data in this dictionary
    anual_data = {
            "Y_RR" : Y_RR
            }

    # creating the dictionary layer for the monthly data in this dictionary
    montly_data = {
            # "MO_RR"
            # You can't have just a key in your dictionary, you need to assign a value to it.

            "MO_RR": MO_RR             
            }

    # creating the dictionary layer for every station. Every station has monthly and annual data
    station = {
            "montly_data" : montly_data,
            "anual_data" : anual_data
            }

    # creating the dictionary layer where the station data can be looked up by station id

    station_data_dic = {
            station_id : station
            }


    # creating the final layer of the dictionary
    #station_data_dictionary = {
    #       "station_data": station_data_dic
    #        }

    # Why use {"apparently_useless_id_layer": {"actual_id_info": "data"}}
    # instead of {"actual_info_id": "data"} ?
    station_data_dictionary[station_id] = station
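With the flat layout above, a station is reached directly by its id. The sketch below shows that lookup, and how the flat dictionary can still be wrapped if the "station_data" layer from the question is wanted. It makes loud assumptions: plain dicts stand in for the DataFrames, `sum()` stands in for the yearly resample, and `build_station` is a hypothetical helper.

```python
# Hypothetical stand-ins for the per-station data: plain dicts instead of DataFrames.
frames = [
    {"STATIONS_ID": [403.0], "MO_RR": [100.0, 120.5, 80.0]},
    {"STATIONS_ID": [573.0], "MO_RR": [60.0, 75.5]},
]


def build_station(frame):
    """Build the per-station layer; sum() stands in for the yearly resample."""
    return {
        "montly_data": {"MO_RR": frame["MO_RR"]},
        "anual_data": {"Y_RR": sum(frame["MO_RR"])},
    }


# Flat layout: station id -> station data.
flat = {str(int(f["STATIONS_ID"][0])): build_station(f) for f in frames}

# Optional wrapper that restores the access path used in the question.
nested = {"station_data": flat}

print(flat["403"]["anual_data"]["Y_RR"])
print(nested["station_data"]["573"]["anual_data"]["Y_RR"])
```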
