Question

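I have a dataset with a location column. There are about 75 locations, and each location can have sublocations. I need to make plots for each location, so I split the dataset into a dictionary of DataFrames and process each value in the dictionary.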

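Now I need to split each value in the dictionary (the dataset belonging to one location) into separate datasets by sublocation. So if a location has 3 sublocations, I need 3 new DataFrames.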
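One way to get one DataFrame per sublocation while keeping the location attached is to group on both columns at once. A minimal sketch, assuming the columns are named `location` and `sublocation` as in the later edits; the tiny frame below is hypothetical toy data, not the real dataset:

```python
import pandas as pd

# Hypothetical toy data standing in for the real (sensitive) dataset
d = pd.DataFrame({
    'location':    ['M1', 'M1', 'M1', 'M2', 'M2'],
    'sublocation': ['21M1', '22M1', '21M1', '10M2', '10M2'],
    'ethnic':      ['A', 'B', 'A', 'C', 'B'],
})

# Grouping on two columns yields one group per (location, sublocation) pair,
# so a location with 3 sublocations contributes 3 DataFrames
by_pair = {key: group for key, group in d.groupby(['location', 'sublocation'])}

for (loc, subloc), frame in by_pair.items():
    print(loc, subloc, len(frame))
```

The keys are `(location, sublocation)` tuples, so `by_pair[('M1', '21M1')]` holds the rows for sublocation 21M1 of location M1.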

I used the following post: PANDAS split dataframe to multiple by unique values rows

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

f = '..../demo_copy.csv'
d = pd.read_csv(f)
dfs = dict(tuple(d.groupby('location')))
for key, value in dfs.items():
    fig, (ax1, ax2) = plt.subplots(ncols=2, sharey=False)
    # countplot is axes-level, so it draws into ax1 (catplot ignores ax)
    sns.countplot(data=value,
                  x='ethnic',
                  palette='cubehelix',
                  label='Ethnicity',
                  ax=ax1)
    #plt.savefig("{0}.pdf".format(key), bbox_inches = 'tight')
for key, value in dfs.items():
    dfs2 = dict(tuple(value.groupby('site')))

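When I check the length of dfs2, I see it only has 3 datasets. I know there are nearly 300 sublocations, so I need dfs2 to use the sublocation names as keys and hold all of the rows of d for the corresponding location and sublocation.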
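The reason `dfs2` ends up with only 3 entries is that the second loop rebinds `dfs2` on every iteration, so only the sublocations of the last location survive. A sketch that keeps every location by nesting the dictionaries, assuming the sublocation column is called `site` as in the code above (later edits call it `sublocation`); the data is hypothetical:

```python
import pandas as pd

# Hypothetical toy data; 'site' is the sublocation column used in the question
d = pd.DataFrame({
    'location': ['M1', 'M1', 'M2', 'M2', 'M3'],
    'site':     ['21M1', '22M1', '10M2', '11M2', '5M3'],
    'ethnic':   ['A', 'B', 'A', 'C', 'B'],
})

dfs = dict(tuple(d.groupby('location')))

# Build one sub-dictionary per location instead of overwriting a single dict:
# dfs2[location][site] -> rows of that location and site
dfs2 = {key: dict(tuple(value.groupby('site'))) for key, value in dfs.items()}

total_sites = sum(len(sub) for sub in dfs2.values())
print(len(dfs2), total_sites)  # 3 locations, 5 sites in this toy frame
```

This is the "sub-dictionary" shape: `dfs2['M1']['21M1']` gives the rows for sublocation 21M1, still filed under location M1.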

EDIT: I'm attaching some sample data: Sample Data. In the real data (it's sensitive, so I can't post it) there are over 70 locations and 300 sublocations.

The dictionary dfs has key M1 with value: (all rows whose location is M1)

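Now I need dfs2 with key 21M1 and value: (all rows whose sublocation is 21M1). They should still be grouped by location, which is why I was thinking of a "sub-dictionary".

EDIT2: Following @Joe's suggestion, I used the fact that each location can already be accessed through the dictionary I have. From the original data I can list the unique sublocation values. I then loop over each dict value and create a tmp DataFrame containing the rows whose sublocation matches a value in that unique list, and use the tmp DataFrame for the statistics. I've added the code as well. Could this approach be flawed?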

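import matplotlib.pyplot as plt
import seaborn as sns

unique_list = d['sublocation'].unique()  # every sublocation in the original data

for i in dfs.values():
    for j in unique_list:
        tmp = i[i['sublocation'] == j]
        if tmp.empty:
            continue  # this sublocation does not belong to this location
        fig, ax1 = plt.subplots()  # fresh figure so plots don't pile up
        sns.countplot(y='ethnic_cde', data=tmp, orient='h', palette='colorblind', ax=ax1)
        sns.despine()
        ax1.set(xlabel='Count', ylabel='Ethnicity by Code')
        plt.savefig("{0}.pdf".format(j), bbox_inches='tight')
        plt.close(fig)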
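Rather than testing every sublocation in `unique_list` against every location (most combinations are empty, which is what a bare `except` ends up hiding), you could let `groupby` enumerate only the sublocations that actually occur in each location. A sketch using `value_counts` as a stand-in for the plotting step; the data and the `counts` dict are hypothetical:

```python
import pandas as pd

# Hypothetical toy data with the column names from EDIT2
d = pd.DataFrame({
    'location':    ['M1', 'M1', 'M1', 'M2'],
    'sublocation': ['21M1', '21M1', '22M1', '10M2'],
    'ethnic_cde':  ['A', 'B', 'A', 'C'],
})
dfs = dict(tuple(d.groupby('location')))

counts = {}
for loc, loc_df in dfs.items():
    # Only sublocations present in this location are visited,
    # so tmp is never empty and no try/except is needed
    for subloc, tmp in loc_df.groupby('sublocation'):
        # Plotting code (e.g. sns.countplot(..., data=tmp)) would go here
        counts[subloc] = tmp['ethnic_cde'].value_counts().to_dict()

print(counts)
```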

EDIT 3: One last thing has me stuck: I can't get the files saved in the right directory. I made a new dictionary whose key: value pairs are subloc: d[d['subloc'] == X]

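import os
import matplotlib.pyplot as plt
import seaborn as sns

tmppath = 'path'  # base output folder (placeholder)
for key, value in dfss.items():
    # value['location'] is a Series; str() of it is a multi-line dump, not a name.
    # Assumes every row of one sublocation shares the same location.
    a = str(value['location'].iloc[0])
    outdir = os.path.join(tmppath, a)
    os.makedirs(outdir, exist_ok=True)  # savefig does not create missing folders

    fig, axs = plt.subplots(1, 3)
    sns.countplot(y='ethnic_cde', data=value, orient='h', palette='colorblind', ax=axs[0])
    sns.countplot(y='ProgramRatio', data=value, palette='colorblind', ax=axs[1])
    sns.countplot(y='sublocation', data=value, ax=axs[2])
    fig.tight_layout()
    fig.savefig(os.path.join(outdir, '{0}_{1}.pdf'.format(a, key)), bbox_inches='tight')
    plt.close(fig)  # free the figure instead of clearing and reusing it
    #plt.show()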
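The directory problem most likely comes from `str(value['location'])`: `str()` of a pandas Series renders the index, the values and a `Name:`/`dtype:` footer over several lines, not the single location name. A small sketch of the difference and of building the output path with `os.path.join`; a temporary folder from `tempfile.mkdtemp()` stands in for the real output directory:

```python
import os
import tempfile
import pandas as pd

value = pd.DataFrame({'location': ['M1', 'M1'], 'sublocation': ['21M1', '21M1']})

bad = str(value['location'])            # multi-line text with index and dtype
good = str(value['location'].iloc[0])   # just the location name

base = tempfile.mkdtemp()               # stand-in for the real output folder
outdir = os.path.join(base, good)
os.makedirs(outdir, exist_ok=True)      # savefig will not create missing folders
filename = os.path.join(outdir, '{0}_{1}.pdf'.format(good, '21M1'))

print('\n' in bad, good, filename)
```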

Answer 1

@Joe @tomjn
This is what I ended up doing:

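I took the two original dictionaries and created a third dictionary whose keys are the locations and whose values are the lists of their sublocations.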

dfs = dict(tuple(d.groupby('location')))
dfss = dict(tuple(d.groupby('sublocation')))

# dd maps each location to the list of its unique sublocations
dd = {}
for key, value in dfs.items():
    a = []
    for i in value['sublocation']:
        if str(i) not in a:  # compare the same (string) form that gets appended
            a.append(str(i))
    dd[key] = a

for key, value in dfss.items():
    dur = None
    for k, v in dd.items():
        if str(key) in v:
            dur = str(k)  # the location this sublocation belongs to
            break

    #CODE FOR PLOTS
    #SAVE PLOT
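The `dd` dictionary built above can also be produced directly with pandas, together with the reverse mapping that makes the inner lookup loop unnecessary. A sketch, assuming each sublocation belongs to exactly one location; the data and the name `sub_to_loc` are hypothetical:

```python
import pandas as pd

# Hypothetical toy data
d = pd.DataFrame({
    'location':    ['M1', 'M1', 'M1', 'M2'],
    'sublocation': ['21M1', '22M1', '21M1', '10M2'],
})

# location -> list of its unique sublocations, in order of first appearance
dd = {loc: list(subs)
      for loc, subs in d.groupby('location')['sublocation'].unique().items()}

# sublocation -> location (assumes a sublocation never spans two locations)
sub_to_loc = (d.drop_duplicates('sublocation')
                .set_index('sublocation')['location']
                .to_dict())
```

With `sub_to_loc`, finding the location for a sublocation key reduces to `dur = sub_to_loc[key]`.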

I needed this because I wanted to make a batch of plots. In the end, adapted to my actual setup, I did the following:

plt.savefig("path.pdf".format(dur,dur,key), bbox_inches = 'tight')
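As written, `"path.pdf"` contains no `{}` placeholders, so `.format(dur, dur, key)` silently ignores its arguments. If the real template follows a `location/location_sublocation.pdf` layout (an assumption inferred from EDIT 3), iterating over a two-column groupby hands you both names without the `dd` lookup. `base` is a placeholder and the data is hypothetical:

```python
import os
import pandas as pd

# Hypothetical toy data
d = pd.DataFrame({
    'location':    ['M1', 'M1', 'M2'],
    'sublocation': ['21M1', '22M1', '10M2'],
})

print("path.pdf".format('M1', 'M1', '21M1'))  # extra arguments are ignored -> path.pdf

base = 'path'  # placeholder for the real output folder
filenames = []
for (dur, key), group in d.groupby(['location', 'sublocation']):
    # Plotting code for `group` would go here, followed by plt.savefig(name, ...)
    name = os.path.join(base, dur, '{0}_{1}.pdf'.format(dur, key))
    filenames.append(name)
```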

Create a new set of DataFrames from a dictionary
