数据

两个DataFrame共享列A：

ROWS = 1000
df = pd.DataFrame.from_dict({'A': np.arange(ROWS),
                             'B': np.random.randint(0, 60, size=ROWS),
                             'C': np.random.randint(0, 100, size=ROWS)})
df.head()

   A   B   C
0  0  10  11
1  1   7  64
2  2  22  12
3  3   1  67
4  4  34  57

还有other我也是这样加入的：

other = pd.DataFrame.from_dict({'A': np.arange(ROWS),
                                'D': np.random.choice(['One', 'Two'], ROWS)})
other.set_index('A', inplace=True)
df = df.join(other, on=['A'], rsuffix='_right')
df.head()

   A   B   C    D
0  0  10  11  One
1  1   7  64  Two
2  2  22  12  One
3  3   1  67  Two
4  4  34  57  One

问题

获取具有以下计数的柱形图的正确方法：

C是GTE50，D是1
C是GTE50，D是2
C是LT50，D是1
C是LT50，D是2

按B分组，分为0、1-10、11-20、21-30、21-40、41 +。

Answer 1

IIUC，可以利用clip和np.ceil来组建一个群组，从而将其大大简化。具有2个级别的单个拆栈为B-分组提供x-轴，每个D-C组合都带有条形：

如果您想要更好的标签，可以映射groupby值：

(df.groupby(['D', 
             df.C.ge(50).map({True: 'GE50', False: 'LT50'}),
             np.ceil(df.B.clip(lower=0, upper=41)/10).map({0: '0', 1: '1-10', 2: '11-20', 3: '21-30', 4: '31-40', 5: '41+'})
            ])
     .size().unstack([0,1]).plot.bar())

也等同于B组：

pd.cut(df['B'],
       bins=[-np.inf, 1, 11, 21, 31, 41, np.inf],
       right=False,
       labels=['0', '1-10', '11-20', '21-30', '31-40', '41+'])

Answer 2

经过几天的磨合，我来回解决了这个问题，但是我认为代码气味有很多方面：

groupby返回一个数据透视表，melt的目的是取消数据透视。
为Cx使用假人，而不为D使用假人？最终，它们都是带有2个选项的分类数据。两天后，当我得到第一个解决方案时，我需要休息一下，然后再尝试另一个将这两个问题同等对待的分支。
reset_index，仅在set_index行之后。必须sort_values之前set_index
最后一个summary.unstack().unstack()读起来像个 hack 。

# %% Cx
df['Cx'] = df['C'].apply(lambda x: 'LT50' if x < 50 else 'GTE50')
df.head()

# %% Bins
df['B_binned'] = pd.cut(df['B'],
                        bins=[-np.inf, 1, 11, 21, 31, 41, np.inf],
                        right=False,
                        labels=['0', '1-10', '11-20', '21-30', '31-40', '41+'])
df.head()

# %% Dummies
s = df['D']
dummies = pd.get_dummies(s.apply(pd.Series).stack()).sum(level=0)
df = pd.concat([df, dummies], axis=1)

df.head()

# %% Summary
summary = df.groupby(['B_binned', 'Cx']).agg({'One': 'sum', 'Two': 'sum'})

summary.reset_index(inplace=True)
summary = pd.melt(summary,
                  id_vars=['B_binned', 'Cx'],
                  value_vars=['One', 'Two'],
                  var_name='D',
                  value_name='count')
summary.sort_values(['B_binned', 'D', 'Cx'], inplace=True)
summary.set_index(['B_binned', 'D', 'Cx'], inplace=True)
summary

# %% Chart
summary.unstack().unstack().plot(kind='bar')

Answer 3

脾气暴躁

使用numpy数组进行计数，然后构造declare module '*.vue' { import Vue from 'vue'; export default Vue; } declare module 'vue-simple-uploader';进行绘制

DataFrame

Answer 4

尝试了另一种方法。

df['Bins'] = np.where(df['B'].isin([0]), '0',
            np.where(df['B'].isin(range(1,11)), '1-10',
            np.where(df['B'].isin(range(11,21)), '11-20',
            np.where(df['B'].isin(range(21,31)), '21-30',
            np.where(df['B'].isin(range(31,40)), '31-40','41+')
            ))))

df['Class_type'] = np.where(((df['C']>50) & (df['D']== 'One') ), 'C is GTE50 and D is One',
            np.where(((df['C']>50) & (df['D']== 'Two')) , 'C is GTE50 and D is Two',
            np.where(((df['C']<50) & (df['D']== 'One') ), 'C is LT50 and D is One',
                     'C is LT50 and D is Two')
            ))


df.groupby(['Bins', 'Class_type'])['C'].sum().unstack().plot(kind='bar')
plt.show()

#### Output ####

警告：不确定解决方案的最佳方案。它还会占用额外的空间，因此可能会增加空间复杂性。

熊猫：帮助转换数据并编写更好的代码

数据

问题

4 个答案:

脾气暴躁