Formatting groupby output in Python

Date: 2017-07-26 21:49:55

Tags: python pandas

I have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])

I use the following to count the rows for each unique 'type' value within each sub-category:

sub_cat = ['critical::',
           'hardware::',
           'software::'
           ]

for cat in sub_cat:
    x = df[df.type.str.startswith(cat)]

    count = x.groupby('type').size()
    if len(count) > 0:
        print(count)
    else:
        print(cat, '0')

The result is correct, but the output is messy:

type
critical::issue::A    2
dtype: int64
type
hardware::issue::B    1
dtype: int64
software:: 0

I would like to format the output so that it is more readable, like the example below:

type
critical::issue::A    2
hardware::issue::B    1
software:: 0

Any suggestions?

4 Answers

Answer 0 (score: 1)

Another solution: if you simply change

print(count)

to

print(count.to_string(header=False))

you get:

critical::issue::A    2
hardware::issue::B    1
software:: 0

So maybe just add print("type") before the loop, and you're done?
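Putting those two suggestions together, a minimal sketch might look like this. It assumes the question's df and sub_cat; the helper name format_counts is hypothetical, and it returns the report as one string (instead of printing inside the loop) so the result is easy to check.

```python
import pandas as pd

# Sample data and sub-categories from the question
df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])
sub_cat = ['critical::', 'hardware::', 'software::']


def format_counts(df, sub_cat):
    """Build the whole report as a string (hypothetical helper)."""
    lines = ['type']  # single header line, printed once before the per-category rows
    for cat in sub_cat:
        count = df[df['type'].str.startswith(cat)].groupby('type').size()
        if len(count) > 0:
            # header=False drops the 'type' index-name line; to_string adds no dtype footer by default
            lines.append(count.to_string(header=False))
        else:
            lines.append(f'{cat} 0')
    return '\n'.join(lines)


print(format_counts(df, sub_cat))
```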

Answer 1 (score: 0)

You can loop over the entries of count (the Series returned by groupby().size()) and print one label and its count per line:

for cat in sub_cat:
    x = df[df.type.str.startswith(cat)]
    count = x.groupby('type').size()
    if len(count) > 0:
        for ind, row in count.items():  # iteritems() was removed in pandas 2.0
            print(ind, row)
    else:
        print(cat, '0')

The output is:

critical::issue::A 2
hardware::issue::B 1
software:: 0
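Because each row is printed with a single space, the counts in this output do not line up. One possible refinement, sketched here under the assumption that the question's df and sub_cat are used, is to collect every (label, count) pair first and pad the labels to a common width. collect_rows is a hypothetical helper name; Series.items() is the pandas method that yields (index label, value) pairs.

```python
import pandas as pd

# Sample data and sub-categories from the question
df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])
sub_cat = ['critical::', 'hardware::', 'software::']


def collect_rows(df, sub_cat):
    """Gather (label, count) pairs for every sub-category (hypothetical helper)."""
    rows = []
    for cat in sub_cat:
        count = df[df['type'].str.startswith(cat)].groupby('type').size()
        if len(count) > 0:
            # items() yields (index label, value); int() turns numpy ints into plain ints
            rows.extend((label, int(n)) for label, n in count.items())
        else:
            rows.append((cat, 0))
    return rows


rows = collect_rows(df, sub_cat)
width = max(len(label) for label, _ in rows)

print('type')
for label, n in rows:
    # Left-justify every label to the same width so the counts form one column
    print(f'{label:<{width}}  {n}')
```

Collecting first and printing afterwards is what makes the shared column width possible, since the longest label is only known once every category has been processed.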

Answer 2 (score: 0)

Here is the code with the suggested changes:


import pandas as pd

df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-02', 'critical::issue::B', 'version3'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])  

sub_cat = ['critical::',
           'hardware::',
           'software::']

print("type")

for cat in sub_cat:
    x = df[df.type.str.startswith(cat)]

    count = x.groupby('type').size()

    # 'count' is a Series object
    for i in range(len(count)):
        # .iloc gives positional access; count[i] on a string index is deprecated
        print("{}\t{}".format(count.index[i], count.iloc[i]))

    if len(count) == 0:
        print("{}\t{}".format(cat, 0)) 
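To see exactly which lines this loop emits, a minimal sketch can return them as a list instead of printing. It uses this answer's four-row DataFrame and iterates with Series.items() rather than positional indexing; tab_lines is a hypothetical helper name.

```python
import pandas as pd

# Four-row DataFrame from this answer (includes an extra critical::issue::B row)
df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-02', 'critical::issue::B', 'version3'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])
sub_cat = ['critical::', 'hardware::', 'software::']


def tab_lines(df, sub_cat):
    """Return the report lines instead of printing them (hypothetical helper)."""
    lines = ['type']
    for cat in sub_cat:
        count = df[df['type'].str.startswith(cat)].groupby('type').size()
        # items() yields (label, count) pairs, so no positional indexing is needed
        lines.extend(f'{label}\t{n}' for label, n in count.items())
        if len(count) == 0:
            lines.append(f'{cat}\t0')
    return lines


print('\n'.join(tab_lines(df, sub_cat)))
```

With this data the result has five lines: the type header, then critical::issue::A 2, critical::issue::B 1, hardware::issue::B 1 and software:: 0, each label separated from its count by a tab.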

Answer 3 (score: 0)

Consider this pandas approach:

In [79]: res = df.groupby('type').size()

In [80]: res
Out[80]:
type
critical::issue::A    2
hardware::issue::B    1
dtype: int64

In [81]: s = pd.Series(sub_cat)

In [82]: idx = s[~s.isin(df.type.str.extract(r'(\w+::)', expand=False).unique())].values

In [83]: res = pd.concat([res, pd.Series([0] * len(idx), index=idx)])

In [84]: res
Out[84]:
critical::issue::A    2
hardware::issue::B    1
software::            0
dtype: int64
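The same idea as a self-contained script, sketched under the assumption that the question's df and sub_cat are used. It adds the zero rows with pd.concat (Series.append was deprecated in pandas 1.4 and removed in 2.0) and uses a plain set membership test in place of Series.isin; present and missing are illustrative variable names.

```python
import pandas as pd

# Sample data and sub-categories from the question
df = pd.DataFrame(columns=['date', 'type', 'version'],
                  data=[
                      ['2017-07-01', 'critical::issue::A', 'version1'],
                      ['2017-07-01', 'critical::issue::A', 'version2'],
                      ['2017-07-01', 'hardware::issue::B', 'version1'],
                  ])
sub_cat = ['critical::', 'hardware::', 'software::']

# Count every distinct 'type' in a single groupby pass
res = df.groupby('type').size()

# Leading prefixes such as 'critical::' that actually occur in the data
present = set(df['type'].str.extract(r'(\w+::)', expand=False).dropna())

# Sub-categories with no rows get an explicit zero count appended at the end
missing = [cat for cat in sub_cat if cat not in present]
res = pd.concat([res, pd.Series(0, index=missing, dtype='int64')])

print(res.to_string())
```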