Pandas value_counts() works in a for loop but fails as a lambda

Posted: 2015-06-02 21:24:11

Tags: python pandas lambda

I have a DataFrame with three variables, and for each variable I want to build a dictionary of the relative count of each label.

I easily wrote a for loop that outputs exactly what I want, but my lambda produces strange results.

Here is the data:

In [3]:

import pandas as pd
raw_data = {
    'category1': ['Red', 'Red', 'Red', 'Green'],
    'category2': ['Plane', 'Plane', 'Plane', 'Car'],
    'category3': ['Orange', 'Orange', 'Orange', 'Banana'],
    }
df = pd.DataFrame(raw_data)
df
Out[3]:
  category1 category2 category3
0       Red     Plane    Orange
1       Red     Plane    Orange
2       Red     Plane    Orange
3     Green       Car    Banana

This for loop produces exactly the output I want:

In [4]:

forloop = {}
for column in df:
    forloop[column] = df[column].value_counts(normalize=True).to_dict()
forloop
Out[4]:
{'category1': {'Green': 0.25, 'Red': 0.75},
 'category2': {'Car': 0.25, 'Plane': 0.75},
 'category3': {'Banana': 0.25, 'Orange': 0.75}}

However, the lambda version fails for some unknown reason.

2 answers:

Answer 0 (score: 1)

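I can't really work out what is going wrong here, other than that the dict call is not being unpacked. Here is a roundabout way to get what you want: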

In [86]:
ratio = lambda x: x.value_counts(normalize=True)
output_lambda = df.apply(lambda x: [x.value_counts().to_dict()]).apply(lambda x: x[0]).to_dict()
output_lambda

Out[86]:
{'category1': {'Green': 1, 'Red': 3},
 'category2': {'Car': 1, 'Plane': 3},
 'category3': {'Banana': 1, 'Orange': 3}}

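It looks like it binds the function object as the column value instead of unpacking it into a dict. What I did above is return the value_counts result wrapped in a list, then call apply a second time to unwrap the single-element list. The first apply call is what forces each dict into a single-element list: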

In [87]:
output_lambda = df.apply(lambda x: [x.value_counts().to_dict()])
output_lambda

Out[87]:
category1        [{'Green': 1, 'Red': 3}]
category2        [{'Plane': 3, 'Car': 1}]
category3    [{'Banana': 1, 'Orange': 3}]
dtype: object
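Note that the output above holds raw counts, while the question asks for relative frequencies. A minimal sketch that gets the normalized dicts without apply at all, assuming the same three-column df as in the question, is a dict comprehension over the columns (relative_counts is a hypothetical name, and this is an alternative rather than the method in this answer):

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'category1': ['Red', 'Red', 'Red', 'Green'],
    'category2': ['Plane', 'Plane', 'Plane', 'Car'],
    'category3': ['Orange', 'Orange', 'Orange', 'Banana'],
})

# One dict per column: label -> share of rows carrying that label.
# normalize=True turns the counts into proportions (3 of 4 rows -> 0.75).
relative_counts = {
    column: df[column].value_counts(normalize=True).to_dict()
    for column in df
}
print(relative_counts)
```

This is the question's for loop written as a single expression. Because each value_counts call only sees its own column, the inner dicts contain no NaN padding.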

Answer 1 (score: 1)

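I guess the problem is that pandas cannot convert the object returned by the lambda function into a Series or DataFrame (but this should be confirmed by a pandas expert).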

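With a slight change to the code you can get almost the same result. The difference is that df.apply aligns the three per-column results on the union of all labels, so each inner dict also contains the other columns' labels with a value of NaN: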

ratio = lambda x: x.value_counts(normalize=True)
output_lambda = df.apply(ratio).to_dict()
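To see where those NaN values come from, here is a small sketch (assuming a reasonably recent pandas version) that inspects the intermediate DataFrame before to_dict is called. The name wide is hypothetical; ratio and output_lambda are taken from the answer.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'category1': ['Red', 'Red', 'Red', 'Green'],
    'category2': ['Plane', 'Plane', 'Plane', 'Car'],
    'category3': ['Orange', 'Orange', 'Orange', 'Banana'],
})

ratio = lambda x: x.value_counts(normalize=True)

# Each column yields a Series indexed by its own labels; apply places the
# three Series side by side, aligned on the union of all six labels.
wide = df.apply(ratio)
print(wide)

output_lambda = wide.to_dict()
# Labels that never occur in a column appear in its dict as NaN
print(output_lambda['category1'])
```

The result has one row per distinct label across all columns, which is why every inner dict ends up with six keys instead of two.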

If you don't want the NaN entries in output_lambda, you can use the solution proposed in this answer: https://stackoverflow.com/a/26033302/4709400
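One way to drop those entries, sketched here as an illustration and not necessarily what the linked answer does, is to call dropna() on each column of the aligned DataFrame before converting it to a dict. The names wide and clean are hypothetical.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'category1': ['Red', 'Red', 'Red', 'Green'],
    'category2': ['Plane', 'Plane', 'Plane', 'Car'],
    'category3': ['Orange', 'Orange', 'Orange', 'Banana'],
})

ratio = lambda x: x.value_counts(normalize=True)
wide = df.apply(ratio)  # NaN-padded, one column per variable

# Strip the NaN padding column by column, then turn each column into a dict
clean = {column: values.dropna().to_dict() for column, values in wide.items()}
print(clean)
```

Since the only NaN values here are alignment padding (a real proportion is never NaN), dropping them recovers exactly the dicts the question's for loop builds.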