Iterating over a DataFrame by the unique values of a column in pandas

Time: 2018-08-29 09:34:59

Tags: python python-3.x pandas loops dataframe

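I am trying to iterate over my data to get the rows that belong to each unique value of a column. First I built a list of the unique values of the target column, then I tried to define a function that produces the data for a given unique value. However, it does not seem to work. Could you help me solve this?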

Sample Data

uniq_list = df2['Sum of Qtd'].unique().tolist()

def unique_data(uniq):
    unique_list=[]
    for uniq in uniq_list:
        if df2[df2['Sum of Qtd'] == uniq]:
            res_df2 = []
            res_df2 = pd.DataFrame(columns = df2.columns)
            res_df2.append(uniq)
        unique_list.append(res_df2) 


for uniq in uniq_list:
    print(unique_data(uniq))

However, I get the following error:

> ValueError                                Traceback (most recent call last)
> <ipython-input-31-0455c2449f78> in <module>()
>       1 for uniq in uniq_list:
> ----> 2     print(unique_data(uniq))
> 
> <ipython-input-29-75d579a4768f> in unique_data(uniq)
>       2     unique_list=[]
>       3     for uniq in uniq_list:
> ----> 4         if df2[df2['Sum of Qtd'] == uniq]:
>       5             res_df2 = []
>       6             res_df2 = pd.DataFrame(columns = df2.columns)
> 
> ~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
>    1571         raise ValueError("The truth value of a {0} is ambiguous. "
>    1572                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
> -> 1573                          .format(self.__class__.__name__))
>    1574 
>    1575     __bool__ = __nonzero__
> 
> ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
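
As far as I can tell, the error in the traceback comes from using a whole DataFrame as an `if` condition: `df2[df2['Sum of Qtd'] == uniq]` returns a DataFrame, and pandas refuses to reduce that to a single True/False. The sketch below reproduces this on a small made-up frame (the real `df2` is not shown here, so the values and the `Sales` column contents are assumptions) and checks `.empty` instead.

```python
import pandas as pd

# Hypothetical stand-in for df2; the real data is not shown in the question.
df2 = pd.DataFrame({
    "Sum of Qtd": [1, 1, 2, 3],
    "Sales": [10.0, 12.0, 9.0, 50.0],
})

# Boolean-mask selection returns a DataFrame, not a single True/False value.
subset = df2[df2["Sum of Qtd"] == 1]

# `if subset:` calls bool() on the DataFrame, which raises ValueError.
try:
    bool(subset)
    raised = False
except ValueError:
    raised = True

# Asking an explicit question about the frame works instead.
has_rows = not subset.empty
```

Here `raised` ends up `True` and `has_rows` tells you whether any row matched, which is probably what the `if` was meant to test.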

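After making some improvements and fixing the error, the revised code looks like this, but I still cannot create subsets of the data based on the unique values of the 'Sum of Qtd' column. Could you give me some hints on how to do this?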

def unique_data(uniq):
    unique_list=[]
    for i in uniq_list:
        res_df2 = []
        res_df2 = pd.DataFrame(columns = df2.columns)
        if df2.loc[df2['Sum of Qtd'] != uniq].empty:
            res_df2.append(df2)
            Q1 = df2.Sales.quantile(0.25)
            Q3 = df2.Sales.quantile(0.75)
            IQR = Q3 - Q1
            mask = (df2.Sales < (Q1 - 1.5 * IQR)) | (df2.Sales > (Q3 + 1.5 * IQR))
            df2[mask] = np.nan    
        unique_list.append(res_df2)
        print(unique_list)
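
One possible direction, offered as a sketch and not as a definitive answer: `groupby` already splits a frame by the unique values of a column, so the manual loop over `uniq_list` may not be needed. Two details of the code above are worth noting: `DataFrame.append` never modified a frame in place (it returned a new one, and it was removed in pandas 2.0), and the function has no `return`, so `print(unique_data(uniq))` can only show `None`. The example assumes the IQR filter on `Sales` should be applied within each subset, which the question does not actually state; the sample data and the helper name `mask_sales_outliers` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2; the real data is not shown in the question.
df2 = pd.DataFrame({
    "Sum of Qtd": [1, 1, 1, 1, 1, 2, 2, 3],
    "Sales": [10.0, 11.0, 12.0, 13.0, 100.0, 9.0, 9.5, 50.0],
})


def mask_sales_outliers(group):
    """Return a copy of `group` with Sales outside 1.5 * IQR set to NaN."""
    q1 = group["Sales"].quantile(0.25)
    q3 = group["Sales"].quantile(0.75)
    iqr = q3 - q1
    outlier = (group["Sales"] < q1 - 1.5 * iqr) | (group["Sales"] > q3 + 1.5 * iqr)
    result = group.copy()  # leave the original df2 untouched
    result.loc[outlier, "Sales"] = np.nan
    return result


# One sub-DataFrame per unique value of 'Sum of Qtd', keyed by that value.
subsets = {
    value: mask_sales_outliers(group)
    for value, group in df2.groupby("Sum of Qtd")
}

for value, sub in subsets.items():
    print(value)
    print(sub)
```

If the outlier rule is meant to use quartiles of the whole column instead, computing `q1` and `q3` once on `df2` before the loop and passing them into the helper would be the corresponding change.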

0 answers:

No answers yet