Mask values in a pandas column that occur fewer than K times (case-insensitive comparison)

Posted: 2018-07-15 17:46:54

Tags: python string pandas dataframe

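I want to use Python to search the entire "Warehouse" column of a pandas DataFrame, and if a cell value occurs more than 3 times, write that same value into the GeneralDescription column. The code has to work on thousands of rows and ignore case. Below is my attempt. It only outputs the values that occur more than 3 times and writes nothing to the GeneralDescription column. What am I doing wrong? Any help would be greatly appreciated.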

import pandas as pd
from collections import Counter
import numpy as np

data= [[2,'Empty','Empty'],[3,'General Liability','Empty'],[4,'WRS','Empty'],[5,'WRS','Empty'],[6,'CENTRAL','Empty'],[7,'General Liability','Empty'],[8,'CENTRAL','Empty'],[9,'wrs','Empty'],[10,'WRS','Empty'],[11,'GENERAL LIABILITY','Empty'],[12,'General Liability','Empty']]
df1=pd.DataFrame(data,columns=['LineNum','Warehouse','GeneralDescription'])

vc=df1.Warehouse.value_counts()
#print (vc[vc>3].index[0])

counts=Counter(df1.Warehouse.str.lower())
df1[df1.Warehouse.str.lower().isin([key for key in counts if counts[key]>3])].fillna(df1['GeneralDescription']) 

df1

    LineNum Warehouse           GeneralDescription
0   2       Empty               Empty
1   3       General Liability   Empty
2   4       WRS                 Empty
3   5       WRS                 Empty
4   6       CENTRAL             Empty
5   7       General Liability   Empty
6   8       CENTRAL             Empty
7   9       wrs                 Empty
8  10       WRS                 Empty
9  11       GENERAL LIABILITY   Empty
10 12       General Liability   Empty
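
A likely explanation of the behaviour described above: the last line of the attempt, df1[...].fillna(...), first builds a filtered copy of df1 and then calls fillna on that copy, which returns yet another new object. The result is never assigned, so df1 is unchanged; in an interactive session the filtered rows are simply displayed, which matches "it only outputs the values". In addition, fillna only replaces missing values (NaN/None), and the placeholder here is the string 'Empty'. The sketch below keeps the question's Counter-based mask and writes the values back with .loc; it is one possible fix, not the only one.

```python
import pandas as pd
from collections import Counter

data = [[2, 'Empty', 'Empty'], [3, 'General Liability', 'Empty'], [4, 'WRS', 'Empty'],
        [5, 'WRS', 'Empty'], [6, 'CENTRAL', 'Empty'], [7, 'General Liability', 'Empty'],
        [8, 'CENTRAL', 'Empty'], [9, 'wrs', 'Empty'], [10, 'WRS', 'Empty'],
        [11, 'GENERAL LIABILITY', 'Empty'], [12, 'General Liability', 'Empty']]
df1 = pd.DataFrame(data, columns=['LineNum', 'Warehouse', 'GeneralDescription'])

# Same case-insensitive counting and mask as in the question
counts = Counter(df1.Warehouse.str.lower())
mask = df1.Warehouse.str.lower().isin([key for key in counts if counts[key] > 3])

# The original last line: it operates on a filtered copy and returns a new
# object; nothing is assigned, so df1 itself is left unchanged
df1[mask].fillna(df1['GeneralDescription'])

# Writing the values back requires an assignment, e.g. through .loc
df1.loc[mask, 'GeneralDescription'] = df1.loc[mask, 'Warehouse']
print(df1)
```

Rows below the threshold keep the 'Empty' placeholder, and the original spelling ('wrs', 'GENERAL LIABILITY') is copied as-is.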

Desired result (df2):

      LineNum Warehouse           GeneralDescription
0     2                         
1     3       General Liability   General Liability
2     4       WRS                 WRS
3     5       WRS                 WRS
4     6       CENTRAL             
5     7       General Liability   General Liability
6     8       CENTRAL             
7     9       wrs                 WRS
8    10       WRS                 WRS
9    11       GENERAL LIABILITY   General Liability
10   12       General Liability   General Liability
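
The desired result also normalises the spelling: 'wrs' becomes 'WRS' and 'GENERAL LIABILITY' becomes 'General Liability', which in this data happens to be the most frequent spelling within each case-insensitive group, and rows below the threshold are left blank. A sketch that reproduces that GeneralDescription column, under two assumptions not stated in the question: the most frequent original spelling is the intended canonical form, and 'Empty' is a placeholder for a missing value. K is the threshold (values occurring more than K times are kept).

```python
import pandas as pd

data = [[2, 'Empty', 'Empty'], [3, 'General Liability', 'Empty'], [4, 'WRS', 'Empty'],
        [5, 'WRS', 'Empty'], [6, 'CENTRAL', 'Empty'], [7, 'General Liability', 'Empty'],
        [8, 'CENTRAL', 'Empty'], [9, 'wrs', 'Empty'], [10, 'WRS', 'Empty'],
        [11, 'GENERAL LIABILITY', 'Empty'], [12, 'General Liability', 'Empty']]
df1 = pd.DataFrame(data, columns=['LineNum', 'Warehouse', 'GeneralDescription'])

K = 3  # keep values that occur more than K times

# Case-insensitive key; 'Empty' is treated as missing (assumption)
key = df1['Warehouse'].where(df1['Warehouse'] != 'Empty').str.lower().rename('key')

# How many rows share each row's key (missing keys count as 0)
row_counts = key.map(key.value_counts()).fillna(0)

# Most frequent original spelling per key, e.g. 'wrs' -> 'WRS'
canonical = df1.groupby(key)['Warehouse'].agg(lambda s: s.value_counts().idxmax())

# Canonical spelling for frequent keys, empty string everywhere else
df1['GeneralDescription'] = key.map(canonical).where(row_counts > K, '')
print(df1)
```

If two spellings tie for most frequent, idxmax simply picks one of them, so a fixed mapping (for example a dict of preferred spellings) may be safer on real data.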

3 answers:

Answer 0 (score: 3)

You can normalize the column's case with str.title, then build the mask with value_counts + map.

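# Treat 'Empty' as missing and normalise case, e.g. 'wrs' -> 'Wrs'
i = df1.Warehouse.replace('Empty', np.nan).str.title()
# Keep the original Warehouse value only where its normalised form occurs more than 3 times
df1['GeneralDescription'] = df1.Warehouse.where(i.map(i.value_counts()).gt(3))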

print(df1)
    LineNum          Warehouse GeneralDescription
0         2              Empty                NaN
1         3  General Liability  General Liability
2         4                WRS                WRS
3         5                WRS                WRS
4         6            CENTRAL                NaN
5         7  General Liability  General Liability
6         8            CENTRAL                NaN
7         9                wrs                wrs
8        10                WRS                WRS
9        11  GENERAL LIABILITY  GENERAL LIABILITY
10       12  General Liability  General Liability
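
To see how the mask in this answer is built, the sketch below runs the same three steps on the question's data and prints the intermediate values. The names row_counts and mask are illustrative only; replacing 3 with another number gives the general "K times" version from the title.

```python
import numpy as np
import pandas as pd

data = [[2, 'Empty', 'Empty'], [3, 'General Liability', 'Empty'], [4, 'WRS', 'Empty'],
        [5, 'WRS', 'Empty'], [6, 'CENTRAL', 'Empty'], [7, 'General Liability', 'Empty'],
        [8, 'CENTRAL', 'Empty'], [9, 'wrs', 'Empty'], [10, 'WRS', 'Empty'],
        [11, 'GENERAL LIABILITY', 'Empty'], [12, 'General Liability', 'Empty']]
df1 = pd.DataFrame(data, columns=['LineNum', 'Warehouse', 'GeneralDescription'])

# Step 1: 'Empty' becomes NaN, everything else is normalised with str.title()
i = df1.Warehouse.replace('Empty', np.nan).str.title()

# Step 2: value_counts() tallies each normalised value; map() looks that tally
# up for every row (the NaN row gets NaN, so the result is float)
row_counts = i.map(i.value_counts())

# Step 3: NaN compares as False, so only values seen more than 3 times pass
mask = row_counts.gt(3)

print(pd.DataFrame({'normalised': i, 'count': row_counts, 'keep': mask}))
```

str.title() only serves as a grouping key here; str.lower() would work equally well, and the original spelling is what ends up in GeneralDescription because where() is applied to df1.Warehouse, not to i.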

Answer 1 (score: 3)

You can use pd.Series.value_counts together with pd.DataFrame.loc. pd.Series.str.lower lets us align strings that differ only in case.

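wh_lower = df['Warehouse'].str.lower()   # case-insensitive key
counts = wh_lower.value_counts()         # occurrences of each key

# Copy Warehouse into GeneralDescription for rows whose key occurs more than 3 times
df.loc[wh_lower.map(counts) > 3, 'GeneralDescription'] = df['Warehouse']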

print(df)

    LineNum         Warehouse GeneralDescription
0         2             Empty                   
1         3  GeneralLiability   GeneralLiability
2         4               WRS                WRS
3         5               WRS                WRS
4         6           CENTRAL                   
5         7  GeneralLiability   GeneralLiability
6         8           CENTRAL                   
7         9               wrs                wrs
8        10               WRS                WRS
9        11  GENERALLIABILITY   GENERALLIABILITY
10       12  GeneralLiability   GeneralLiability
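
The printout above appears to come from a slightly different copy of the data ('GeneralLiability' without a space, blank cells instead of 'Empty'). Below is a self-contained sketch of the same approach on the question's exact data. Clearing the 'Empty' placeholder first is an added assumption, made so that rows below the threshold end up blank as in the desired result.

```python
import pandas as pd

data = [[2, 'Empty', 'Empty'], [3, 'General Liability', 'Empty'], [4, 'WRS', 'Empty'],
        [5, 'WRS', 'Empty'], [6, 'CENTRAL', 'Empty'], [7, 'General Liability', 'Empty'],
        [8, 'CENTRAL', 'Empty'], [9, 'wrs', 'Empty'], [10, 'WRS', 'Empty'],
        [11, 'GENERAL LIABILITY', 'Empty'], [12, 'General Liability', 'Empty']]
df = pd.DataFrame(data, columns=['LineNum', 'Warehouse', 'GeneralDescription'])

# Assumption: blank out the 'Empty' placeholder before filling
df['GeneralDescription'] = ''

wh_lower = df['Warehouse'].str.lower()
counts = wh_lower.value_counts()

# Only rows where the mask is True are assigned; df['Warehouse'] is aligned on the index
df.loc[wh_lower.map(counts) > 3, 'GeneralDescription'] = df['Warehouse']
print(df)
```

As in the answer, the original spelling is copied, so row 7 gets 'wrs' rather than the 'WRS' shown in the desired result.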

Answer 2 (score: 3)

You can use transform:

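# Note: this overwrites Warehouse with its upper-cased values
df.Warehouse = df.Warehouse.str.upper()
# transform('count') gives every row the number of non-null values in its group
df.loc[df.groupby('Warehouse').Warehouse.transform('count').gt(3), 'GeneralDescription'] = df.Warehouse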
df
Out[356]: 
    LineNum         Warehouse GeneralDescription
0         2             EMPTY              Empty
1         3  GENERALLIABILITY   GENERALLIABILITY
2         4               WRS                WRS
3         5               WRS                WRS
4         6           CENTRAL              Empty
5         7  GENERALLIABILITY   GENERALLIABILITY
6         8           CENTRAL              Empty
7         9               WRS                WRS
8        10               WRS                WRS
9        11  GENERALLIABILITY   GENERALLIABILITY
10       12  GENERALLIABILITY   GENERALLIABILITY
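
The first line of this answer replaces the Warehouse column with its upper-cased version. If the original spelling should be preserved, one option (a variation, not the answer's own code) is to group a separate upper-cased key Series by itself, so the same transform trick works without touching Warehouse:

```python
import pandas as pd

data = [[2, 'Empty', 'Empty'], [3, 'General Liability', 'Empty'], [4, 'WRS', 'Empty'],
        [5, 'WRS', 'Empty'], [6, 'CENTRAL', 'Empty'], [7, 'General Liability', 'Empty'],
        [8, 'CENTRAL', 'Empty'], [9, 'wrs', 'Empty'], [10, 'WRS', 'Empty'],
        [11, 'GENERAL LIABILITY', 'Empty'], [12, 'General Liability', 'Empty']]
df = pd.DataFrame(data, columns=['LineNum', 'Warehouse', 'GeneralDescription'])

# Upper-cased key kept in its own Series, so df['Warehouse'] is not modified
key = df['Warehouse'].str.upper()

# transform('count') broadcasts each group's size back onto every row of that group
group_count = key.groupby(key).transform('count')

# Upper-cased value for frequent warehouses; other rows keep 'Empty'
df.loc[group_count.gt(3), 'GeneralDescription'] = key
print(df)
```

Assigning df['Warehouse'] instead of key on the last line would copy the original spelling, as the other answers do.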