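I want to use Python to search the entire Warehouse column of a Pandas DataFrame and, if a cell value occurs more than 3 times, write that same value to the GeneralDescription column. The code needs to work on thousands of rows and ignore case. Below is my attempt; it only finds the values that occur more than 3 times and writes nothing to the GeneralDescription column. What am I doing wrong? Any help would be appreciated.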
import pandas as pd
from collections import Counter
import numpy as np
data= [[2,'Empty','Empty'],[3,'General Liability','Empty'],[4,'WRS','Empty'],[5,'WRS','Empty'],[6,'CENTRAL','Empty'],[7,'General Liability','Empty'],[8,'CENTRAL','Empty'],[9,'wrs','Empty'],[10,'WRS','Empty'],[11,'GENERAL LIABILITY','Empty'],[12,'General Liability','Empty']]
df1=pd.DataFrame(data,columns=['LineNum','Warehouse','GeneralDescription'])
vc=df1.Warehouse.value_counts()
#print (vc[vc>3].index[0])
counts=Counter(df1.Warehouse.str.lower())
df1[df1.Warehouse.str.lower().isin([key for key in counts if counts[key]>3])].fillna(df1['GeneralDescription'])
df1
    LineNum  Warehouse          GeneralDescription
0   2        Empty              Empty
1   3        General Liability  Empty
2   4        WRS                Empty
3   5        WRS                Empty
4   6        CENTRAL            Empty
5   7        General Liability  Empty
6   8        CENTRAL            Empty
7   9        wrs                Empty
8   10       WRS                Empty
9   11       GENERAL LIABILITY  Empty
10  12       General Liability  Empty
Desired result (df2)
    LineNum  Warehouse          GeneralDescription
0   2
1   3        General Liability  General Liability
2   4        WRS                WRS
3   5        WRS                WRS
4   6        CENTRAL
5   7        General Liability  General Liability
6   8        CENTRAL
7   9        wrs                WRS
8   10       WRS                WRS
9   11       GENERAL LIABILITY  General Liability
10  12       General Liability  General Liability
Answer 0 (score: 3)
You can normalize the column's case with str.title, then use value_counts + map to build a mask.
i = df1.Warehouse.replace('Empty', np.nan).str.title()
df1['GeneralDescription'] = df1.Warehouse.where(i.map(i.value_counts()).gt(3))
print(df1)
    LineNum  Warehouse          GeneralDescription
0   2        Empty              NaN
1   3        General Liability  General Liability
2   4        WRS                WRS
3   5        WRS                WRS
4   6        CENTRAL            NaN
5   7        General Liability  General Liability
6   8        CENTRAL            NaN
7   9        wrs                wrs
8   10       WRS                WRS
9   11       GENERAL LIABILITY  GENERAL LIABILITY
10  12       General Liability  General Liability
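The output above keeps each row's original spelling (wrs, GENERAL LIABILITY), whereas the desired df2 in the question shows a single spelling per group (WRS, General Liability). Below is a minimal sketch of one way to get that, assuming the intended spelling is the most frequent one within each case-insensitive group; that rule is an assumption, not something the question states. The names key, group_size and canonical are illustrative only.

```python
import pandas as pd

warehouses = ['Empty', 'General Liability', 'WRS', 'WRS', 'CENTRAL',
              'General Liability', 'CENTRAL', 'wrs', 'WRS',
              'GENERAL LIABILITY', 'General Liability']
df1 = pd.DataFrame({'LineNum': range(2, 13), 'Warehouse': warehouses})

# Case-insensitive grouping key (hypothetical helper name).
key = df1['Warehouse'].str.lower()

# Size of the case-insensitive group each row belongs to.
group_size = key.map(key.value_counts())

# Assumption: the "canonical" spelling is the most frequent original
# spelling inside each group, broadcast back to every row of the group.
canonical = df1.groupby(key)['Warehouse'].transform(
    lambda s: s.value_counts().idxmax()
)

# Keep the canonical spelling only where the group has more than 3 rows;
# everything else becomes an empty string, as in the desired df2.
df1['GeneralDescription'] = canonical.where(group_size > 3, '')
print(df1)
```

On the sample data this fills rows 7 and 9 with WRS and General Liability respectively, and leaves the Empty and CENTRAL rows blank.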
Answer 1 (score: 3)
You can use pd.Series.value_counts together with pd.DataFrame.loc. We can use pd.Series.str.lower to align strings that differ only in case.
wh_lower = df['Warehouse'].str.lower()
counts = wh_lower.value_counts()
df.loc[wh_lower.map(counts) > 3, 'GeneralDescription'] = df['Warehouse']
print(df)
    LineNum  Warehouse          GeneralDescription
0   2        Empty
1   3        GeneralLiability   GeneralLiability
2   4        WRS                WRS
3   5        WRS                WRS
4   6        CENTRAL
5   7        GeneralLiability   GeneralLiability
6   8        CENTRAL
7   9        wrs                wrs
8   10       WRS                WRS
9   11       GENERALLIABILITY   GENERALLIABILITY
10  12       GeneralLiability   GeneralLiability
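For reference, here is a self-contained sketch of the same value_counts + loc idea, run on the question's sample data so it can be pasted and executed as is. It also hints at why the original attempt left df1 unchanged: fillna only replaces missing values (NaN) and returns a new object, while the sample column holds the literal string 'Empty' and the result was never assigned back. The name mask is illustrative.

```python
import pandas as pd

data = [[2, 'Empty', 'Empty'], [3, 'General Liability', 'Empty'],
        [4, 'WRS', 'Empty'], [5, 'WRS', 'Empty'], [6, 'CENTRAL', 'Empty'],
        [7, 'General Liability', 'Empty'], [8, 'CENTRAL', 'Empty'],
        [9, 'wrs', 'Empty'], [10, 'WRS', 'Empty'],
        [11, 'GENERAL LIABILITY', 'Empty'], [12, 'General Liability', 'Empty']]
df = pd.DataFrame(data, columns=['LineNum', 'Warehouse', 'GeneralDescription'])

# Count occurrences on a lowercased copy so 'WRS' and 'wrs' are pooled.
wh_lower = df['Warehouse'].str.lower()
counts = wh_lower.value_counts()

# True for rows whose case-insensitive value occurs more than 3 times.
mask = wh_lower.map(counts) > 3

# Assign through .loc so the change is written into df itself.
df.loc[mask, 'GeneralDescription'] = df.loc[mask, 'Warehouse']
print(df)
```

Rows that do not pass the mask keep their original 'Empty' placeholder, and matching rows receive the Warehouse value in its original spelling.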
Answer 2 (score: 3)
You can use transform:
df.Warehouse=df.Warehouse.str.upper()
df.loc[df.groupby('Warehouse').Warehouse.transform('count').gt(3),'GeneralDescription']=df.Warehouse
df
Out[356]:
    LineNum  Warehouse          GeneralDescription
0   2        EMPTY              Empty
1   3        GENERALLIABILITY   GENERALLIABILITY
2   4        WRS                WRS
3   5        WRS                WRS
4   6        CENTRAL            Empty
5   7        GENERALLIABILITY   GENERALLIABILITY
6   8        CENTRAL            Empty
7   9        WRS                WRS
8   10       WRS                WRS
9   11       GENERALLIABILITY   GENERALLIABILITY
10  12       GENERALLIABILITY   GENERALLIABILITY
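Note that the snippet above overwrites Warehouse with its uppercase form. If the original spelling should be preserved, one possible variation is to group by a separate lowercased key instead of modifying the column. This is a sketch under that assumption, using the question's sample data; key and sizes are illustrative names.

```python
import pandas as pd

data = [[2, 'Empty', 'Empty'], [3, 'General Liability', 'Empty'],
        [4, 'WRS', 'Empty'], [5, 'WRS', 'Empty'], [6, 'CENTRAL', 'Empty'],
        [7, 'General Liability', 'Empty'], [8, 'CENTRAL', 'Empty'],
        [9, 'wrs', 'Empty'], [10, 'WRS', 'Empty'],
        [11, 'GENERAL LIABILITY', 'Empty'], [12, 'General Liability', 'Empty']]
df = pd.DataFrame(data, columns=['LineNum', 'Warehouse', 'GeneralDescription'])

# Group by a lowercased key held in a separate Series, so the
# Warehouse column itself is left untouched.
key = df['Warehouse'].str.lower()

# transform('size') returns the group size aligned to every original row.
sizes = df.groupby(key)['Warehouse'].transform('size')

# Write the Warehouse value only into rows whose group has more than 3 members.
df.loc[sizes.gt(3), 'GeneralDescription'] = df['Warehouse']
print(df)
```

Because transform returns a result aligned to the original index, it can be used directly as a boolean mask without a separate map step.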