我有两个数据帧,如下所示,并且我正在尝试根据数据帧2的ITEM / CODE将值设置为数据帧1中的CLASSIFICATION。如果DESC对ITEM / CODE有任何匹配的单词,那么我必须从dataframe2获取TYPE。
要实现此目的,我拆分了DESC字符串,并尝试将列表条目与dataframe2的ITEM / CODE比较。有关如何执行此操作的任何想法?
dataframe1
PN DESC CLASSIFICATION
C23890 Resistor 2.21K elec
C23891 Powerswitch
C23892 Resistor 7.5K
C23893 Resistor .1K
C23894 FET elec
C23895 ELE SD Card adapter
C23896 Crystal 16Mhz
C23897 Capacitor 100uF
C23898 ELECTRONICS Resistor 10K
C23899 M3x5 Socket Cap Bolt MECH
C23900 M3x6 Socket Cap Bolt Mech
C23901 Mehcanical Assemble Kapton Tape 120mm
C23902 MK7 Filament Drive Block Front
C23903 Pulley 5mm shaft
dataframe2
ITEM/CODE TYPE
ELE ELECTRONIC
ELECTRONICS ELECTRONIC
Capacitor ELECTRONIC
Resistor ELECTRONIC
Washer MECHANICAL
MECH MECHANICAL
这是我到目前为止编写的代码。
import pandas as pd
fn = 'D:\PartsExport.xlsx'
dfInput = pd.read_excel(fn, 'Sheet1')
fn_type = 'D:\TypeMaster.xlsx'
dfType = pd.read_excel(fn_type, 'Sheet1')
dfInput['DESC_SPLIT'] = dfInput["DESC"].str.split(" ", n=-1, expand = False)
Result
PN DESC CLASSIFICATION
C23890 Resistor 2.21K elec ELECTRONIC
C23891 Powerswitch ELECTRONIC
C23892 Resistor 7.5K ELECTRONIC
C23893 Resistor .1K ELECTRONIC
C23899 M3x5 Socket Cap Bolt MECH MECHANICAL
答案 0 :(得分:1)
将Series.str.contains
与dataframe2
创建的按系列循环一起使用,其中flags=re.I
参数用于不区分大小写:
import re
for k, v in dataframe2.set_index('ITEM/CODE')['TYPE'].items():
#if necessary word boundaries
pat = r"\b{}\b".format(k)
#if not
#pat = k
dataframe1.loc[dataframe1['DESC'].str.contains(pat, flags=re.I), 'CLASSIFICATION'] = v
print (dataframe1)
PN DESC CLASSIFICATION
0 C23890 Resistor 2.21K elec ELECTRONIC
1 C23891 Powerswitch NaN
2 C23892 Resistor 7.5K ELECTRONIC
3 C23893 Resistor .1K ELECTRONIC
4 C23894 FET elec NaN
5 C23895 ELE SD Card adapter ELECTRONIC
6 C23896 Crystal 16Mhz NaN
7 C23897 Capacitor 100uF ELECTRONIC
8 C23898 ELECTRONICS Resistor 10K ELECTRONIC
9 C23899 M3x5 Socket Cap Bolt MECH MECHANICAL
10 C23900 M3x6 Socket Cap Bolt Mech MECHANICAL
11 C23901 Mehcanical Assemble Kapton Tape 120mm NaN
12 C23902 MK7 Filament Drive Block Front NaN
13 C23903 Pulley 5mm shaft NaN
如果只想匹配第一个单词,请使用Series.map
,但首先将两个值都用Series.str.lower
转换为小写:
dataframe2['ITEM/CODE'] = dataframe2['ITEM/CODE'].str.lower()
s = dataframe2.set_index('ITEM/CODE')['TYPE']
dataframe1['CLASSIFICATION'] = dataframe1['DESC'].str.split().str[0].str.lower().map(s)
print (dataframe1)
PN DESC CLASSIFICATION
0 C23890 Resistor 2.21K elec ELECTRONIC
1 C23891 Powerswitch NaN
2 C23892 Resistor 7.5K ELECTRONIC
3 C23893 Resistor .1K ELECTRONIC
4 C23894 FET elec NaN
5 C23895 ELE SD Card adapter ELECTRONIC
6 C23896 Crystal 16Mhz NaN
7 C23897 Capacitor 100uF ELECTRONIC
8 C23898 ELECTRONICS Resistor 10K ELECTRONIC
9 C23899 M3x5 Socket Cap Bolt MECH NaN
10 C23900 M3x6 Socket Cap Bolt Mech NaN
11 C23901 Mehcanical Assemble Kapton Tape 120mm NaN
12 C23902 MK7 Filament Drive Block Front NaN
13 C23903 Pulley 5mm shaft NaN
答案 1 :(得分:1)
不那么花哨,但应该去做:
import pandas as pd
#convert dfType dataframe to dictionary
type_dict = dfType.set_index('ITEM/CODE').T.to_dict()
#function that takes in DESC column value and outputs corresponding value from type_dict
def map_type(in_str):
out_str = np.NaN
for val in in_str.split():
if val in type_dict.keys():
out_str = type_dict[val]['TYPE']
return out_str
#apply above function to DESC column
dfInput['CLASSIFICATION'] = dfInput['DESC'].apply(map_type)