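I have a list of units that I want to search for in a pandas DataFrame. I then want to convert each match to its properly named unit and multiply its value by the corresponding constant factor from the list below. Here is a sample DataFrame: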
>> df
product      info
product___A  3.5 m mini-jack
product___B  3.5 kg mini-jack
product___C  3.5mm mini-jack
product___D  3.5 millimeter mini-jack
product___E  43 centimeter mini-jack
Here is my implementation:
import re
import pandas as pd

units_origianal = ['Kilogram', 'millimeter', 'pounds', 'ounce', 'centimeter', 'kilometers']
units = ['kg', 'mm', 'lbs' 'oz', 'cm', 'm']
factor = [0.543, 654.53, 53.64,0.744, 43.8, 98.123]

def norm_units(x):
    for i in range(len(units)):
        if ('\d+\s'+units_origianal[i] in x or re.search('\d+'+units_origianal[i],str(x))):
            quantity = re.findall("\d+\.\d+", str(x))[0]
            resulting_quantity = float(quantity) * factor[i]
            return x.replace(quantity, resulting_quantity).replace(units_origianal[i], units[i])

df = df.apply(norm_units)
>> df
# Expected resulting DataFrame
product      info
product___A  344.05 m mini-jack
product___B  1.9005 kg mini-jack
product___C  2290.155 mm mini-jack
product___D  2290.155 mm mini-jack
product___E  1883.4 cm mini-jack
This is the DataFrame I actually get after running the code:
   product  info
0  None     None
1  None     None
2  None     None
3  None     None
4  None     None
Thanks in advance for any help.
Answer 0 (score: 2)
You may want to use str.replace with named regex groups and a replacement function:
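>> factors = {'Kilogram': 0.543, 'kg': 0.543,
              'millimeter': 654.54, 'mm': 654.54,
              'pounds': 53.64, 'lbs': 53.64,
              'ounce': 0.744, 'oz': 0.744,
              'centimeter': 43.8, 'cm': 43.8,
              'kilometers': 98.123, 'm': 98.123}
>> # a number (integer or decimal), optional whitespace, then one of the unit names
>> pat = r"(?P<val>\d+(?:\.\d+)?)\s*(?P<unit>%s)" % '|'.join(factors)
>> def repl(p):
       # multiply the captured value by the factor of the captured unit
       val, unit = float(p.group('val')), p.group('unit')
       return str(factors[unit] * val) + ' ' + unit
>> # pass regex=True explicitly: pandas 2.0+ defaults to regex=False,
>> # which rejects a callable replacement
>> df['info'] = df['info'].str.replace(pat, repl, regex=True)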
>> df
   product      info
0  product___A  343.4305 m mini-jack
1  product___B  1.9005 kg mini-jack
2  product___C  2290.89 mm mini-jack
3  product___D  2290.89 millimeter mini-jack
4  product___E  1883.3999999999999 centimeter mini-jack
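
Note that the output above keeps the long unit names (`millimeter`, `centimeter`), whereas the question also asked for them to be renamed to the short forms. A minimal sketch of one way to do both in a single pass is below. The names `UNITS`, `PATTERN`, and `normalize_text` are hypothetical, the factors are the same placeholder constants used above rather than real conversion rates, and rounding to 4 decimal places is an assumption made only to hide floating-point noise such as `1883.3999999999999`.

```python
import re

# Hypothetical lookup: every accepted spelling -> (short unit name, factor).
# The factors are the placeholder constants from the answer, not real rates.
UNITS = {
    'Kilogram': ('kg', 0.543), 'kg': ('kg', 0.543),
    'millimeter': ('mm', 654.54), 'mm': ('mm', 654.54),
    'pounds': ('lbs', 53.64), 'lbs': ('lbs', 53.64),
    'ounce': ('oz', 0.744), 'oz': ('oz', 0.744),
    'centimeter': ('cm', 43.8), 'cm': ('cm', 43.8),
    'kilometers': ('m', 98.123), 'm': ('m', 98.123),
}

# Try longer spellings first so 'millimeter' and 'mm' are preferred over 'm'.
_names = sorted(UNITS, key=len, reverse=True)
# The trailing \b stops the bare 'm' from matching the start of 'mini-jack'.
PATTERN = re.compile(
    r'(?P<val>\d+(?:\.\d+)?)\s*(?P<unit>%s)\b' % '|'.join(map(re.escape, _names))
)

def normalize_text(text):
    """Scale each '<number> <unit>' in text and rename the unit to its short form."""
    def repl(match):
        short, factor = UNITS[match.group('unit')]
        # Assumption: 4 decimal places is enough precision for display.
        value = round(float(match.group('val')) * factor, 4)
        return f'{value} {short}'
    return PATTERN.sub(repl, text)

if __name__ == '__main__':
    for sample in ['3.5 m mini-jack', '3.5mm mini-jack',
                   '3.5 millimeter mini-jack', '43 centimeter mini-jack']:
        print(normalize_text(sample))
```

Because the function works on plain strings, it could be applied to the column with `df['info'] = df['info'].map(normalize_text)`. Text with no recognised unit after the number is returned unchanged.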