Average of two values in a DataFrame column

Time: 2020-02-26 16:33:21

Tags: python pandas

I am trying to get the average of the Price column.

import pandas as pd
df = pd.read_csv('sample_data1.csv')

# Sample file

Name,Price
Eedor,"¥1,680"
Avidlove,"¥11,761"
Fitment,
Vintage,$8.95 - $16.95
silhouette,$27.80 - $69.50
Silk,$50.02

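I am trying to get the average of the values in each Price cell, converting yen to US dollars where needed. I have written this small function to do the job for a single value, but I am not sure how to apply it to the column.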

import re

YEN_TO_USD = 0.009  # 1 ¥ = 0.0090 $

def to_float(text):
    # Strip thousands separators before converting, e.g. "1,680" -> 1680.0
    return float(text.replace(',', ''))

def my_func(value):
    # pandas reads an empty cell as NaN (a float), so check the type first
    if not isinstance(value, str) or not value:
        return None  # remove row
    rate = YEN_TO_USD if "¥" in value else 1.0
    # A range such as "$8.95 - $16.95": average the two ends
    both = re.search(r'(\d+\,*\.*\d*) - .(\d+\,*\.*\d*)', value)
    if both:
        return (to_float(both.group(1)) + to_float(both.group(2))) * rate / 2
    # A single price such as "$50.02"
    single = re.search(r'(\d+\,*\.*\d*)', value)
    return to_float(single.group(1)) * rate if single else None

What I want is to replace the Price column with the average value in $.
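For reference, a row-wise function like this can be mapped over a column with `Series.apply`. The sketch below is only an illustration: `average_usd` is a hypothetical, more compact variant of the function in the question, the three-row frame is a shortened stand-in for the sample file, and the 0.009 rate is the one quoted in the question's comment.

```python
import math
import re

import pandas as pd

YEN_TO_USD = 0.009  # assumed rate, taken from the comment in the question


def average_usd(value):
    """Return the mean of the numbers in a price string, in USD (NaN if none)."""
    if not isinstance(value, str):
        return math.nan  # missing price
    # Each match is one number, with optional thousands separators and decimals.
    found = re.findall(r"\d[\d,]*(?:\.\d+)?", value)
    numbers = [float(n.replace(",", "")) for n in found]
    if not numbers:
        return math.nan
    mean = sum(numbers) / len(numbers)
    return mean * YEN_TO_USD if "¥" in value else mean


# Shortened stand-in for the sample file.
df = pd.DataFrame({
    "Name": ["Eedor", "Fitment", "Vintage"],
    "Price": ["¥1,680", None, "$8.95 - $16.95"],
})

# Replace each price string with its average in USD, then drop empty rows.
df["Price"] = df["Price"].apply(average_usd)
df = df.dropna(subset=["Price"])
```

`apply` calls the function once per cell, which is easy to read but slower than vectorized string methods on large frames.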

1 Answer:

Answer 0: (score: 0)

This does what you want, ignoring the currency symbols:

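# extractall returns one row per number found; groupby(level=0) averages them per original row
df['average'] = df.Price.str.replace(',', '').str.extractall(r'([\d.]+)')[0].astype(float).groupby(level=0).mean()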

Output:

         Name            Price   average
0       Eedor           ¥1,680   1680.00
1    Avidlove          ¥11,761  11761.00
2     Fitment              NaN       NaN
3     Vintage   $8.95 - $16.95     12.95
4  silhouette  $27.80 - $69.50     48.65
5        Silk           $50.02     50.02
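The one-liner packs several steps together. As a rough sketch of the intermediate results, here is the same chain split up, using a three-row stand-in for the Price column (the variable names are only illustrative):

```python
import pandas as pd

prices = pd.Series(["¥1,680", None, "$8.95 - $16.95"])

# 1. Drop thousands separators so "1,680" becomes "1680".
cleaned = prices.str.replace(",", "", regex=False)

# 2. One row per number found. The result has a two-level index:
#    the original row label plus a "match" counter.
matches = cleaned.str.extractall(r"([\d.]+)")[0].astype(float)

# 3. Average the matches that share an original row label.
average = matches.groupby(level=0).mean()

# 4. Rows with no match (the missing price) are absent from `average`;
#    reindexing restores them as NaN.
average = average.reindex(prices.index)
```

Assigning the result of step 3 to a DataFrame column aligns on the index, so the explicit `reindex` is only needed when working with the Series on its own.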

To correct for the yen exchange rate:

import numpy as np

Yen_to_USD_rate = 0.009  # rate from the question: 1 ¥ = 0.0090 $
df['average'] = np.where(df.Price.str[:1].eq('¥'),
                         df['average']*Yen_to_USD_rate,
                         df['average'])
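Putting the two steps together, one possible end-to-end version that overwrites Price with the USD average, as the question asks, might look like the sketch below. It assumes the 0.009 rate from the question, reads the sample data from an inline string instead of `sample_data1.csv`, and drops rows with no price; `YEN_TO_USD_RATE`, `csv_text`, and `is_yen` are names introduced here for illustration.

```python
import io

import numpy as np
import pandas as pd

YEN_TO_USD_RATE = 0.009  # assumed rate from the question

# Inline copy of the sample file from the question.
csv_text = '''Name,Price
Eedor,"¥1,680"
Avidlove,"¥11,761"
Fitment,
Vintage,$8.95 - $16.95
silhouette,$27.80 - $69.50
Silk,$50.02
'''
df = pd.read_csv(io.StringIO(csv_text))

# Mean of all numbers in each Price cell, in the cell's own currency.
average = (
    df["Price"].str.replace(",", "", regex=False)
    .str.extractall(r"([\d.]+)")[0]
    .astype(float)
    .groupby(level=0)
    .mean()
    .reindex(df.index)
)

# Convert only the yen rows; missing prices count as "not yen".
is_yen = df["Price"].str.startswith("¥", na=False)
df["Price"] = np.where(is_yen, average * YEN_TO_USD_RATE, average)

# Drop rows that had no price at all.
df = df.dropna(subset=["Price"]).reset_index(drop=True)
```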