Can a vectorized function take a DataFrame as one of its arguments?

Time: 2018-07-23 16:19:55

Tags: python pandas

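I am trying to extract price information from a change-log DataFrame. I got it working with "iterrows", and I am wondering whether converting it to a vectorized function would make it more efficient. I took a first step with np.vectorize, but it failed. I am not sure whether a vectorized function can take a DataFrame as an argument. Here is my test code: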

import pandas as pd
import numpy as np

def getvfromchgloG(CLdf, skulist, SOdate, fromcoL, tocoL):
    # Select this item's log rows; reset_index returns a new frame, so no in-place edit of a slice
    sCLdf = CLdf.loc[CLdf['item'] == skulist].reset_index(drop=True)
    res = 0
    SearchLength = sCLdf.shape[0]
    if SearchLength > 0:
        if SOdate <= sCLdf.iloc[0]['Editdate']:
            res = sCLdf.iloc[0][fromcoL]
        else:
            for i, r in sCLdf.iterrows():
                if SOdate > r['Editdate']:
                    if i == SearchLength - 1:
                        res = r[tocoL]
                        break
                    else:
                        if SOdate <= sCLdf.iloc[i + 1]['Editdate']:
                            res = r[tocoL]
                            break
    return res


d1 = {'ItemNumber': {0: '33-150-161', 1: '33-150-161', 2: '20-232-530', 3: '24-236-856', 4: '34-261-888'}, 'SODate_x': {0: '4/6/2017 10:20', 1: '4/6/2017 11:35', 2: '6/5/2017 12:42', 3: '3/7/2018 7:35', 4: '8/18/2017 14:25'}}
d2 = {'item': {0: '20-232-530', 1: '20-232-530', 2: '20-232-530', 3: '20-232-530', 4: '20-232-530', 5: '20-232-530', 6: '20-232-530', 7: '20-232-530', 8: '20-232-530', 9: '20-232-530', 10: '20-232-530', 11: '20-232-530', 12: '20-232-530', 13: '20-232-530', 14: '20-232-530', 15: '20-232-530', 16: '20-232-530', 17: '20-232-530', 18: '20-232-530', 19: '20-232-530', 20: '20-232-530', 21: '20-232-530', 22: '20-232-530', 23: '24-236-856', 24: '24-236-856', 25: '33-150-161'}, 'Unitprice changed from': {0: 184.0, 1: 174.0, 2: 184.0, 3: 185.0, 4: 187.0, 5: 184.0, 6: 187.0, 7: 190.0, 8: 187.0, 9: 190.0, 10: 188.0, 11: 190.0, 12: 188.0, 13: 191.0, 14: 190.0, 15: 191.0, 16: 195.0, 17: 210.0, 18: 228.0, 19: 260.0, 20: 234.0, 21: 240.0, 22: 245.0, 23: 99999.0, 24: 699.0, 25: 1005.0}, 'Unitprice changed to': {0: 174.0, 1: 184.0, 2: 185.0, 3: 187.0, 4: 184.0, 5: 187.0, 6: 190.0, 7: 187.0, 8: 190.0, 9: 188.0, 10: 190.0, 11: 188.0, 12: 191.0, 13: 190.0, 14: 191.0, 15: 195.0, 16: 210.0, 17: 228.0, 18: 260.0, 19: 234.0, 20: 240.0, 21: 245.0, 22: 250.0, 23: 699.0, 24: 700.0, 25: 1033.0}, 'Editdate': {0: '2017-04-11 00:05:31.247', 1: '2017-04-18 00:10:04.540', 2: '2017-04-19 15:00:01.403', 3: '2017-04-19 15:00:01.407', 4: '2017-05-24 11:10:09.373', 5: '2017-06-19 09:30:00.987', 6: '2017-07-19 17:05:08.580', 7: '2017-07-20 00:05:03.650', 8: '2017-07-21 00:05:02.890', 9: '2017-09-05 13:20:06.463', 10: '2017-09-07 11:00:38.330', 11: '2017-09-12 11:15:05.730', 12: '2017-09-18 15:00:00.953', 13: '2017-09-19 00:05:14.370', 14: '2017-09-26 00:05:17.383', 15: '2017-10-24 14:45:01.817', 16: '2017-10-24 14:45:01.850', 17: '2017-10-30 11:00:15.860', 18: '2017-12-18 09:40:05.920', 19: '2017-12-19 09:20:37.103', 20: '2017-12-21 09:30:03.420', 21: '2017-12-21 09:30:03.490', 22: '2017-12-21 09:30:03.590', 23: '2017-11-16 09:24:59.880', 24: '2017-11-16 09:25:00.077', 25: '2017-08-03 17:05:10.333'}}
FromPrice = "Unitprice changed from"
ToPrice = "Unitprice changed to"
SODatetimeCol = 'SODate_x'

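SOdf = pd.DataFrame(d1)
Logdf = pd.DataFrame(d2)
# Parse both date columns so comparisons are chronological, not string-based
SOdf[SODatetimeCol] = pd.to_datetime(SOdf[SODatetimeCol])
Logdf['Editdate'] = pd.to_datetime(Logdf['Editdate'])
SOdf["ChgPrice"] = 0.0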


for ind, row in SOdf.iterrows():
    searchsku = row['ItemNumber']
    SOdate = row[SODatetimeCol]
    # Assign through .loc to avoid chained assignment on a possible copy
    SOdf.loc[ind, "ChgPrice"] = getvfromchgloG(Logdf, searchsku, SOdate, FromPrice, ToPrice)


print(SOdf)


# The code below won't work
SOdf["ChgPrice"] = np.vectorize(getvfromchgloG, otypes=[object])(
    Logdf,
    SOdf['ItemNumber'],
    SOdf[SODatetimeCol],
    FromPrice,
    ToPrice)
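
One likely reason the call above fails: np.vectorize broadcasts every argument it receives, so a DataFrame passed positionally is treated as a 2-D array of cells instead of being handed to the function whole. np.vectorize has an `excluded` parameter for arguments that should be passed through untouched. The sketch below shows the idea on a tiny made-up log; the names `log`, `last_price`, `skus` and `vlast` are hypothetical and not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature change log (not the poster's data).
log = pd.DataFrame({
    "item": ["A", "A", "B"],
    "price_to": [12.0, 15.0, 55.0],
})


def last_price(log_df, sku):
    """Return the last 'price_to' logged for sku, or 0 if it has no rows."""
    rows = log_df.loc[log_df["item"] == sku, "price_to"]
    return rows.iloc[-1] if len(rows) else 0


skus = pd.Series(["A", "B", "C"])

# excluded={0} asks np.vectorize to pass positional argument 0 (the DataFrame)
# through as-is instead of broadcasting over it.
vlast = np.vectorize(last_price, otypes=[object], excluded={0})
result = vlast(log, skus)

print(list(result))  # [15.0, 55.0, 0]
```

Applied to the call in the question, this would presumably mean adding excluded={0} so that Logdf is not broadcast. Note that the NumPy documentation describes np.vectorize as a convenience that is essentially a for loop, so this should make the call run but is unlikely to make it faster than iterrows.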

0 answers:

No answers
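
For the efficiency part of the question, one possible direction is pandas.merge_asof, which matches each order to the most recent log entry per item without a Python-level loop. This is a swapped-in technique, not the poster's method, and it is only a sketch: it assumes both date columns are already datetimes, that an order dated exactly at an edit should use the previous entry, and that orders with no earlier edit fall back to the item's first "from" price (or 0 if the item is absent from the log). The data and the names `orders`, `log`, `price_from`, `price_to` and `lookup_prices` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature data, not the poster's.
orders = pd.DataFrame({
    "ItemNumber": ["A", "A", "B", "C"],
    "SODate": pd.to_datetime(
        ["2017-01-01", "2017-03-01", "2017-02-15", "2017-02-01"]),
})
log = pd.DataFrame({
    "item": ["A", "A", "B"],
    "price_from": [10.0, 12.0, 50.0],
    "price_to": [12.0, 15.0, 55.0],
    "Editdate": pd.to_datetime(["2017-02-01", "2017-04-01", "2017-02-10"]),
})


def lookup_prices(orders, log):
    """Price in effect at each order date, aligned to orders.index."""
    # merge_asof requires both frames to be sorted by their date key.
    log_sorted = log.sort_values("Editdate")
    left = orders.assign(_row=np.arange(len(orders))).sort_values("SODate")

    # For each order, take the latest log row of the same item whose
    # Editdate is strictly earlier than the order date.
    merged = pd.merge_asof(
        left,
        log_sorted[["item", "Editdate", "price_to"]],
        left_on="SODate", right_on="Editdate",
        left_by="ItemNumber", right_by="item",
        direction="backward", allow_exact_matches=False,
    ).sort_values("_row")  # restore the original order of the orders

    # Orders with no earlier edit use the item's first 'from' price;
    # items missing from the log end up as 0.
    first_from = log_sorted.groupby("item")["price_from"].first()
    fallback = merged["ItemNumber"].map(first_from)
    price = merged["price_to"].fillna(fallback).fillna(0)
    return pd.Series(price.to_numpy(), index=orders.index, name="ChgPrice")


prices = lookup_prices(orders, log)
print(prices.tolist())  # [10.0, 12.0, 55.0, 0.0]
```

Whether this reproduces the loop exactly on real data depends on details such as duplicate edit timestamps and the sort order of the log, so it would be worth comparing its output against the iterrows version before relying on it.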