Question

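So, I am writing a huge module in which I call 10 other modules. These "other 10 modules" store reference data as lists of lists.

For example, I have a module refdataCollection.py that contains the following data sets, each with no more than 100 entries.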

allFees = [['FeeID','RegFees','HXVFees'],
['ABC',34,21],
['ABV',31,23],
['PGC',33,25],]

allCust = [['CustID','CustCode','CustName'],
['1','ARN','Company 1'],
['2','BRS','Company 2'],
['3','AJN','Company 3'],]

From the class in my main module, I access these modules like this:

import pandas as pd
import refdataCollection as refdata

def getRefData(refDataName):
    return getattr(refdata,refDataName)

def getRefDataDataFrame(refDataName):
    records = getRefData(refDataName)
    headers = records[0]
    refDataDF = pd.DataFrame(records[1:],columns=headers)
    return refDataDF

Then, in the main module itself, I can get a value like this:

feesDataFrame = getRefDataDataFrame('allFees')
thisFee = feesDataFrame[ (feesDataFrame['FeeID'] == 'ABC')]
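Note that the filter above yields a one-row DataFrame, not a single value. As a minimal sketch of reading one scalar from it, assuming the allFees table is inlined here instead of being imported from refdataCollection.py (the names fees_df, fees_by_id and reg_fee are only illustrative):

```python
import pandas as pd

# Inlined copy of the allFees table (assumption: same shape as in refdataCollection.py).
allFees = [['FeeID', 'RegFees', 'HXVFees'],
           ['ABC', 34, 21],
           ['ABV', 31, 23],
           ['PGC', 33, 25]]

# First row is the header, the rest are records.
fees_df = pd.DataFrame(allFees[1:], columns=allFees[0])

# Boolean-mask filter: the result is still a DataFrame with one row.
this_fee = fees_df[fees_df['FeeID'] == 'ABC']

# Index by FeeID once, then read a single cell by row and column label.
fees_by_id = fees_df.set_index('FeeID')
reg_fee = fees_by_id.at['ABC', 'RegFees']
```

Setting the index once and reusing fees_by_id avoids rebuilding the boolean mask on every lookup; whether that matters here would need to be measured.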

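Given that I have multiple methods calling several such reference data sets, I might do this around 100 times.

The question is: is using a DataFrame the right approach? For such small data sets, would I be better off using a list or dict instead of pandas? My approach takes a lot of time: looping through a set of 50K records takes several seconds.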
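For comparison, here is a rough sketch of the dict alternative mentioned in the question. It assumes the same list-of-lists layout with the header in the first row; build_lookup is a hypothetical helper name, not part of the original code. Each lookup is then a plain hash lookup rather than a column scan that builds a new DataFrame.

```python
# Inlined copy of the allFees table (assumption: same shape as in refdataCollection.py).
allFees = [['FeeID', 'RegFees', 'HXVFees'],
           ['ABC', 34, 21],
           ['ABV', 31, 23],
           ['PGC', 33, 25]]


def build_lookup(records, key_column):
    """Turn a list-of-lists table (header row first) into {key: {column: value}}.

    Hypothetical helper for illustration only.
    """
    headers = records[0]
    key_index = headers.index(key_column)
    return {row[key_index]: dict(zip(headers, row)) for row in records[1:]}


# Build the dict once, then reuse it for every lookup.
fees_by_id = build_lookup(allFees, 'FeeID')

this_fee = fees_by_id['ABC']      # the whole record as a dict
reg_fee = this_fee['RegFees']     # a single value from that record
```

Building the dict once per table and reusing it inside the 50K-record loop is the point of the sketch; actual timings depend on the workload and are not measured here.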

Python Pandas Dataframe vs dict vs list

0 Answers: