Question

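I have a large dataset with shape = (184215, 82).

Of the 82 columns, I only want to import a selection of 6 to save memory, because I need to do an inner join and run some analysis on the data.

Is there a way to restrict the columns created by pd.read_table(), or a way to drop the unnecessary columns after the fact? (The file is a CSV with no column headers; I had to create the column names myself afterwards.)

For example, here is the list of the 82 columns: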

['COBDate' 'structRefID' 'transactionID' 'tradeID' 'tradeLegID'
 'tradeVersion' 'baseCptyID' 'extCptyID' 'extLongName' 'portfolio'
 'productClass' 'productGroup' 'productType' 'RIC' 'CUSIP' 'ISIN' 'SEDOL'
 'underlyingCurrency' 'foreignCurrency' 'notional' 'notionalCCY' 'quantity'
 'MTM' 'tradeDate' 'startDate' 'expiryDate' 'optExerciseType'
 'settlementDate' 'settlementType' 'payoff' 'buySell' 'strike' 'rate'
 'spread' 'rateType' 'paymentFreq' 'resetFreq' 'modelUsed' 'sentWSS'
 'Multiplier' 'PayoutCCY' 'Comments' 'TraderCode' 'AsnOptionStyle'
 'BarrierDirection' 'BarrierMonitoringFreq' 'DayCountConv'
 'SingleBarrierLevel' 'DownBarrierLevel' 'DownRebateAmount'
 'UpBarrierLevel' 'UpRebateAmount' 'IsOptionOnFwd' 'NDFixingDate'
 'NDFixingPage' 'NDFixingRate' 'PayoutAmount' 'Underlying' 'WSSID'
 'WindowEndDate' 'WindowStartDate' 'InstrumentID' 'EffectiveDate' 'CallPut'
 'IsCallable' 'IsExchTraded' 'IsRepay' 'MutualPutDate' 'OptionExpiryStyle'
 'IndexTerm' 'PremiumSettlementDate' 'PremiumCcy' 'PremiumAmount'
 'ExecutionDateTime' 'FlexIndexFlag' 'NotionalPrincipal' 'r_Premium'
 'cpty_type' 'IBTSSID' 'PackageID' 'Component' 'Schema' 'pandas_index']

As an example, I only want to keep the following:

'COBDate' 'baseCptyID' 'extCptyID' 'portfolio' 'strike' 'rate'
 'spread'

Answer 1

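For a CSV without column headers:

import pandas as pd
df = pd.read_table('data.csv', sep=',', header=None, usecols=[0, 1, 2])  # 'data.csv' is a placeholder path

where [0, 1, 2] are the zero-based positions of the columns to read. Note that pd.read_table() defaults to a tab separator, so sep=',' is needed for a CSV (or use pd.read_csv() instead).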

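If the CSV contains column headers, you can also specify the columns by name:

cols_to_read = ['COBDate', 'baseCptyID', 'extCptyID', 'portfolio', 'strike', 'rate', 'spread']
df = pd.read_table('data.csv', sep=',', usecols=cols_to_read)  # 'data.csv' is a placeholder path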
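
Since the file in the question has no header row, one option that may fit is to pass the full list of names via names= together with header=None, and then select by name with usecols. The sketch below is only an illustration under stated assumptions: the 5-column inline sample stands in for the real 82-column file, and load_selected is a hypothetical helper, not part of pandas. It also shows the "after the fact" route of reading everything and then keeping a subset.

```python
import io
import pandas as pd

# Hypothetical 5-column stand-in for the real 82-column, headerless file.
all_cols = ['COBDate', 'structRefID', 'baseCptyID', 'strike', 'rate']
wanted = ['COBDate', 'baseCptyID', 'strike']

raw = "2014-01-02,S1,C1,1.25,0.03\n2014-01-03,S2,C2,1.30,0.04\n"


def load_selected(source, names, keep):
    """Read a headerless CSV, naming every column but parsing only `keep`."""
    # header=None: the first line is data, not column names.
    # names: labels for all columns in the file, in file order.
    # usecols: the subset of those labels to actually load.
    return pd.read_csv(source, header=None, names=names, usecols=keep)


# Option 1: restrict the columns at read time.
df = load_selected(io.StringIO(raw), all_cols, wanted)

# Option 2: read everything, then keep only the wanted columns.
full = pd.read_csv(io.StringIO(raw), header=None, names=all_cols)
trimmed = full[wanted]

print(df)
```

Restricting at read time avoids ever materialising the unwanted columns, which is usually the better choice when memory is the concern; selecting afterwards is simpler but loads the whole file first.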
