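I am merging large DataFrames and getting a MemoryError. I would like to know whether the error is caused by poor coding or simply by the size of the DataFrames. The DataFrames are dfy: 34.7 MB and df: 2.2 MB.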
import pandas as pd

# Columns '2' and '3' are named only so that usecols can skip them.
dfy = pd.read_csv(
    'Thesis/CRSP/CampaignFin14/pacs14.txt',
    header=None,
    names=['cycle', '2', '3', 'cid', 'amount', 'date', 'catcode', 'type', 'di', 'feccandid'],
    usecols=['cycle', 'cid', 'amount', 'date', 'catcode', 'type', 'di', 'feccandid'],
)
dfy.head()
   cycle        cid  amount        date catcode type di  feccandid
0   2014  N00029285    1000  05/15/2014   E1600  24K  D  H8TX22107
1   2014  N00026722    5000  10/22/2013   G4600  24K  D  H4TX28046
2   2014  N00030676       4  03/26/2014   C2100  24Z  D  H0MO07113
3   2014  N00032088    1000  05/06/2014   F1100  24K  D  H0OH06189
df = pd.read_csv(
    'Thesis/MapLight_data/mpl_data114.csv',
    header=None,
    names=['session', 'prefix', 'number', 'organization_id', 'name', 'disposition', 'catcode'],
    usecols=['session', 'prefix', 'number', 'disposition', 'catcode'],
)
df.head()
   session prefix  number disposition catcode
0      114      H     131     support   J6200
1      114      H     138      oppose   L1100
2      114      H     140     support     NaN
df_merge = pd.merge(dfy, df, on='catcode')
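One way to tell the two causes apart might be to estimate how many rows the merge would produce before running it. An inner merge on a key that repeats on both sides yields, for each key value, (count on the left) x (count on the right) rows, so two modest files can expand into a very large result. The sketch below is only an illustration: the helper name `estimate_merge_rows` and the toy frames are hypothetical and not part of the original code, and it assumes pandas 1.1 or later for `groupby(..., dropna=False)`.

```python
import pandas as pd

def estimate_merge_rows(left, right, key):
    """Row count an inner merge on `key` would produce, without building it.

    Hypothetical helper for illustration; assumes pandas >= 1.1.
    """
    # Count occurrences of each key value on each side (null keys included,
    # since pd.merge matches null keys against each other).
    left_counts = left.groupby(key, dropna=False).size().rename('n_left').reset_index()
    right_counts = right.groupby(key, dropna=False).size().rename('n_right').reset_index()
    # Merging the two small count tables keeps only the shared key values.
    counts = left_counts.merge(right_counts, on=key)
    # Each shared key contributes n_left * n_right rows to the full merge.
    return int((counts['n_left'] * counts['n_right']).sum())

# Toy data (made up): 'E1600' appears twice on the left and three times on the right.
toy_left = pd.DataFrame({'catcode': ['E1600', 'E1600', 'G4600'], 'amount': [1000, 5000, 4]})
toy_right = pd.DataFrame({'catcode': ['E1600', 'E1600', 'E1600', 'J6200'],
                          'number': [131, 138, 140, 150]})

estimated = estimate_merge_rows(toy_left, toy_right, 'catcode')
print(estimated)
```

Calling `estimate_merge_rows(dfy, df, 'catcode')` on the real frames would show whether the result is far larger than either input. If it is, the repeated `catcode` values are the likely cause rather than the file sizes themselves, and dropping duplicates or aggregating one side before merging may be worth considering.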