I am trying to run the following code:
import pandas as pd
import numpy as np
df = pd.read_csv('E:/test.csv', low_memory=False)
mat = df.as_matrix(columns=['pageTitle', 'deviceCategory', 'eventCategory', 'eventAction'])
values, counts = np.unique(mat.astype(str), return_counts=True)
for x in values:
    df[x] = df.isin([x]).any(1).astype(int)
grouped = df.groupby('Session_ID')
grouped.sum().to_csv('E:/test2.csv')
But I get the following error:
Traceback (most recent call last):
  File "C:\Users\User\Desktop\flat_seminar.py", line 5, in <module>
    values, counts = np.unique(mat.astype(str), return_counts=True)
ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.
I tried using memmap, but memmap does not support the as_matrix function.
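For context, here is a minimal sketch of where the size in the error message seems to come from. It assumes the four columns are read as object (string) columns and that at least one value, for example a long pageTitle, is much longer than the rest. The small array below is a made-up stand-in, not my real data. As far as I understand, astype(str) produces a fixed-width unicode array sized for the longest element, so every cell pays for that length.

```python
import numpy as np

# Hypothetical stand-in for the selected columns: short values plus one long page title.
mat = np.array([["home", "mobile"], ["x" * 500, "desktop"]], dtype=object)

# Casting to str gives a fixed-width unicode dtype sized for the longest element.
as_str = mat.astype(str)

# NumPy stores 4 bytes per character, so each cell costs 4 * (longest length) bytes.
itemsize = as_str.dtype.itemsize
total_bytes = as_str.size * itemsize  # the product named in the error message

print(as_str.dtype, itemsize, total_bytes)
```

With millions of rows, that product (arr.size * arr.dtype.itemsize) can exceed what a single array is allowed to hold, which would match the ValueError above.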