pandas read_csv problem when reading a large file

Time: 2019-12-10 17:17:44

Tags: python pandas numpy csv tensorflow

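I am trying to build a project and have created a CSV with 13347 rows and 2500 columns. When I read the file with pandas, only the first 6600 values are read, so the system cannot build the model properly. Please tell me why this happens and how to fix it. I have attached part of the code together with its output.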

**Code:**
import numpy as np
import pandas as pd

data=pd.read_csv("train_foo.csv",low_memory=False)
dataset=np.array(data)
print(dataset.shape)
np.random.shuffle(dataset)
x=dataset
y=dataset
x=x[:,1:2501]
y=y[:,0]
# splitting the data into training and testing sets, normalizing the values
x_train=x[0:12000,:]    # 12000 samples in training set
x_train=x_train/255.     # converting the pixel values 0-255 into 0-1
x_test=x[12001:13345,:] # rows 12001-13344: 1344 samples in testing set
x_test=x_test/255.
y=y.reshape(y.shape[0],1)
y_train=y[0:12000,:]
y_train=y_train.T
y_test=y[12001:13345,:]
y_test=y_test.T
print("no. of training examples:"+str(x_train.shape[0]))
print("no. of test examples:"+str(x_test.shape[0]))
print("x_train shape:"+ str(x_train.shape))
print("x_test shape:"+ str(x_test.shape))
print("y_train shape:"+str(y_train.shape))
print("y_test shape: "+str(y_test.shape))

**output:**
no. of training examples:6672
no. of test examples:0
x_train shape:(6672, 2500)
x_test shape:(0, 2500)
y_train shape:(1, 6672)
y_test shape: (1, 0)

1 answer:

Answer 0 (score: 0)

You can try reading the CSV in chunks, as described in the documentation:

reader = pd.read_csv('tmp.sv', sep='|', chunksize=4)
for chunk in reader:
    print(chunk)
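
The loop above only prints each chunk. To end up with one array, as the question's code expects, the chunks still have to be combined. The following is a minimal sketch of that, not a confirmed fix: the helper names `read_csv_in_chunks` and `count_data_lines` are made up for illustration, and the chunk size of 1000 is an arbitrary assumption. It also counts the lines in the file, which may help tell whether the rows are missing from the file itself or are being lost during parsing.

```python
import pandas as pd


def read_csv_in_chunks(path, chunksize=1000):
    """Read a CSV piece by piece and return one combined DataFrame.

    Hypothetical helper; chunksize=1000 is an arbitrary choice.
    """
    chunks = []
    for chunk in pd.read_csv(path, chunksize=chunksize):
        chunks.append(chunk)
    # ignore_index=True renumbers the rows 0..n-1 across all chunks
    return pd.concat(chunks, ignore_index=True)


def count_data_lines(path):
    """Count physical lines in the file, excluding the header line.

    This equals the row count only if no quoted field contains a line break.
    """
    with open(path, "r", newline="") as f:
        return sum(1 for _ in f) - 1
```

With the file from the question, this could be used as `data = read_csv_in_chunks("train_foo.csv")`, followed by comparing `data.shape[0]` with `count_data_lines("train_foo.csv")`. If both report about 6672, the file probably contains fewer rows than expected and chunked reading will not change the result. In that case the slices `x[0:12000]` and `x[12001:13345]` simply stop at the end of the array, which would explain the empty test set in the output.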