I have a file that keeps growing:
https|webmail.mahindracomviva.com|application/vnd.ms-sync.wbxml|158|POST|203.101.110.171
https|webmail.mahindracomviva.com||0|POST|203.101.110.171
https|webmail.mahindracomviva.com||0|POST|203.101.110.171
https|www.googleapis.com|application/x-protobuf|246|POST|74.125.200.95
https|webmail.mahindracomviva.com|application/vnd.ms-sync.wbxml|140|POST|203.101.110.171
https|webmail.mahindracomviva.com|application/x-protobuf|52|POST|203.101.110.171
https|www.googleapis.com|application/x-protobuf|502|POST|74.125.200.95
https|www.googleapis.com|application/x-protobuf|40|POST|74.125.200.95
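But I want to read only the last 50 lines with Pandas.
Answer 0: (score: 0)
You need to take the following steps:
First, find the length of the CSV file without loading the whole file into memory. To do that, pass chunksize to read_csv() so the file is read in pieces.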
import pandas as pd

count = 0
for data in pd.read_csv('YourFile.csv', encoding='ISO-8859-1', chunksize=1000):
    count += 1           # number of chunks read so far
    lastlen = len(data)  # rows in the current chunk; after the loop, the last chunk
datalength = count * 1000 + lastlen - 1000  # total data rows (header row not counted)
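Second, subtract the number of rows you want to read from that total, and skip everything before them.
n = 50  # number of trailing rows to read
rowsdiff = datalength - n
# Keep line 0 (the header) and skip data lines 1..rowsdiff, which leaves the last n rows.
df = pd.read_csv('YourFile.csv', encoding='ISO-8859-1', skiprows=range(1, rowsdiff + 1), nrows=n)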
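With this approach only the last few rows end up in the DataFrame. The file is still scanned from the start, but the whole CSV is never held in RAM at once.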
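The two steps above can be combined into one helper. The following is only a sketch under some assumptions: the file looks like the sample in the question (pipe-delimited, no header row, no blank lines), and the function name read_last_rows and its parameters are made up for illustration.

```python
import pandas as pd


def read_last_rows(path, n, sep="|", chunksize=1000):
    """Return the last n rows of a headerless delimited file (hypothetical helper)."""
    # Pass 1: count the rows chunk by chunk so the file is never fully in memory.
    total = 0
    for chunk in pd.read_csv(path, sep=sep, header=None, chunksize=chunksize):
        total += len(chunk)

    # Pass 2: skip every line except the last n (skip nothing if the file is short).
    skip = max(total - n, 0)
    return pd.read_csv(path, sep=sep, header=None, skiprows=skip)
```

Calling read_last_rows('YourFile.csv', 50) would then return a DataFrame with at most 50 rows. Because the file keeps growing, rows appended between the two passes make the result slightly longer than n.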
Answer 1: (score: 0)
Try pandas tail(), like this:
import pandas as pd

filename = "your_file"
last_rows = 3
# Reads the whole file into memory, then keeps only the last rows.
data = pd.read_csv(filename, header=None, sep="|")
print(data.tail(last_rows))
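Since tail() is applied only after read_csv() has parsed the entire file, a lighter variant is to keep just the last lines of text and hand those to pandas. This is a sketch, not part of the answer above: the name tail_csv and the UTF-8 default are assumptions, and it expects a headerless file with one record per line.

```python
from collections import deque
from io import StringIO

import pandas as pd


def tail_csv(path, n, sep="|", encoding="utf-8"):
    """Parse only the last n lines of a headerless delimited file (hypothetical helper)."""
    with open(path, "r", encoding=encoding) as fh:
        # A deque with maxlen=n discards older lines as new ones are read,
        # so at most n lines are held in memory at any time.
        last_lines = deque(fh, maxlen=n)
    # Note: read_csv raises EmptyDataError if the file has no lines at all.
    return pd.read_csv(StringIO("".join(last_lines)), header=None, sep=sep)
```

The file is still read once from start to end, but only n lines are ever parsed into a DataFrame, for example tail_csv("your_file", 50).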