Reading a CSV file twice in Python

Time: 2017-09-06 20:06:11

Tags: python python-2.7 csv

Here is my Python code:

import csv

# Reading
ordersFile = open('orders.csv', 'rb')
ordersR = csv.reader(ordersFile, delimiter=',')

# Find order employeeID=5, shipCountry="Brazil"
print "Find order employeeID=5, shipCountry=\"Brazil\""
for order in ordersR:
    if order[2] == '5' and order[13] == 'Brazil':
        print order
# Find order employeeID=5
print "Find order employeeID=5"
for order in ordersR:
    if order[2] == '5':
        print order
ordersFile.close()

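The first search (employeeID=5, shipCountry="Brazil") prints results, but the second search (employeeID=5) prints nothing. How can I read (select) rows from the same CSV file more than once?

6 answers:

Answer 0 (score: 4)

You are simply reading straight through the CSV file, so the reader is already at the end of the file when the second loop starts. If you want to process the data several times, read the contents into a variable. Then you do not have to re-read the file each time you need it.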

import csv

# Read order rows into our list
# Here I use a context manager so that the file is automatically
# closed upon exit
with open('orders.csv') as orders_file:
    reader = csv.reader(orders_file, delimiter=',')
    orders = list(reader)

# Find order employeeID=5, shipCountry="Brazil"
print "Find order employeeID=5, shipCountry=\"Brazil\""
for order in orders:
    if order[2] == '5' and order[13] == 'Brazil':
        print order

# Find order employeeID=5
print "Find order employeeID=5"
for order in orders:
    if order[2] == '5':
        print order

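If your CSV file is too large to fit in memory (or you do not want to read it all into memory for some reason), you will need a different approach. Leave a comment if you need one.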
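One possible "different approach" for a file that does not fit in memory is to stream the rows and re-open the file for each pass. The sketch below is only an illustration, written for Python 3 rather than the question's Python 2.7; the helper names iter_orders and find_orders are hypothetical and not part of the csv module.

```python
import csv


def iter_orders(path):
    """Yield rows one at a time; each call re-opens the file for a fresh pass."""
    with open(path, newline='') as orders_file:
        for row in csv.reader(orders_file, delimiter=','):
            yield row


def find_orders(path, predicate):
    """Return only the rows for which predicate(row) is true."""
    return [row for row in iter_orders(path) if predicate(row)]
```

With these helpers, find_orders('orders.csv', lambda o: o[2] == '5') would run one pass and keep only the matching rows in memory. The cost is that the file is read from disk once per search.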

Answer 1 (score: 1)

All you need to do is convert the reader object's results to a list:

import csv

with open('orders.csv', 'rb') as ordersFile:
    ordersR = list(csv.reader(ordersFile, delimiter=','))

The reader object behaves like a generator: once its values have been iterated over, you cannot start a second loop to read them again.
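The exhaustion is easy to see on a small example. This sketch is written for Python 3 and uses a hypothetical in-memory sample (io.StringIO) in place of orders.csv, so it runs without any file on disk.

```python
import csv
import io

# Hypothetical sample data standing in for orders.csv
sample = io.StringIO("1,a\n2,b\n3,c\n")
reader = csv.reader(sample)

first_pass = [row for row in reader]   # consumes every row
second_pass = [row for row in reader]  # the reader is already exhausted

print(len(first_pass))   # 3
print(len(second_pass))  # 0
```

The second loop in the question behaves the same way as second_pass here: it runs zero times, so nothing is printed.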

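Answer 2 (score: 1)

If you do not want to store all the data in a list yourself, here is a purely iterator-based way to iterate over the CSV file twice, using itertools.tee. Note that tee buffers the rows one iterator has consumed until the other catches up, so running the two loops one after the other, as below, still holds the whole file in memory.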

from csv import reader
from itertools import tee

with open('orders.csv', 'r') as file:
    rows0, rows1 = tee(reader(file, delimiter=','))

    for row in rows0:
        print(row)  # search for something...

    print()

    for row in rows1:
        print(row)  # search for a different thing...

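Answer 3 (score: 0)

It is best to go through the file only once, because I/O is likely to be the slowest part of your program.

If you do need to re-read the file, you can either close and re-open it, or seek() back to the beginning, i.e. add ordersFile.seek(0) between your two loops.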
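A minimal sketch of the seek(0) variant, written for Python 3. It uses a hypothetical in-memory stand-in for the opened orders.csv, with only four columns, so the country sits at index 3 here rather than 13 as in the question. Creating a fresh reader after rewinding is a cautious choice, not a requirement stated in the answer.

```python
import csv
import io

# Hypothetical column positions for the sample below (the question uses 2 and 13)
EMPLOYEE_ID, SHIP_COUNTRY = 2, 3

# Hypothetical in-memory stand-in for open('orders.csv', newline='')
orders_file = io.StringIO("10,x,5,Brazil\n11,y,5,France\n12,z,7,Brazil\n")

reader = csv.reader(orders_file, delimiter=',')
brazil_by_5 = [row for row in reader
               if row[EMPLOYEE_ID] == '5' and row[SHIP_COUNTRY] == 'Brazil']

orders_file.seek(0)  # rewind the underlying file to the beginning
reader = csv.reader(orders_file, delimiter=',')  # fresh reader for the second pass
all_by_5 = [row for row in reader if row[EMPLOYEE_ID] == '5']

print(brazil_by_5)
print(all_by_5)
```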

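Answer 4 (score: 0)

This is a good use case for the pandas module (you need to install it: pip install pandas).

After that, you only have to read the file once and can easily perform any kind of data manipulation.

For example, to read the file once and filter it several times, follow this example: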

import pandas as pd 

# read csv into a dataframe 
df = pd.read_csv('orders.csv', delimiter=',') 

# get the data that has employeeID == 5
df1 = df[df["employeeID"] == 5]
print(df1) 

# get the data that has employeeID == 5 and shipCountry == "Brazil"
df2 = df[(df["employeeID"] == 5) & (df["shipCountry"] == "Brazil")]
print(df2) 

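Answer 5 (score: 0)

As @Nick T mentioned above, I/O is considered expensive compared with RAM access, so if you need to iterate over the file several times it is better to save its contents in a variable.

You can also combine multiple conditions in a single for loop, which runs faster (one iteration):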

import csv

with open('orders.csv', 'rb') as ordersFile:
    orders = list(csv.reader(ordersFile, delimiter=','))

# Collect both result sets in a single pass
emp = []      # every order with employeeID=5
country = []  # the subset of those orders shipped to Brazil
for order in orders:
    if order[2] == '5':
        emp.append(order)
        if order[13] == 'Brazil':
            country.append(order)

print 'emp id=5 and shippingcountry=Brazil: {}'.format(country)
print 'emp id=5: {}'.format(emp)

Note that this does not scale: you probably do not want to add any more if logic to this block, because it quickly becomes unreadable.
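One way to keep a single pass readable as the number of searches grows is to move each condition into a named predicate. The sketch below is an assumption about how that could look, not something the answer proposes. It is written for Python 3; run_queries, queries and the four-column sample are hypothetical.

```python
import csv
import io

# Hypothetical column positions for the sample below (the question uses 2 and 13)
EMPLOYEE_ID, SHIP_COUNTRY = 2, 3

# Each named query is an independent predicate, so adding one is a single entry.
queries = {
    'employeeID=5': lambda o: o[EMPLOYEE_ID] == '5',
    'employeeID=5, shipCountry="Brazil"':
        lambda o: o[EMPLOYEE_ID] == '5' and o[SHIP_COUNTRY] == 'Brazil',
}


def run_queries(rows, queries):
    """Scan rows once and collect the matches for every query."""
    results = {name: [] for name in queries}
    for row in rows:
        for name, matches in queries.items():
            if matches(row):
                results[name].append(row)
    return results


# Hypothetical sample data standing in for orders.csv
sample = io.StringIO("10,x,5,Brazil\n11,y,5,France\n12,z,7,Brazil\n")
results = run_queries(csv.reader(sample), queries)
for name, matched_rows in results.items():
    print(name, matched_rows)
```

Because run_queries accepts any iterable of rows, it can be fed a csv.reader directly (one streaming pass) or a list that was read earlier.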