This is my Python code:
import csv
# Reading
ordersFile = open('orders.csv', 'rb')
ordersR = csv.reader(ordersFile, delimiter=',')
# Find order employeeID=5, shipCountry="Brazil"
print "Find order employeeID=5, shipCountry=\"Brazil\""
for order in ordersR:
    if order[2] == '5' and order[13] == 'Brazil':
        print order
# Find order employeeID=5
print "Find order employeeID=5"
for order in ordersR:
    if order[2] == '5':
        print order
ordersFile.close()
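The first search (employeeID=5, shipCountry="Brazil") prints its results, but the second search (employeeID=5) prints nothing. How can I read (select) rows from the same CSV file more than once?
Answer 0 (score: 4)
You are reading straight through the CSV file, so the reader is already at the end when the second loop starts. If you want to process the data more than once, read the contents into a variable. Then you don't have to re-read the file every time you need it.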
import csv
# Read order rows into our list
# Here I use a context manager so that the file is automatically
# closed upon exit
with open('orders.csv') as orders_file:
    reader = csv.reader(orders_file, delimiter=',')
    orders = list(reader)
# Find order employeeID=5, shipCountry="Brazil"
print "Find order employeeID=5, shipCountry=\"Brazil\""
for order in orders:
    if order[2] == '5' and order[13] == 'Brazil':
        print order
# Find order employeeID=5
print "Find order employeeID=5"
for order in orders:
    if order[2] == '5':
        print order
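If your CSV file is too large to fit in memory (or you don't want to read it all into memory for some reason), you will need a different approach. Leave a comment if that is your situation.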
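For the too-large-for-memory case, one possible approach is to stream the file again for each query instead of keeping the rows around. The following is only a sketch, written for Python 3: `matching_rows`, `is_employee_5` and `is_employee_5_brazil` are hypothetical helpers, and the column positions (2 for employeeID, 13 for shipCountry) are taken from the question.

```python
import csv

def matching_rows(path, predicate):
    """Yield the rows of the CSV file at `path` for which predicate(row) is true.

    The file is reopened on every call, so only one row is held in
    memory at a time. (Hypothetical helper, not part of the csv module.)
    """
    with open(path, newline='') as csv_file:
        for row in csv.reader(csv_file, delimiter=','):
            if predicate(row):
                yield row

def is_employee_5(row):
    # Column 2 is assumed to hold employeeID, as in the question.
    return row[2] == '5'

def is_employee_5_brazil(row):
    # Column 13 is assumed to hold shipCountry, as in the question.
    return row[2] == '5' and row[13] == 'Brazil'
```

Each search then becomes its own loop, for example `for order in matching_rows('orders.csv', is_employee_5): print(order)`. The trade-off is one full read of the file per query in exchange for constant memory use.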
Answer 1 (score: 1)
What you can do is simply convert the reader object's results to a list:
import csv
with open('orders.csv', 'rb') as ordersFile:
    ordersR = list(csv.reader(ordersFile, delimiter=','))
The reader object behaves like a generator: once you have iterated over its values, you cannot start a second loop to read them again.
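The exhaustion is easy to see in a small Python 3 experiment. The snippet uses an in-memory `io.StringIO` with made-up rows in place of orders.csv, so it is an illustration rather than the asker's actual data.

```python
import csv
import io

# A tiny in-memory stand-in for orders.csv (made-up data).
sample = io.StringIO("10248,VINET,5\n10249,TOMSP,6\n")
reader = csv.reader(sample, delimiter=',')

first_pass = list(reader)   # consumes every row
second_pass = list(reader)  # the reader is already exhausted

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```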
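Answer 2 (score: 1)
If you don't want to store all the data in a list yourself, here is a purely generator-based way to iterate over the CSV file twice, using itertools.tee. Note that when the first iterator is fully consumed before the second one starts, as below, tee has to buffer every row internally, so this does not save memory compared with a list: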
from csv import reader
from itertools import tee
with open('orders.csv', 'r') as file:
    rows0, rows1 = tee(reader(file, delimiter=','))
    for row in rows0:
        print(row)  # search for something...
    print()
    for row in rows1:
        print(row)  # search for a different thing...
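Answer 3 (score: 0)
It is best to read the file in a single pass, because I/O is likely to be the slowest part of your program.
If you do need to re-read the file, you can close and reopen it, or seek() back to the beginning, i.e. add ordersFile.seek(0) between your loops.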
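A minimal Python 3 sketch of the seek(0) variant follows. It uses an in-memory `io.StringIO` with made-up four-column rows in place of the real file, so the country sits in column 3 here rather than column 13 as in the question.

```python
import csv
import io

# In-memory stand-in for open('orders.csv', newline=''); made-up data.
orders_file = io.StringIO("1,a,5,Brazil\n2,b,5,France\n3,c,7,Brazil\n")
orders_reader = csv.reader(orders_file, delimiter=',')

# First pass: employeeID == 5 and shipCountry == Brazil.
brazil_for_5 = [row for row in orders_reader
                if row[2] == '5' and row[3] == 'Brazil']

# Rewind the underlying file so the same reader starts from the top again.
orders_file.seek(0)

# Second pass: employeeID == 5 only.
all_for_5 = [row for row in orders_reader if row[2] == '5']
```

The reader pulls its lines from the file object, so rewinding the file is enough; its `line_num` counter keeps counting across both passes, though.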
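Answer 4 (score: 0)
This is a good use case for the pandas module (you need to install it first: pip install pandas).
After that, you only need to read the file once and can easily perform any kind of manipulation.
For example, to filter the same data several times, follow this example:
import pandas as pd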
# read csv into a dataframe
df = pd.read_csv('orders.csv', delimiter=',')
# get the data that has employeeID == 5
df1 = df[df["employeeID"] == 5]
print(df1)
# get the data that has employeeID == 5 and shipCountry == "Brazil"
df2 = df[(df["employeeID"] == 5) & (df["shipCountry"] == "Brazil")]
print(df2)
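Answer 5 (score: 0)
As @Nick T mentioned above, I/O is considered expensive compared with RAM access, so if you need to iterate over the file multiple times it is better to save its contents to a variable.
You can also combine multiple conditions in a single for loop, so that it runs faster (a single pass):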
import csv
with open('orders.csv', 'rb') as ordersFile:
    orders = list(csv.reader(ordersFile, delimiter=','))
# Collect both result sets in one pass over the rows
emp = []
country = []
for order in orders:
    if order[2] == '5':
        # every employeeID=5 order belongs in emp, Brazil or not
        emp.append(order)
        if order[13] == 'Brazil':
            country.append(order)
print 'emp id=5 and shippingcountry=Brazil: {}'.format(country)
print 'emp id=5: {}'.format(emp)
Note that this does not scale: you probably don't want to add any more if logic to this block, because it quickly becomes unreadable.
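One way to keep the single pass readable as conditions accumulate is to name each condition and loop over them, rather than nesting more if statements. This is only a sketch in Python 3: `filters` and `select_rows` are hypothetical names, and the column positions 2 and 13 are the ones used in the question.

```python
# Named filters; add a new entry here instead of another nested `if`.
# Column positions 2 (employeeID) and 13 (shipCountry) follow the question.
filters = {
    'emp id=5': lambda order: order[2] == '5',
    'emp id=5 and shipCountry=Brazil':
        lambda order: order[2] == '5' and order[13] == 'Brazil',
}

def select_rows(rows, filters):
    """Return {filter name: matching rows}, scanning `rows` only once."""
    matches = {name: [] for name in filters}
    for row in rows:
        for name, predicate in filters.items():
            if predicate(row):
                matches[name].append(row)
    return matches
```

Because `rows` is consumed only once, a `csv.reader` can be passed in directly, e.g. `select_rows(csv.reader(orders_file), filters)`, without first converting it to a list.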