Question

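I have been trying to do the following on a long CSV file with three columns:

For each row, get the maximum and minimum of the entries in the previous 250 rows. The data looks like this: column 1 is an index (1-5300), column 2 holds the data, and column 3 is something else that is not used here. This is the code I have so far. Note that 'i' is the row index, looked up in column 1. Column 2 is where the data is stored (that is, the data whose max and min I want).

The problem I'm having is that csv.reader always starts at the end of the file, which throws the whole algorithm out the window. I don't know what I'm doing wrong. Please help.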

max1 = 0
min1 = 1000000    

i = 3476
f1=  open('PUT/PUT_SELLING.csv')
file_reader = csv.reader(f1)
for col in file_reader:
    serial          = int(col[0])
    if serial <i-250:
        spyy = float(col[1])
        print spyy

    for j in range(0,250):
        spyy = float(col[1])          
        max1 = max(max1,spyy)
        min1 = min(min1,spyy)
        file_reader.next()
        #print spyy

f1.close()

print 'max =' +str(max1) + 'min = ' + str(min1)

Answer 1

In your code, this line

for col in file_reader:

actually iterates over the rows of the file, not the columns.

And for each col, you then advance the reader by 250 rows in this code:

for j in range(0,250):
    spyy = float(col[1]) # here you're grabbing the same second item 250 times
    max1 = max(max1,spyy) # setting the new max to the same value 250 times
    min1 = min(min1,spyy) # setting the new min to the same value 250 times
    file_reader.next() # now you advance, but col is the same so ...
    # it's like you're skipping 250 lines

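This means each row stored in col is actually about 250 rows after the previous row stored in col. It's as if you were stepping through the file in strides of 250.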
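A minimal sketch of the skipping effect, using a plain iterator over the numbers 1 to 20 as a stand-in for csv.reader and 3 manual advances instead of 250. It is written for Python 3, where the built-in next(it) replaces the it.next() method used in this thread's Python 2 code; the names rows and seen are illustrative only.

```python
rows = iter(range(1, 21))  # stand-in for csv.reader: yields 1, 2, ..., 20
seen = []

for row in rows:
    seen.append(row)  # the only rows the loop body ever looks at
    for _ in range(3):
        # Advance the shared iterator by hand, as file_reader.next() does.
        # The default argument avoids StopIteration at the end of the data.
        next(rows, None)

print(seen)
```

Each visited row is 4 positions after the previous one: 3 manual advances plus the loop's own step. With 250 calls to next(), the stride in the original code works out to 251 rows.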

I rewrote it based on what you said you want to do. See if this makes more sense:

import csv

f1 = open('PUT/PUT_SELLING.csv')
file_reader = csv.reader(f1)

spyy_values = []
mins = []
maxes = []

# just saying 'for x in file_reader' is all you need to iterate through the rows
# you don't need to use file_reader.next()
# here I'm also using the enumerate() function
# which automatically returns an index for each row
for row_index, row in enumerate(file_reader):
    # get the value
    spyy_values.append( float(row[1]) )

    if row_index >= 249:
        # get the min of the last 250 values,
        # including this line
        this_min = min(spyy_values[-250:])
        mins.append(this_min)
        # get the max of the last 250 values,
        # including this line
        this_max = max(spyy_values[-250:])
        maxes.append(this_max)

print "total max:", max(maxes)
print "total min:", min(mins)
print "you have %s max values" % len(maxes)
print "you have %s min values" % len(mins)
print "here are the maxes", maxes
print "here are the mins", mins

Remember that csv.reader is an iterator, so the for loop advances to each row automatically. See the example in the documentation.
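For readers on Python 3, here is a hedged sketch of the same rolling-window idea. It swaps the list slicing used above for collections.deque with maxlen, which discards the oldest value automatically. The function name rolling_min_max and the sample rows are made up for illustration, and the sketch assumes the file has no header row.

```python
import csv
import io
from collections import deque


def rolling_min_max(rows, window=250):
    """Yield (index, min, max) over the last `window` values of column 2."""
    recent = deque(maxlen=window)  # old values fall off the left automatically
    for row in rows:
        recent.append(float(row[1]))
        if len(recent) == window:  # only report once a full window exists
            yield int(row[0]), min(recent), max(recent)


# Hypothetical sample data standing in for PUT/PUT_SELLING.csv
sample = io.StringIO("1,10.0,x\n2,12.5,x\n3,9.0,x\n4,11.0,x\n5,14.0,x\n")
results = list(rolling_min_max(csv.reader(sample), window=3))
for index, low, high in results:
    print(index, low, high)
```

For the real file, the same function could be fed csv.reader(f1) inside a with open('PUT/PUT_SELLING.csv', newline='') as f1: block, leaving window at its default of 250.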

Answer 2

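It looks like you are calling file_reader.next() in the wrong place. In the code you posted, file_reader.next() runs inside the inner for loop, which is probably why the reader ends up at EOF right after processing the first col (that is, the first row).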

The corrected code is:

max1 = 0
min1 = 1000000    

i = 3476
f1=  open('PUT/PUT_SELLING.csv')
file_reader = csv.reader(f1)
for col in file_reader:
    serial          = int(col[0])
    if serial <i-250:
        spyy = float(col[1])
        print spyy

    for j in range(0,250):
        spyy = float(col[1])          
        max1 = max(max1,spyy)
        min1 = min(min1,spyy)
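    # you move to the next row after processing the current row
    # (note: the for loop already advances one row per iteration,
    # so this extra call skips every other row)
    file_reader.next()
    #print spyy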

f1.close()

print 'max =' +str(max1) + 'min = ' + str(min1)

Let me know if this works.

Answer 3

Since your first two columns are numeric, this may help. You can read the lines yourself and split them on "," (just a workaround).

Use

file_reader=  open('PUT/PUT_SELLING.csv').readlines()
for line in file_reader:
    col = line.split(",")
    serial          = int(col[0])

instead of

f1=  open('PUT/PUT_SELLING.csv')
file_reader = csv.reader(f1)
for col in file_reader:
   serial          = int(col[0])
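A small sketch of where the workaround and csv.reader differ, assuming a hypothetical row whose third column contains a quoted comma (the thread does not say what the third column actually holds). The first two numeric columns come out the same either way, which is why the workaround is enough for this question. Written for Python 3.

```python
import csv
import io

line = '7,3.25,"sold, then rolled"\n'  # hypothetical row with a quoted comma

by_split = line.rstrip("\n").split(",")           # naive split on every comma
by_reader = next(csv.reader(io.StringIO(line)))   # csv.reader honours the quotes

print(by_split)
print(by_reader)
```

Here split(",") produces four fields because it cuts inside the quoted text, while csv.reader returns three.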

Answer 4

import csv

f1 = open('PUT/PUT_SELLING.csv')
file_reader = csv.reader(f1)
which_str = raw_input('Comma separated list of indices to show: ')
which_to_show = [int(i) for i in which_str.split(',')]
vals = []
for cols in file_reader:  # This iterates over the rows
    vals.append(float(cols[1]))  # Accumulate the values from column 2
    index = int(cols[0])
    if index > 249:      # a full 250-row window is available
        mini = min(vals)
        maxi = max(vals)
        del vals[0]  # drop the oldest entry so the window stays at 250
        if index in which_to_show:  # print only once mini and maxi exist
            print 'index %d min=%f max=%f' % (index, mini, maxi)

f1.close()

Manipulating a CSV file in Python
