Iterating over a CSV and deleting analyzed data

Time: 2016-09-29 02:46:30

Tags: python list csv append

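Hello, I am trying to take a CSV file and iterate over each customer's data. To explain, each customer has 12 months of data. I want to analyze their yearly data, save the correlations from that data to a new list, and loop until all customers have been analyzed.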

For example, this is what the customer data looks like (simplified case): [image not included]

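I have been able to use this to produce the correlations for a CSV containing one customer's data. However, my data sheet has thousands of customers. I want to use a nested for loop to put all the correlation values for each customer into a list/array. The list would have one row of correlations for a given customer, and the next row would be for the next customer.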

Here is my current code:

import numpy
from numpy import genfromtxt
overalldata = genfromtxt('C:\Users\User V\Desktop\CUSTDATA.csv', delimiter=',')
emptylist = []
overalldatasubtract = overalldata[13::]
#This is where I try to use the for loop to go through all the customers. I don't know if len will give me all the rows or the number of columns.
for x in range(0,len(overalldata),11):
    for x in range(0,13,1):
            cust_months = overalldata[0:x,1]
            cust_balancenormal = overalldata[0:x,16]
            cust_demo_one = overalldata[0:x,2]
            cust_demo_two = overalldata[0:x,3]
            num_acct_A = overalldata[0:x,4]
            num_acct_B = overalldata[0:x,5]
    #Correlation Calculations
            demo_one_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_one)[1,0]
            demo_two_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_two)[1,0]
            demo_one_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_one)[1,0]
            demo_one_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_one)[1,0]
            demo_two_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_two)[1,0]
            demo_two_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_two)[1,0]

            result_correlation = [demo_one_corr_balance, demo_two_corr_balance, demo_one_corr_acct_a, demo_one_corr_acct_b, demo_two_corr_acct_a, demo_two_corr_acct_b]

result_correlation_combined = emptylist.append(result_correlation)
#This is where I try to delete the rows I have already analyzed.
overalldata = overalldata[11**x::]

print result_correlation_combined
print overalldatasubtract

It seemed that my subtraction approach was working, but when I tried it on a larger data set I realized my approach was completely wrong.

Would you do this differently? I thought it would work, but I cannot find my mistake.

2 answers:

Answer 0: (score: 0)

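You use the same variable x for both loops. In the second loop x goes from 0 to 12 regardless of the customer, and since you only use x to set the row number, you stay stuck on the first customer.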
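The effect of reusing one name in both loops can be seen in a small sketch. The ranges and list names below are made up for illustration (three customers of 12 rows each), and the sketch uses Python 3 syntax.

```python
# Hypothetical demo: the inner loop overwrites the outer loop's variable.
seen_shadowed = []
for x in range(0, 36, 12):      # intended: customer offsets 0, 12, 24
    for x in range(0, 12):      # rebinding x discards the customer offset
        pass
    seen_shadowed.append(x)     # x is the inner loop's last value here

# With distinct names both values stay available after the inner loop.
seen_distinct = []
for x_customer in range(0, 36, 12):
    for x_month in range(0, 12):
        pass
    seen_distinct.append((x_customer, x_month))

print(seen_shadowed)   # [11, 11, 11]
print(seen_distinct)   # [(0, 11), (12, 11), (24, 11)]
```

The outer loop still runs three times because its iterator is unaffected, but any indexing done with x inside the body only ever sees the inner values.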

Your double loop should look like this:

# loop over the customers
for x_customer in range(0,len(overalldata),12):
    # loop over the months
    for x_month in range(0,12,1):
        # row number: x (x_customer already steps by 12, so it is the
        # offset of this customer's first row)
        x = x_customer + x_month
        ...

I changed the bounds and steps of the loops because:

  • Loop 1: there are 12 months, so 12 rows per customer -> step = 12
  • Loop 2: there are 12 months, so the month number ranges from 0 to 11 -> range(0,12,1)
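As a rough sketch of how those bounds could be used, the following slices one block of 12 rows per customer and computes correlations over that block only, so no rows need to be deleted. The function name customer_correlations, the column pairs, and the synthetic data are assumptions for illustration and do not come from the question's CSV. It also shows that len() of a 2-D array is its number of rows. The sketch uses Python 3 syntax.

```python
import numpy as np

MONTHS_PER_CUSTOMER = 12  # assumption taken from the question

def customer_correlations(data, pairs):
    """Return one row of correlations per block of 12 consecutive rows.

    data  : 2-D array, 12 consecutive rows (months) per customer
    pairs : list of (col_a, col_b) column-index pairs to correlate
    """
    results = []
    # len(data) is the number of rows of a 2-D array (same as data.shape[0])
    for start in range(0, len(data), MONTHS_PER_CUSTOMER):
        block = data[start:start + MONTHS_PER_CUSTOMER]
        row = [np.corrcoef(block[:, a], block[:, b])[1, 0] for a, b in pairs]
        results.append(row)
    return np.array(results)

# Synthetic data: 3 customers x 12 months, 4 made-up columns
rng = np.random.default_rng(0)
demo = rng.normal(size=(3 * MONTHS_PER_CUSTOMER, 4))
demo[:, 1] = 2.0 * demo[:, 0] + 1.0   # column 1 is a linear function of column 0

corr = customer_correlations(demo, pairs=[(0, 1), (2, 3)])
print(corr.shape)  # (3, 2)
```

If the number of rows is not a multiple of 12, the last block is shorter, so real data would need a check for incomplete customers.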

Answer 1: (score: 0)

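This is how I solved my problem: it was an issue with where my for loops were placed. A simple indentation problem. Thanks to the poster above for the help.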

for x_customer in range(0,len(overalldata),12):

    for x in range(0,13,1):
            cust_months = overalldata[0:x,1]
            cust_balancenormal = overalldata[0:x,16]
            cust_demo_one = overalldata[0:x,2]
            cust_demo_two = overalldata[0:x,3]
            num_acct_A = overalldata[0:x,4]
            num_acct_B = overalldata[0:x,5]
#Correlation Calculations
            demo_one_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_one)[1,0]
            demo_two_corr_balance = numpy.corrcoef(cust_balancenormal, cust_demo_two)[1,0]
            demo_one_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_one)[1,0]
            demo_one_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_one)[1,0]
            demo_two_corr_acct_a = numpy.corrcoef(num_acct_A, cust_demo_two)[1,0]
            demo_two_corr_acct_b = numpy.corrcoef(num_acct_B, cust_demo_two)[1,0]

            result_correlation = [(demo_one_corr_balance),(demo_two_corr_balance),(demo_one_corr_acct_a),(demo_one_corr_acct_b),(demo_two_corr_acct_a),(demo_two_corr_acct_b)]
            numpy.savetxt('correlationoutput.csv', (result_correlation))
    result_correlation_combined = emptylist.append([result_correlation])
    cust_delete_list = [0,(x_customer),1]
    overalldata = numpy.delete(overalldata, (cust_delete_list), axis=0)
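Two details of the code above may be worth a second look: list.append returns None, so a name assigned from it will not hold the list, and each call to numpy.savetxt with the same file name replaces the file. A minimal sketch of collecting rows and writing them once after the loop follows. The values and the temporary output directory are made up for illustration, and the sketch uses Python 3 syntax.

```python
import os
import tempfile
import numpy as np

rows = []
returned = rows.append([0.1, 0.2])   # append mutates rows and returns None
rows.append([0.3, 0.4])              # one list of correlations per customer

# Write all rows in a single call, after the loop has finished.
out_path = os.path.join(tempfile.mkdtemp(), "correlationoutput.csv")
np.savetxt(out_path, np.array(rows), delimiter=",")

reloaded = np.loadtxt(out_path, delimiter=",")
print(returned)        # None
print(reloaded.shape)  # (2, 2)
```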