Lists, data analysis in Python

Date: 2018-01-02 03:46:20

Tags: python python-2.7 list csv

I convert a csv to a list:

import csv
with open('DataAnalizada.csv', 'rb') as f:
    reader = csv.reader(f)
    a = list(reader)

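I need to analyze the information in this list by client group: client AAA on 27/12/2017, AAA on 28/12/2017, BBB on 27/12/2017, BBB on 28/12/2017, CCC on 27/12/2017, and CCC on 28/12/2017. Within each group, the Analisis column is what matters (stable, alert, or increase; these are the 3 values it can take). For example, if all the Analisis values for client AAA on 27/12/2017 are stable, I want the new csv file to contain: AAA,27/12/2017,The client's performance was stable. And likewise for every client and date!

I need some kind of conditional function that, for each set of rows with the same client and date, checks the Analisis column and, if all values are Estable, writes AAA,27/12/2017,Estable: The client's performance was stable; if not, it writes AAA,27/12/2017,No Analized

I'm very new to Python and honestly can't manage this. I don't know how to loop through the nested list and group it as described above. Apologies for the lack of code in the question.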

a = [['Cliente', 'Fecha', 'Variables', 'Dia Previo', 'Mayor/Menor', 'Dia a Analizar', 'Analisis'], 
['AAA', '27/12/2017', 'ECPM_medio', '0.41', 'Dentro del Margen', '0.35', 'Estable'], 
['AAA', '27/12/2017', 'Fill_rate', '2.25', 'Dentro del Margen', '2.7', 'Estable'], 
['AAA', '27/12/2017', 'Importe_a_pagar_a_medio', '62.4', 'Dentro del Margen', '61.21', 'Estable'], 
['AAA', '27/12/2017', 'Impresiones_exchange', '153927.0', 'Dentro del Margen', '173663.0', 'Estable'], 
['AAA', '27/12/2017', 'Subastas', '6827946.0', 'Dentro del Margen', '6431093.0', 'Estable'], 
['BBB', '27/12/2017', 'ECPM_medio', '1.06', 'Dentro del Margen', '1.06', 'Alerta'], 
['BBB', '27/12/2017', 'Fill_rate', '26.67', 'Dentro del Margen', '27.2', 'Alerta'], 
['BBB', '27/12/2017', 'Importe_a_pagar_a_medio', '11.34', 'Dentro del Margen', '12.77', 'Estable'], 
['BBB', '27/12/2017', 'Impresiones_exchange', '10648.0', 'Dentro del Margen', '12099.0', 'Estable'], 
['BBB', '27/12/2017', 'Subastas', '39930.0', 'Dentro del Margen', '44479.0', 'Estable'],
['AAA', '28/12/2017', 'ECPM_medio', '0.41', 'Dentro del Margen', '0.35', 'Estable'], 
['AAA', '28/12/2017', 'Fill_rate', '2.25', 'Dentro del Margen', '2.7', 'Estable'], 
['AAA', '28/12/2017', 'Importe_a_pagar_a_medio', '62.4', 'Dentro del Margen', '61.21', 'Estable'], 
['AAA', '28/12/2017', 'Impresiones_exchange', '153927.0', 'Dentro del Margen', '173663.0', 'Estable'], 
['AAA', '28/12/2017', 'Subastas', '6827946.0', 'Dentro del Margen', '6431093.0', 'Estable'], 
['BBB', '28/12/2017', 'ECPM_medio', '1.06', 'Dentro del Margen', '1.06', 'Estable'], 
['BBB', '28/12/2017', 'Fill_rate', '26.67', 'Dentro del Margen', '27.2', 'Estable'], 
['BBB', '28/12/2017', 'Importe_a_pagar_a_medio', '11.34', 'Dentro del Margen', '12.77', 'Estable'], 
['BBB', '28/12/2017', 'Impresiones_exchange', '10648.0', 'Dentro del Margen', '12099.0', 'Estable'], 
['BBB', '28/12/2017', 'Subastas', '39930.0', 'Dentro del Margen', '44479.0', 'Estable']]

Example of the new csv I need:

Cliente,Fecha,Analisis
AAA,27/12/2017,Stable: The client's performance was Stable
AAA,28/12/2017,Stable: The client's performance was Stable
BBB,27/12/2017,Stable: The client's performance was Stable
BBB,28/12/2017,Stable: The client's performance was Stable
CCC,27/12/2017,Stable: The client's performance was Stable
CCC,28/12/2017,Stable: The client's performance was Stable

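3 answers:

Answer 0 (score: 0)

I think this may guide you toward what you want, though I'm not sure it will help. Since you gave no condition for filtering the data, I tried the following to get your desired output. Note that this is only an attempt to point you toward pandas.

pandas is the best tool for this because you can manipulate the data however you like. Read up on reading csv files in pandas.

I did this to get your data into a pandas DataFrame: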
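As a minimal sketch of that step, pandas.read_csv can load the file straight into a DataFrame without building the intermediate list. The inline sample below stands in for DataAnalizada.csv and holds only a few rows from the question; it assumes the real file has the same header row.

```python
import io
import pandas as pd

# A small inline sample standing in for DataAnalizada.csv (assumption: the
# real file starts with the same header row as the list in the question).
sample = io.StringIO(
    "Cliente,Fecha,Variables,Dia Previo,Mayor/Menor,Dia a Analizar,Analisis\n"
    "AAA,27/12/2017,ECPM_medio,0.41,Dentro del Margen,0.35,Estable\n"
    "AAA,27/12/2017,Fill_rate,2.25,Dentro del Margen,2.7,Estable\n"
    "BBB,27/12/2017,ECPM_medio,1.06,Dentro del Margen,1.06,Alerta\n"
)

# With the real file this would be: df = pd.read_csv('DataAnalizada.csv')
df = pd.read_csv(sample)
print(df[['Cliente', 'Fecha', 'Analisis']])
```

Loading the file this way also converts the numeric columns to numbers, whereas csv.reader leaves every field as a string.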

import pandas as pd
headers = a.pop(0)
df = pd.DataFrame(a, columns = headers)
df

Output:

   Cliente  Fecha      Variables  Dia Previo    Mayor/Menor        Dia a Analizar   Analisis
0   AAA     27/12/2017  ECPM_medio  0.41        Dentro del Margen   0.35    Estable
1   AAA     27/12/2017  Fill_rate   2.25        Dentro del Margen   2.7     Estable
...

After this, I created a new column with the status (I still don't know the exact condition):

# Placeholder condition: rows whose Analisis is 'Estable' or 'Alerta' get the status text
mask = df['Analisis'].isin(['Estable', 'Alerta'])
df.loc[mask, 'Status'] = "Stable: The client's performance was Stable"

Now you can use the groupby feature in pandas to create the desired output.

df1 = df.groupby(['Cliente', 'Fecha', 'Status']).size()
df1

Output:

Cliente  Fecha       Status
AAA      27/12/2017  Stable: The client's performance was Stable    5
         28/12/2017  Stable: The client's performance was Stable    5
BBB      27/12/2017  Stable: The client's performance was Stable    5
         28/12/2017  Stable: The client's performance was Stable    5

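After a groupby you need to apply an aggregation function to get a result; I used .size()

Now you can write this result, df1, to a csv. You can also wrap these steps in a function. I hope this points you toward an effective way of doing the analysis you need.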
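One way to wrap these steps in a function is sketched below. It assumes the rule described in the question: a group is reported as stable only when every Analisis value in it is 'Estable', and as 'No Analized' otherwise. The function name summarize, the helper column is_stable, and the short sample list are hypothetical and used only for illustration.

```python
import pandas as pd

STABLE = "Stable: The client's performance was Stable"
NOT_ANALYZED = "No Analized"  # fallback text taken from the question

def summarize(rows):
    """Return one row per (Cliente, Fecha) with an overall Analisis label.

    `rows` is a nested list like `a` in the question, header row first.
    """
    df = pd.DataFrame(rows[1:], columns=rows[0])
    # Flag each row, then require every row in a group to be flagged.
    df['is_stable'] = df['Analisis'] == 'Estable'
    all_stable = df.groupby(['Cliente', 'Fecha'])['is_stable'].all()
    labels = all_stable.map({True: STABLE, False: NOT_ANALYZED})
    return labels.rename('Analisis').reset_index()

# A shortened sample in the same shape as the question's list.
rows = [
    ['Cliente', 'Fecha', 'Variables', 'Analisis'],
    ['AAA', '27/12/2017', 'ECPM_medio', 'Estable'],
    ['AAA', '27/12/2017', 'Fill_rate', 'Estable'],
    ['BBB', '27/12/2017', 'ECPM_medio', 'Alerta'],
    ['BBB', '27/12/2017', 'Subastas', 'Estable'],
]

summary = summarize(rows)
# to_csv() without a path returns the csv text; pass a file name to save it.
print(summary.to_csv(index=False))
```

Here .all() takes the place of .size() as the aggregation, because the question asks whether every row in a group is stable, not how many rows the group has.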

Answer 1 (score: 0)

import csv
from collections import namedtuple

# Python 2.7: the csv module expects files opened in binary mode ('rb' / 'wb')
with open('DataAnalizada.csv', 'rb') as f:
    reader = csv.reader(f)
    first_col = next(reader)
    # rename=True replaces invalid field names such as 'Dia Previo' with _3, _4, ...
    header = namedtuple('header', first_col, rename=True)
    data = {}
    for val in reader:
        get_ = header(*val)
        # collect the Analisis values of every (Cliente, Fecha) group
        data.setdefault((get_.Cliente, get_.Fecha), []).append(get_.Analisis)

with open('DataAnalizada_new.csv', 'wb') as out:
    writer = csv.writer(out)
    writer.writerow(['Cliente', 'Fecha', 'Analisis'])
    for key in sorted(data):
        # a group is stable only if every one of its rows is 'Estable'
        if all(v == 'Estable' for v in data[key]):
            writer.writerow(list(key) + ["Stable: The client's performance was Stable"])
        else:
            writer.writerow(list(key) + ['No Analized'])
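For comparison, the same grouping can be sketched without files or pandas, working directly on a nested list such as a from the question. As in the question, it assumes a group is stable only if every Analisis value is 'Estable'. The helper name summarize_rows and the shortened sample list are hypothetical.

```python
from collections import OrderedDict

STABLE = "Stable: The client's performance was Stable"
NOT_ANALYZED = "No Analized"  # fallback text taken from the question

def summarize_rows(rows):
    """Group rows by (Cliente, Fecha) and label each group."""
    header, body = rows[0], rows[1:]
    i_cliente = header.index('Cliente')
    i_fecha = header.index('Fecha')
    i_analisis = header.index('Analisis')

    # OrderedDict keeps groups in the order they first appear in the data.
    groups = OrderedDict()
    for row in body:
        key = (row[i_cliente], row[i_fecha])
        groups.setdefault(key, []).append(row[i_analisis])

    result = [['Cliente', 'Fecha', 'Analisis']]
    for (cliente, fecha), values in groups.items():
        stable = all(v == 'Estable' for v in values)
        result.append([cliente, fecha, STABLE if stable else NOT_ANALYZED])
    return result

# A shortened sample in the same shape as the question's list.
sample = [
    ['Cliente', 'Fecha', 'Variables', 'Analisis'],
    ['AAA', '27/12/2017', 'ECPM_medio', 'Estable'],
    ['BBB', '27/12/2017', 'ECPM_medio', 'Alerta'],
    ['AAA', '27/12/2017', 'Fill_rate', 'Estable'],
    ['BBB', '27/12/2017', 'Fill_rate', 'Estable'],
]

for line in summarize_rows(sample):
    print(','.join(line))
```

The returned list can be passed to csv.writer(...).writerows() to produce the new file.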

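Answer 2 (score: -1)

The pandas package has the tools you need. However, I suggest starting with SciPy/Anaconda, because I found installing pandas on its own very difficult.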