How to merge data in Python

Time: 2016-01-15 08:32:36

Tags: python merge

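I have only been learning Python for a short while, and I am trying to present my data in a cleaner form than before. Right now I have some records stored as tuples (listed below).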


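I want to know how many of each item each person bought.

Assume that different names are different people.

So what should I do to work that out from records like the following?

('John', '5', 'Coke')
('Mary', '1', 'Pie')
('Jack', '3', 'Milk')
('Mary', '2', 'Water')
('John', '3', 'Coke')

I have no idea what to do next. I can't come up with even a clumsy way of doing it.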

3 answers:

Answer 0: (score: 8)

I suggest using the name and the drink together as the key of a collections.Counter:

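from collections import Counter

# `tuples` is the list of (name, amount, drink) records from the question
count = Counter()
for name, amount, drink in tuples:
    key = name, drink
    count.update({key: int(amount)})  # add the amount to the running total for this key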

# represent the aggregated data
for (name, drink), amount in count.items():
    print('{}: {} {}'.format(name, amount, drink))
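
For anyone who wants to run this directly, here is a self-contained sketch of the same idea. The `records` list just copies the sample tuples from the question, and the names `records` and `totals` are my own choices, not anything from the original post.

```python
from collections import Counter

# Sample records copied from the question: (name, amount, item).
records = [
    ('John', '5', 'Coke'),
    ('Mary', '1', 'Pie'),
    ('Jack', '3', 'Milk'),
    ('Mary', '2', 'Water'),
    ('John', '3', 'Coke'),
]

totals = Counter()
for name, amount, item in records:
    # Amounts are strings in the input, so convert them before adding.
    totals.update({(name, item): int(amount)})

# One line per (name, item) pair with its summed amount.
for (name, item), amount in totals.items():
    print('{}: {} {}'.format(name, amount, item))
```

The two John/Coke records collapse into a single entry, which is the merge the question asks for.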

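Update: I did some simple measurements and found that

count[name, drink] += int(amount)

is not only more readable but also considerably faster than calling update, which should not come as a surprise. In addition, defaultdict(int) was faster still (roughly twice as fast), presumably because Counter handles a missing key in a Python-level __missing__ method, whereas defaultdict does that in C. Counter does not sort anything when it is updated.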
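
If you want to repeat that kind of measurement yourself, a rough sketch with the standard timeit module could look like the following. The three function names and the repeat count are my own assumptions, and the numbers will vary with the machine and the Python version, so treat any ratio as indicative only.

```python
import timeit
from collections import Counter, defaultdict

# Sample records copied from the question: (name, amount, item).
records = [
    ('John', '5', 'Coke'),
    ('Mary', '1', 'Pie'),
    ('Jack', '3', 'Milk'),
    ('Mary', '2', 'Water'),
    ('John', '3', 'Coke'),
]

def with_update():
    # Counter, adding through update() with a one-entry dict.
    count = Counter()
    for name, amount, item in records:
        count.update({(name, item): int(amount)})
    return count

def with_counter_add():
    # Counter, adding through in-place += on the key.
    count = Counter()
    for name, amount, item in records:
        count[name, item] += int(amount)
    return count

def with_defaultdict():
    # defaultdict(int), adding through in-place += on the key.
    count = defaultdict(int)
    for name, amount, item in records:
        count[name, item] += int(amount)
    return count

# Time each variant over the same number of runs.
for func in (with_update, with_counter_add, with_defaultdict):
    seconds = timeit.timeit(func, number=10000)
    print('{:<18} {:.4f} s'.format(func.__name__, seconds))
```

All three variants build the same totals, so the choice between them only affects readability and speed.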

Answer 1: (score: 2)

Rearranging your data might help:

John: 8 Coke
Mary: 1 Pie
Mary: 2 Water
Jack: 3 Milk

might be more insightful when written as

(John, Coke) : 8
(Mary, Pie)  : 1
(Mary, Water): 2
(Jack, Milk) : 3

If you know SQL, this is more or less the equivalent of GROUP BY name, dish combined with SUM(count).
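
To make the SQL comparison concrete, here is a small sketch that runs that query against an in-memory SQLite database from Python. The table name `orders` and the column names are assumptions made for this illustration; the amount column is called `amount` rather than `count` so it cannot be confused with the COUNT function.

```python
import sqlite3

# Sample records copied from the question: (name, amount, dish).
records = [
    ('John', '5', 'Coke'),
    ('Mary', '1', 'Pie'),
    ('Jack', '3', 'Milk'),
    ('Mary', '2', 'Water'),
    ('John', '3', 'Coke'),
]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE orders (name TEXT, amount INTEGER, dish TEXT)')
conn.executemany(
    'INSERT INTO orders (name, amount, dish) VALUES (?, ?, ?)',
    # Convert the string amounts to integers before inserting.
    [(name, int(amount), dish) for name, amount, dish in records],
)

# Group by the (name, dish) pair and sum the amounts within each group.
rows = conn.execute(
    'SELECT name, dish, SUM(amount) FROM orders '
    'GROUP BY name, dish ORDER BY name, dish'
).fetchall()
conn.close()

for name, dish, total in rows:
    print('({}, {}): {}'.format(name, dish, total))
```

The dictionary-based Python versions do the same grouping by hand: the (name, dish) tuple plays the role of the GROUP BY columns, and the running total plays the role of SUM.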

So, in Python, you can create a dictionary keyed on that pair:

data = [
  ('John', '5', 'Coke'),
  ('Mary', '1', 'Pie'),
  ('Jack', '3', 'Milk'),
  ('Mary', '2', 'Water'), 
  ('John', '3', 'Coke'),
]

orders = {}
for name, count, dish in data:
    if (name, dish) in orders:
        orders[(name, dish)] += int(count)
    else:
        # first entry
        orders[(name, dish)] = int(count)

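More Pythonic, using collections.defaultdict:

from collections import defaultdict

orders = defaultdict(int)
for name, count, dish in data:
    orders[(name, dish)] += int(count)

As @bereal points out, collections.Counter works as well.

Then format the data however you need.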
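
As one possible way to do that last formatting step, the sketch below turns the aggregated dictionary back into tuples in the question's (name, amount, dish) layout, with the amounts as strings again. The name `merged` is my own, and keeping the amounts as strings is an assumption about what the asker wants.

```python
from collections import defaultdict

# Sample records copied from the question: (name, count, dish).
data = [
    ('John', '5', 'Coke'),
    ('Mary', '1', 'Pie'),
    ('Jack', '3', 'Milk'),
    ('Mary', '2', 'Water'),
    ('John', '3', 'Coke'),
]

orders = defaultdict(int)
for name, count, dish in data:
    orders[(name, dish)] += int(count)

# Rebuild tuples in the input layout, one per (name, dish) pair.
merged = [(name, str(total), dish) for (name, dish), total in orders.items()]

for name, total, dish in merged:
    print('{}: {} {}'.format(name, total, dish))
```

If you would rather keep the totals as integers for further arithmetic, drop the str() call.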

Answer 2: (score: 1)

Suppose you have a list of tuples:

tuples = [('John', '5', 'Coke'),
          ('Mary', '1', 'Pie'),
          ('Jack', '3', 'Milk'),
          ('Mary', '2', 'Water'),
          ('John', '3', 'Coke')]

memory = {}

# First, we calculate the amount for each (name, item) pair
for record in tuples:

    # The key is built from the two names, for example (John, Coke), (Mary, Pie), (Jack, Milk), ...
    key = (record[0], record[2])

    number = int(record[1])
    if key in memory:
        memory[key] += number
    else:
        memory[key] = number

# Afterwards, we format the information as (name, amount, item) tuples.
# The names `record` and `result` avoid shadowing the built-ins `tuple` and `list`.
result = []
for key in memory:
    result.append((key[0], memory[key], key[1]))