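Create a list from columns, keeping duplicates

Time: 2019-11-05 12:03:16

Tags: python pandas

I'm trying to get a list containing every item in each order. My data has one order per row, with the possible items as columns and the quantity of each item as the values.

I've worked out a way to handle unique items, but I'd really like duplicated items to be included multiple times. Here's an example: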

import pandas as pd 

# Example dataframe
data = {'Egg':[0, 2, 1], 'Toast':[2, 2, 1]} 
breakfast = pd.DataFrame(data) 

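# Cycle through columns and replace numbers with food words.
# Work on an object-dtype copy: recent pandas versions warn about (and later reject)
# writing strings into an int64 column, and this keeps the numeric breakfast intact.
foods = breakfast.astype(object)
value_cols = list(foods)

for food in value_cols:
    foods.loc[foods[food] != 0, food] = food

# Create a list of foods
list_of_foods = foods.values.tolist()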

# Remove empty values
list_of_foods = [[x for x in y if x != 0] for y in list_of_foods]

This gives a list of lists like this:

[['Toast'], ['Egg', 'Toast'], ['Egg', 'Toast']]

What I actually want, though, is a list of lists like this:

[['Toast', 'Toast'], ['Egg', 'Egg', 'Toast', 'Toast'], ['Egg', 'Toast']]

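I'm not sure how to achieve this. I wondered about duplicating the rows that contain duplicates, but I think that would also repeat the non-duplicated items in the same way. Does anyone have any ideas?

3 answers:

Answer 0 (score: 2)

The idea: loop over each row, zip its values with the column names, and repeat each column name by its count inside a flattened nested list comprehension: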

list_of_foods = [[c for a, b in zip(v, breakfast.columns) for c in [b] * a]
                  for v in breakfast.values]

print(list_of_foods)
[['Toast', 'Toast'], ['Egg', 'Egg', 'Toast', 'Toast'], ['Egg', 'Toast']]
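This comprehension expects breakfast to hold the integer counts (multiplying a list by a food-name string raises a TypeError), so run it on the numeric DataFrame rather than on a copy whose counts have been replaced by food names. Below is a minimal self-contained sketch; repeat_by_counts is a hypothetical helper name that simply wraps the comprehension above.

```python
import pandas as pd

# Same example data as the question: one row per order, one column per item
breakfast = pd.DataFrame({'Egg': [0, 2, 1], 'Toast': [2, 2, 1]})


def repeat_by_counts(df):
    # Hypothetical helper wrapping Answer 0's comprehension:
    # for each row, pair each count with its column name and repeat the name that many times
    return [
        [name for count, col in zip(row, df.columns) for name in [col] * int(count)]
        for row in df.to_numpy()
    ]


print(repeat_by_counts(breakfast))
```

A count of 0 yields an empty repetition, so items that were not ordered simply do not appear in that order's list.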

Answer 1 (score: 1)

It's certainly not pretty, but I think it works:

data = {'Egg': [0, 2, 1], 'Toast': [2, 2, 1]}  # keys are dishes, values are per-order quantities
out = []
for i in range(len(list(data.values())[0])):  # iterate over the orders (length of the quantity lists)
    out.append([])  # new list for each order
    for key in data.keys():  # iterate over the dishes
        out[i].extend([key for _ in range(data[key][i])])  # repeat the dish as many times as it was ordered

Wrap it in a function and you're done.
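A minimal sketch of that wrapping, assuming the input is a plain dict in the same shape as data above and that every quantity list has the same length (one entry per order). The name expand_orders is hypothetical; list multiplication replaces the inner comprehension but does the same repetition.

```python
def expand_orders(data):
    """Hypothetical wrapper around Answer 1's loop.

    `data` maps each dish to a list of per-order quantities; all lists are
    assumed to have the same length (one entry per order).
    """
    n_orders = len(next(iter(data.values()), []))  # 0 orders if data is empty
    out = []
    for i in range(n_orders):
        order = []
        for dish, counts in data.items():
            order.extend([dish] * counts[i])  # repeat the dish as often as it was ordered
        out.append(order)
    return out


print(expand_orders({'Egg': [0, 2, 1], 'Toast': [2, 2, 1]}))
```

Because Python dicts keep insertion order (3.7+), the dishes within each order come out in the same order as the keys.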

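Answer 2 (score: 1)

Use repeat (specifically Index.repeat: each row's index holds the column names, and the row's values serve as the repeat counts)

Code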

breakfast.apply(lambda x: list(x.index.repeat(x)), axis=1).tolist()

Output

[['Toast', 'Toast'], ['Egg', 'Egg', 'Toast', 'Toast'], ['Egg', 'Toast']]
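To see why the one-liner works, here is the lambda's core step applied by hand to the first row, followed by the answer's full expression. The variable names first_order, repeated and result are illustrative only.

```python
import pandas as pd

breakfast = pd.DataFrame({'Egg': [0, 2, 1], 'Toast': [2, 2, 1]})

# With axis=1, apply passes each row to the lambda as a Series indexed by column name
first_order = breakfast.iloc[0]

# Index.repeat repeats each label by the matching count; a count of 0 drops the label
repeated = list(first_order.index.repeat(first_order.to_numpy()))
print(repeated)

# The full answer, applied to every row
result = breakfast.apply(lambda x: list(x.index.repeat(x)), axis=1).tolist()
print(result)
```

Since the lambda returns lists of different lengths, apply collects them into a Series of lists, and tolist() turns that into the list of lists.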