Question

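I need to iterate over the rows of a column (in an existing DataFrame) whose cells each contain a list of dicts, and build two new DataFrames from that data. The general shape of one such list looks like this: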

[
 {"a": 10, "type": "square"}, {"type": "square", "b":11}, 
 {"type": "square", "c": 12}, {"d": 13, "type": "square"},
 {"type": "square", "e": 14}, {"a": 15, "type": "circle"}, 
 {"type": "circle", "b": 16}, {"type": "circle", "c": 17}, 
 {"d": 18, "type": "circle"}, {"type": "circle", "e": 19}
]

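I have thousands of rows like this and want to create two new DataFrames, one for circles and one for squares, so that the first row of a resulting DataFrame looks roughly like this: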

      type    a  b  c  d  e
0    square   10 11 12 13 14

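So far I have tried converting the whole thing to JSON, which works, but it seems to change the nature of the DataFrame so that it is no longer operable. The JSON route also creates a DataFrame with multiple rows (one per element), and I have not been able to "flatten" the DataFrame on a single key (in this case, "type").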

I have also tried DataFrame.from_records, DataFrame.from_dict, and various similar ways of reading the data with pandas, with no luck.

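Edit: Sorry for being unclear. The dict example above lives in a "cell" of an existing DataFrame, and I think the first step I am looking for is extracting it from that "cell". So far I have tried various ways of converting the object into something usable (like the list above), without success. I need to create a variable along the lines of my_list = df.column[0] so that I can then iterate over the rows.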

Answer 1

Let l be your list of dicts

l = [
 {"a": 10, "type": "square"}, {"type": "square", "b":11}, 
 {"type": "square", "c": 12}, {"d": 13, "type": "square"},
 {"type": "square", "e": 14}, {"a": 15, "type": "circle"}, 
 {"type": "circle", "b": 16}, {"type": "circle", "c": 17}, 
 {"d": 18, "type": "circle"}, {"type": "circle", "e": 19}
]

Then define a Series s that holds this list in each of 10 rows

import pandas as pd

s = pd.Series([l] * 10)
print(s)

0    [{'type': 'square', 'a': 10}, {'type': 'square...
1    [{'type': 'square', 'a': 10}, {'type': 'square...
2    [{'type': 'square', 'a': 10}, {'type': 'square...
3    [{'type': 'square', 'a': 10}, {'type': 'square...
4    [{'type': 'square', 'a': 10}, {'type': 'square...
5    [{'type': 'square', 'a': 10}, {'type': 'square...
6    [{'type': 'square', 'a': 10}, {'type': 'square...
7    [{'type': 'square', 'a': 10}, {'type': 'square...
8    [{'type': 'square', 'a': 10}, {'type': 'square...
9    [{'type': 'square', 'a': 10}, {'type': 'square...
dtype: object

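Now I define a function that uses a dict comprehension to rearrange the list into something better suited to a pd.Series. The keys of the dict are tuples, so the index of the resulting Series is a pd.MultiIndex. This makes it easier to split the result into two separate DataFrames later.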

def proc(l):
    return pd.Series(
        {(li['type'], k): v for li in l for k, v in li.items() if k != 'type'})
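
To make the MultiIndex point concrete, here is a minimal sketch that calls the same function on a single, shortened list. The names sample and row are illustrative only and do not come from the original answer.

```python
import pandas as pd

def proc(l):
    # Same function as in the answer: keys are (type, field) tuples.
    return pd.Series(
        {(li['type'], k): v for li in l for k, v in li.items() if k != 'type'})

# Shortened, hypothetical input in the same shape as the question's list.
sample = [
    {"a": 10, "type": "square"}, {"type": "square", "b": 11},
    {"a": 15, "type": "circle"}, {"type": "circle", "b": 16},
]

row = proc(sample)
print(row.index.nlevels)  # two index levels: type, then field
print(row["square"])      # selecting on the first level leaves only the fields
```

Because the outer level is the type, selecting row["square"] or row["circle"] already isolates one shape.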

Now I use apply

df = s.apply(proc)
df

  circle                 square                
       a   b   c   d   e      a   b   c   d   e
0     15  16  17  18  19     10  11  12  13  14
1     15  16  17  18  19     10  11  12  13  14
2     15  16  17  18  19     10  11  12  13  14
3     15  16  17  18  19     10  11  12  13  14
4     15  16  17  18  19     10  11  12  13  14
5     15  16  17  18  19     10  11  12  13  14
6     15  16  17  18  19     10  11  12  13  14
7     15  16  17  18  19     10  11  12  13  14
8     15  16  17  18  19     10  11  12  13  14
9     15  16  17  18  19     10  11  12  13  14

From here I can easily assign my 2 DataFrames

circle = df.circle
square = df.square
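
The question starts from a column of an existing DataFrame and wants a leading type column. The sketch below shows one way the same steps might be applied in that setting. The frame existing and its column names id and shapes are hypothetical, and the cells are assumed to already hold Python lists rather than strings.

```python
import pandas as pd

def proc(l):
    # Same function as in the answer: keys are (type, field) tuples.
    return pd.Series(
        {(li['type'], k): v for li in l for k, v in li.items() if k != 'type'})

cell = [
    {"a": 10, "type": "square"}, {"type": "square", "b": 11},
    {"type": "square", "c": 12}, {"d": 13, "type": "square"},
    {"type": "square", "e": 14}, {"a": 15, "type": "circle"},
    {"type": "circle", "b": 16}, {"type": "circle", "c": 17},
    {"d": 18, "type": "circle"}, {"type": "circle", "e": 19},
]

# Hypothetical existing frame: each cell of "shapes" holds one list of dicts.
existing = pd.DataFrame({"id": [1, 2], "shapes": [cell, cell]})

wide = existing["shapes"].apply(proc)  # columns form a (type, field) MultiIndex

square = wide["square"].copy()
square.insert(0, "type", "square")     # add the leading "type" column

circle = wide["circle"].copy()
circle.insert(0, "type", "circle")

print(square)
```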

Alternative approach
Instead of using apply, we can use a comprehension over s

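# Series.iteritems() was removed in pandas 2.0; items() is the replacement.
# The outer names are idx/lst so they do not shadow the inner k.
df = pd.DataFrame(
    {idx: {(li['type'], k): v
           for li in lst
           for k, v in li.items() if k != 'type'}
     for idx, lst in s.items()}
).T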

Timing
The comprehension approach appears to be faster
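
The answer gives no measurements. The following is a rough sketch of how the two approaches could be compared with timeit. The helper names with_apply and with_comprehension are made up for this example, and the timings will depend on the pandas version and the data size, so no expected numbers are shown.

```python
import timeit
import pandas as pd

l = [
    {"a": 10, "type": "square"}, {"type": "square", "b": 11},
    {"type": "square", "c": 12}, {"d": 13, "type": "square"},
    {"type": "square", "e": 14}, {"a": 15, "type": "circle"},
    {"type": "circle", "b": 16}, {"type": "circle", "c": 17},
    {"d": 18, "type": "circle"}, {"type": "circle", "e": 19},
]
s = pd.Series([l] * 10)

def proc(l):
    return pd.Series(
        {(li['type'], k): v for li in l for k, v in li.items() if k != 'type'})

def with_apply():
    # One pd.Series is built per row, then combined into a DataFrame.
    return s.apply(proc)

def with_comprehension():
    # Plain dicts are built first; pandas is invoked once at the end.
    return pd.DataFrame(
        {idx: {(li['type'], k): v
               for li in lst
               for k, v in li.items() if k != 'type'}
         for idx, lst in s.items()}
    ).T

t_apply = timeit.timeit(with_apply, number=20)
t_comp = timeit.timeit(with_comprehension, number=20)
print(f"apply: {t_apply:.4f}s  comprehension: {t_comp:.4f}s")
```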

Answer 2

This works for your example:

pd.DataFrame(myList).groupby('type').agg(lambda x: x.dropna())

         a   b   c   d   e
type                      
circle  15  16  17  18  19
square  10  11  12  13  14

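The idea is to read the list of dicts into a single DataFrame with one row per dict, group the rows by type, and then use the agg method to drop all missing values in each column.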
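
As a possible variant of the same idea (not the answer author's code), groupby(...).first() returns the first non-null value of each column within a group, which has the same effect here as dropping the missing values. The names long_df and by_type are illustrative.

```python
import pandas as pd

myList = [
    {"a": 10, "type": "square"}, {"type": "square", "b": 11},
    {"type": "square", "c": 12}, {"d": 13, "type": "square"},
    {"type": "square", "e": 14}, {"a": 15, "type": "circle"},
    {"type": "circle", "b": 16}, {"type": "circle", "c": 17},
    {"d": 18, "type": "circle"}, {"type": "circle", "e": 19},
]

# One row per dict; keys that a dict lacks become NaN.
long_df = pd.DataFrame(myList)

# first() skips NaN, so each type collapses to a single fully populated row.
by_type = long_df.groupby("type").first().astype(int)

square = by_type.loc[["square"]].reset_index()  # "type" becomes a regular column
circle = by_type.loc[["circle"]].reset_index()

print(by_type)
```

The astype(int) call only undoes the float upcast caused by the intermediate NaN values. It assumes every field is present for every type.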

Data

myList = [ {"a": 10, "type": "square"}, {"type": "square", "b":11}, {"type": "square", "c": 12}, {"d": 13, "type": "square"}, {"type": "square", "e": 14}, {"a": 15, "type": "circle"}, {"type": "circle", "b": 16}, {"type": "circle", "c": 17}, {"d": 18, "type": "circle"}, {"type": "circle", "e": 19} ]

The answer above runs into trouble if types are repeated, as in the following list:

myList2 = [ {"a": 10, "type": "square"}, {"type": "square", "b":11}, {"type": "square", "c": 12}, {"d": 13, "type": "square"}, {"type": "square", "e": 14}, {"a": 15, "type": "circle"}, {"type": "circle", "b": 16}, {"type": "circle", "c": 17}, {"d": 18, "type": "circle"}, {"type": "circle", "e": 19}, {"a": 11, "type": "square"}, {"type": "square", "b":12}, {"type": "square", "c": 13}, {"d": 14, "type": "square"}, {"type": "square", "e": 15}, {"a": 16, "type": "circle"}, {"type": "circle", "b": 17}, {"type": "circle", "c": 18}, {"d": 20, "type": "circle"}, {"type": "circle", "e": 20} ]

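As long as the list of dicts is regular, that is, each type appears in runs of 5 adjacent dicts, you can add a list comprehension to the groupby call as shown below.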

import math

# The block ids must be computed over myList2 so their length matches the frame.
pd.DataFrame(myList2).groupby(['type', [math.floor(i / 5) for i, _ in enumerate(myList2)]]).agg(lambda x: x.dropna())

           a   b   c   d   e
type                        
circle 1  15  16  17  18  19
       3  16  17  18  20  20
square 0  10  11  12  13  14
       2  11  12  13  14  15
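
The block-id trick can also be written with integer division. The sketch below is a variant under the same assumption as the answer, namely that every run consists of exactly 5 adjacent dicts of one type. It uses first() instead of the agg lambda, and the names long_df, block and result are illustrative.

```python
import numpy as np
import pandas as pd

myList2 = [
    {"a": 10, "type": "square"}, {"type": "square", "b": 11},
    {"type": "square", "c": 12}, {"d": 13, "type": "square"},
    {"type": "square", "e": 14}, {"a": 15, "type": "circle"},
    {"type": "circle", "b": 16}, {"type": "circle", "c": 17},
    {"d": 18, "type": "circle"}, {"type": "circle", "e": 19},
    {"a": 11, "type": "square"}, {"type": "square", "b": 12},
    {"type": "square", "c": 13}, {"d": 14, "type": "square"},
    {"type": "square", "e": 15}, {"a": 16, "type": "circle"},
    {"type": "circle", "b": 17}, {"type": "circle", "c": 18},
    {"d": 20, "type": "circle"}, {"type": "circle", "e": 20},
]

long_df = pd.DataFrame(myList2)

# Block id per row: 0 for dicts 0-4, 1 for dicts 5-9, and so on.
# Assumes runs of exactly 5 adjacent dicts, as stated in the answer.
block = np.arange(len(long_df)) // 5

# Group by type and block, keeping the first non-null value in each column.
result = long_df.groupby(["type", block]).first().astype(int)
print(result)
```

If the runs are not all the same length, this fixed divisor no longer applies and the block ids would have to be derived some other way.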

Creating a Pandas DataFrame from elements in a list of dicts