How do I get a flat DataFrame from a list of dictionaries that contain lists?

Date: 2020-06-03 08:01:14

Tags: python pandas dictionary flatten

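I am trying to flatten the following data structure into a "normal" DataFrame.

The raw data is a list of dictionaries whose values are lists.

The data looks like this (reproducible example):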

data = [{'A':[1,2,3,4], 'B':[11,12,13,14]}, {'A':[5,6,7,8], 'B':[15,16,17,18]}]

My expected output is the following pandas DataFrame:

Out[01]: 
   A   B
0  1  11
1  2  12
2  3  13
3  4  14
4  5  15
5  6  16
6  7  17
7  8  18

How can I get this result? Thanks.

4 answers:

Answer 0 (score: 2)

You can use pd.Series.explode.
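The code that originally accompanied this suggestion is not available here, so the following is only a minimal sketch of the idea rather than the answerer's own code. It swaps in DataFrame.explode, the frame-level counterpart of pd.Series.explode, which accepts a list of columns as of pandas 1.3 (assumed to be installed). The name `nested` is just an illustrative variable.

```python
import pandas as pd

data = [{'A': [1, 2, 3, 4], 'B': [11, 12, 13, 14]},
        {'A': [5, 6, 7, 8], 'B': [15, 16, 17, 18]}]

# Each dictionary becomes one row whose cells hold whole lists.
nested = pd.DataFrame(data)

# Explode both columns in lockstep; the lists in a row must have equal length.
# ignore_index=True renumbers the resulting rows from 0.
df = nested.explode(['A', 'B'], ignore_index=True)
print(df)
```

On older pandas versions that lack the list-of-columns form, applying pd.Series.explode column by column is the usual fallback.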


Alternatively, use collections.defaultdict.
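As above, this is a reconstruction under assumptions, not the answerer's original code. The idea is to merge the per-dictionary lists into one flat list per key and only then build the DataFrame. The names `columns`, `record`, `key` and `values` are illustrative.

```python
from collections import defaultdict

import pandas as pd

data = [{'A': [1, 2, 3, 4], 'B': [11, 12, 13, 14]},
        {'A': [5, 6, 7, 8], 'B': [15, 16, 17, 18]}]

# One growing list per column name; missing keys start out as empty lists.
columns = defaultdict(list)
for record in data:
    for key, values in record.items():
        columns[key].extend(values)

# defaultdict is a dict subclass, so the DataFrame constructor accepts it directly.
df = pd.DataFrame(columns)
print(df)
```

This builds a single DataFrame at the end instead of reshaping an intermediate one.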


Answer 1 (score: 1)

You can simply build a DataFrame from each dictionary and concatenate them:

import pandas as pd

data = [{'A':[1,2,3,4], 'B':[11,12,13,14]}, {'A':[5,6,7,8], 'B':[15,16,17,18]}]

df = pd.concat((pd.DataFrame(elm) for elm in data), ignore_index=True)
print(df)


   A   B
0  1  11
1  2  12
2  3  13
3  4  14
4  5  15
5  6  16
6  7  17
7  8  18

Answer 2 (score: 1)

Try the following code:

import pandas as pd
data = [{'A':[1,2,3,4], 'B':[11,12,13,14]}, {'A':[5,6,7,8], 'B':[15,16,17,18]}]

df = pd.DataFrame(data).apply(pd.Series.explode).reset_index(drop=True)

print(df)
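One thing worth checking with this approach: exploded columns typically come back with object dtype rather than an integer dtype, which can matter for later arithmetic or memory use. A small sketch of inspecting and fixing that, assuming every value really is an integer as in the example:

```python
import pandas as pd

data = [{'A': [1, 2, 3, 4], 'B': [11, 12, 13, 14]},
        {'A': [5, 6, 7, 8], 'B': [15, 16, 17, 18]}]

df = pd.DataFrame(data).apply(pd.Series.explode).reset_index(drop=True)

# Inspect the dtypes before converting.
print(df.dtypes)

# infer_objects() asks pandas to pick a better dtype for object columns.
df = df.infer_objects()
print(df.dtypes)
```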

Answer 3 (score: 1)

For better performance, use collections.defaultdict together with list.extend:

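from collections import defaultdict

import pandas as pd

data = [{'A':[1,2,3,4], 'B':[11,12,13,14]}, {'A':[5,6,7,8], 'B':[15,16,17,18]}]

# Gather every key's values into one flat list per column.
d = defaultdict(list)
for x in data:
    for k, v in x.items():
        d[k].extend(v)

df = pd.DataFrame(d)
print(df)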
   A   B
0  1  11
1  2  12
2  3  13
3  4  14
4  5  15
5  6  16
6  7  17
7  8  18
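The performance claim is easy to check on your own machine. Below is a rough timing sketch, not a rigorous benchmark. The function names (`via_concat`, `via_explode`, `via_defaultdict`) and the synthetic input `big` are made up for illustration, and the numbers will depend on your pandas version and data shape.

```python
import timeit
from collections import defaultdict

import pandas as pd


def via_concat(data):
    # One small DataFrame per dictionary, then a single concat.
    return pd.concat((pd.DataFrame(d) for d in data), ignore_index=True)


def via_explode(data):
    # Lists stay inside cells first, then every column is exploded.
    return pd.DataFrame(data).apply(pd.Series.explode).reset_index(drop=True)


def via_defaultdict(data):
    # Plain-Python merge into one list per column, then a single DataFrame.
    cols = defaultdict(list)
    for d in data:
        for k, v in d.items():
            cols[k].extend(v)
    return pd.DataFrame(cols)


# Synthetic input: 500 dictionaries shaped like the question's example.
big = [{'A': [1, 2, 3, 4], 'B': [11, 12, 13, 14]} for _ in range(500)]

for fn in (via_concat, via_explode, via_defaultdict):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```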