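I am trying to tidy this data structure into a "regular" DataFrame.
The raw data is a list of dictionaries whose values are lists.
The data looks like this (reproducible example):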
data = [{'A':[1,2,3,4], 'B':[11,12,13,14]}, {'A':[5,6,7,8], 'B':[15,16,17,18]}]
My expected output is the following pandas DataFrame:
Out[01]:
   A   B
0  1  11
1  2  12
2  3  13
3  4  14
4  5  15
5  6  16
6  7  17
7  8  18
How can I get this result? Thanks.
Answer 0 (score: 2)
You can use pd.Series.explode.
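As a rough illustration of what `pd.Series.explode` does, here is a minimal sketch on a single column, assuming the `data` list from the question. The variable names `s` and `exploded` are only for illustration. Each list element becomes its own row, and the original row label is repeated for every element that came from the same list.

```python
import pandas as pd

data = [{'A': [1, 2, 3, 4], 'B': [11, 12, 13, 14]},
        {'A': [5, 6, 7, 8], 'B': [15, 16, 17, 18]}]

# Build a Series whose two cells each hold a list (the 'A' lists).
s = pd.Series([d['A'] for d in data])

# explode() turns every list element into its own row.
# The index keeps the label of the row each element came from.
exploded = s.explode()
print(exploded)
```

Because the index labels are repeated after exploding, a `reset_index(drop=True)` is usually wanted afterwards to get a clean 0..n-1 index.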
Answer 1 (score: 1)
You can simply build a DataFrame from each dictionary and concatenate them:
import pandas as pd
data = [{'A':[1,2,3,4], 'B':[11,12,13,14]}, {'A':[5,6,7,8], 'B':[15,16,17,18]}]
df = pd.concat((pd.DataFrame(elm) for elm in data), ignore_index=True)
print(df)
   A   B
0  1  11
1  2  12
2  3  13
3  4  14
4  5  15
5  6  16
6  7  17
7  8  18
Answer 2 (score: 1)
Try the following code:
import pandas as pd
data = [{'A':[1,2,3,4], 'B':[11,12,13,14]}, {'A':[5,6,7,8], 'B':[15,16,17,18]}]
df = pd.DataFrame(data).apply(pd.Series.explode).reset_index(drop=True)
print(df)
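One thing worth checking with this approach: exploded columns typically come back with `object` dtype rather than a numeric one, so a cast may be needed before doing arithmetic. The sketch below assumes every value is an integer. The `astype('int64')` step is an addition and not part of the answer above.

```python
import pandas as pd

data = [{'A': [1, 2, 3, 4], 'B': [11, 12, 13, 14]},
        {'A': [5, 6, 7, 8], 'B': [15, 16, 17, 18]}]

df = pd.DataFrame(data).apply(pd.Series.explode).reset_index(drop=True)

# Inspect the dtypes right after exploding (typically object).
print(df.dtypes)

# Assumption: all values are integers, so a plain cast is safe.
df = df.astype('int64')
print(df.dtypes)
```

If the lists can hold mixed or missing values, `pd.to_numeric` applied per column may be a safer choice than a blanket cast.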
Answer 3 (score: 1)
For better performance, use collections.defaultdict together with extend:
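import pandas as pd
from collections import defaultdict

# Collect every list stored under the same key into one flat list
d = defaultdict(list)
for x in data:
    for k, v in x.items():
        d[k].extend(v)

# Each key becomes a column
df = pd.DataFrame(d)
print(df)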
   A   B
0  1  11
1  2  12
2  3  13
3  4  14
4  5  15
5  6  16
6  7  17
7  8  18
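To check the performance claim on your own data, a small sketch like the following may help. It wraps the concat approach and the defaultdict approach in functions, confirms that they produce the same frame, and times both. The function names, the enlarged input `big`, and the repeat count are hypothetical choices for illustration, and timings will vary by machine and pandas version.

```python
from collections import defaultdict
from timeit import timeit

import pandas as pd

data = [{'A': [1, 2, 3, 4], 'B': [11, 12, 13, 14]},
        {'A': [5, 6, 7, 8], 'B': [15, 16, 17, 18]}]


def with_concat(rows):
    # One small DataFrame per dictionary, then a single concat.
    return pd.concat((pd.DataFrame(r) for r in rows), ignore_index=True)


def with_defaultdict(rows):
    # Merge the lists in plain Python first, build one DataFrame at the end.
    merged = defaultdict(list)
    for row in rows:
        for key, values in row.items():
            merged[key].extend(values)
    return pd.DataFrame(merged)


# Both approaches should give identical frames on the question's data.
assert with_concat(data).equals(with_defaultdict(data))

# Hypothetical larger input: the same two dictionaries repeated.
big = data * 200
for fn in (with_concat, with_defaultdict):
    seconds = timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```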