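I have a list of dictionaries whose keys are "Country", "points", and "price". I have 117,000 rows. I need to group them by country and get the sum of points and the sum of prices for each country. There are 44 countries in my dataset.
I need the result as a list of dictionaries.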
country_list = [{"Country": "USA", "sum_points": 120, "sum_price": 200}, ...]
Any help would be greatly appreciated. I'm kind of stuck on this task...
Answer 0 (score: 1)
I'm not familiar with the pandas library, but with only 117,000 rows of data you can definitely brute-force this.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import random
import collections
####### generating random inputs ###############
# for this part of my code I randomly generated 117000 rows of mock data using a
# list of 44 random countries and with prices and points between 1-1000.
# I stored it in the variable "random_input"
countries = ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola", "Antigua and Barbuda", "Argentina", "Armenia", "Australia", "Austria", "Azerbaijan", "Bahamas", "Bahrain", "Bangladesh", "Barbados", "Belarus", "Belgium", "Belize", "Benin", "Bhutan", "Bolivia", "Bosnia and Herzegovina", "Botswana", "Brazil", "Brunei", "Bulgaria", "Burkina Faso", "Burundi", "Côte d'Ivoire", "Cabo Verde", "Cambodia", "Cameroon", "Canada", "Central African Republic", "Chad", "Chile", "China", "Colombia", "Comoros", "Congo", "Costa Rica", "Croatia", "Cuba", "Cyprus", ]
random_input = []
for i in range(117000):
    random_input.append({
        "Country": random.choice(countries),
        "points": random.randint(1, 1000),
        "price": random.randint(1, 1000)
    })
##################################################
# actual computing #
##################################################
# For this part, I created two counters and iterated through the input to
# accumulate the total points and total price for each country
sum_points = collections.Counter()
sum_prices = collections.Counter()
for row in random_input:
    sum_points[row["Country"]] += row["points"]
    sum_prices[row["Country"]] += row["price"]
# Finally format the output as a list of dictionaries
country_lst = []
for country in sum_points.keys():
    country_lst.append({
        "Country": country,
        "sum_points": sum_points[country],
        "sum_prices": sum_prices[country],
    })
print(country_lst)
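To check the same idea on a small, deterministic input, here is a minimal sketch that does the grouping in a single pass with one `defaultdict` instead of two counters. The function name `sum_by_country` and the three sample rows are made up for illustration; the key names follow the code above.

```python
from collections import defaultdict


def sum_by_country(rows):
    """Return one dict per country with the summed points and prices."""
    # Each country starts with both totals at zero (hypothetical helper).
    totals = defaultdict(lambda: {"sum_points": 0, "sum_prices": 0})
    for row in rows:
        entry = totals[row["Country"]]
        entry["sum_points"] += row["points"]
        entry["sum_prices"] += row["price"]
    # Flatten into the requested list-of-dictionaries shape.
    return [{"Country": country, **sums} for country, sums in totals.items()]


# Illustrative sample data, not taken from the real dataset.
sample = [
    {"Country": "Chile", "points": 10, "price": 5},
    {"Country": "Cuba", "points": 3, "price": 7},
    {"Country": "Chile", "points": 1, "price": 2},
]
result = sum_by_country(sample)
print(result)
```

Countries appear in the order they are first seen in the input, since dictionaries keep insertion order in Python 3.7 and later.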
Answer 1 (score: 0)
You can do it like this:
df.groupby(['Country']).sum()
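The one-liner assumes `df` is already a pandas DataFrame. A rough sketch of the full round trip, from a list of dictionaries to a DataFrame and back to a list of dictionaries, might look like this. The sample `rows` and the output column names `sum_points` / `sum_price` are assumptions chosen to match the question, and pandas must be installed.

```python
import pandas as pd

# Illustrative sample data, not taken from the real dataset.
rows = [
    {"Country": "Chile", "points": 10, "price": 5},
    {"Country": "Cuba", "points": 3, "price": 7},
    {"Country": "Chile", "points": 1, "price": 2},
]

# Build the DataFrame from the list of dictionaries.
df = pd.DataFrame(rows)

# Sum the numeric columns per country; as_index=False keeps "Country"
# as an ordinary column instead of turning it into the index.
totals = (
    df.groupby("Country", as_index=False)[["points", "price"]]
    .sum()
    .rename(columns={"points": "sum_points", "price": "sum_price"})
)

# Convert back to a list of dictionaries, one per country.
country_list = totals.to_dict("records")
print(country_list)
```

By default `groupby` sorts the group keys, so the countries come out in alphabetical order rather than in order of first appearance.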