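I have been trying to extract data from this CSV file and organize it so that it is easier to read. The goal is to create two dictionaries: one holding the data for the regions listed in the CSV, the other holding the data for the countries. I'm having trouble iterating over the data. The CSV file lists all the regions first; the countries don't start until the "ID" column reaches 4, so I need help separating them. This is what I have so far, but I still need help organizing the rows by region and by country. The link to the CSV file is: https://docs.google.com/document/d/1v68_QQX7Tn96l-b0LMO9YZ4ZAn_KWDMUJboa6LEyPr8/edit?usp=sharing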
import csv

f = open('dph_SYB60_T03_Population Growth, Fertility and Mortality Indicators.csv')
reader = csv.DictReader(f)
data_by_region = {}
data_by_country = {}
answers = []
for line in reader:
    # Collects all the region names
    regions = line['Region/Country/Area']
    # Gets all the years
    years = line['Year']
    # print(regions)
    if regions not in data_by_region:
        data_by_region[regions] = {}
Answer 0 (score: 1)
Maybe this will help:
import csv

f = open('dph_SYB60_T03_Population Growth, Fertility and Mortality Indicators.csv', encoding='utf-8-sig')
reader = csv.DictReader(f)
data_by_region = {}
data_by_country = {}
answers = []
for line in reader:
    # Collects all the region names
    regions = line['Region/Country/Area']
    # Gets all the years
    years = line['Year']
    # print(regions)
    if regions not in data_by_region:
        data_by_region[regions] = [line]
    else:
        data_by_region[regions].append(line)

# Print the row count for each region.
for region, data_list in data_by_region.items():
    print('{:>30s}: {} rows.'.format(region, len(data_list)))
Output:
Total, all countries or areas: 21 rows.
Africa: 18 rows.
Northern Africa: 21 rows.
Sub-Saharan Africa: 21 rows.
Eastern Africa: 18 rows.
Middle Africa: 18 rows.
Southern Africa: 18 rows.
Western Africa: 18 rows.
Northern America: 18 rows.
...
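The explicit if/else above could also be written with collections.defaultdict. The following is only a sketch: the helper name group_rows and the small in-memory sample are hypothetical and stand in for the real file, and only the column names Region/Country/Area and Year come from the question.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory sample standing in for the real CSV file;
# only the column names are taken from the question.
SAMPLE = """Region/Country/Area,Year,Value
Africa,2010,1.0
Africa,2015,2.0
Northern Africa,2010,3.0
"""


def group_rows(file_obj, key="Region/Country/Area"):
    """Group csv.DictReader rows into {key value: [row, ...]}."""
    grouped = defaultdict(list)
    for row in csv.DictReader(file_obj):
        # A missing key gets an empty list automatically, so no if/else is needed.
        grouped[row[key]].append(row)
    return dict(grouped)


data_by_region = group_rows(io.StringIO(SAMPLE))
for region, rows in data_by_region.items():
    print('{:>30s}: {} rows.'.format(region, len(rows)))
```

With the real file, the io.StringIO object would be replaced by the file handle opened with encoding='utf-8-sig', as in the answer above.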
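Answer 1 (score: 0)
Python's standard library provides itertools.groupby, which helps you group data, but it only groups consecutive items, so the list must first be sorted by the group key. If you want to group by Region/Country/Area, you need to sort on that column first. The following snippet should help you group the data quickly.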
import csv
import itertools


def csv_iter(filepath):
    # Yield each row of the CSV as a dict; utf-8-sig strips a leading BOM.
    with open(filepath, mode="r", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        yield from reader


def main():
    filepath = "dph_SYB60_T03_Population Growth, Fertility and Mortality Indicators.csv"
    # groupby only merges adjacent rows, so sort by the group key first.
    data_list = sorted(csv_iter(filepath), key=lambda x: x.get("Region/Country/Area", ""))
    for g, v in itertools.groupby(data_list, key=lambda x: x.get("Region/Country/Area", "")):
        print("{}: {}".format(g, len(list(v))))


if __name__ == "__main__":
    main()
Output:
Afghanistan: 21
Africa: 18
Albania: 21
Algeria: 21
American Samoa: 9
Andorra: 6
Angola: 21
Anguilla: 9
Antigua and Barbuda: 18
Argentina: 21
......
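However, there is a problem with your understanding of the data: not all region IDs are less than 4. For example, Northern Africa has ID 15, so you cannot tell regions and countries apart by ID. You need to collect all the names and build a list of regions and a list of countries before you can decide whether a row belongs to a region or to a country.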
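That last step could be sketched as follows: keep a set of region names and route each row into one of two dictionaries depending on whether its name is in the set. The REGION_NAMES set here is a hypothetical, incomplete example that would have to be filled in by hand from the actual file, and split_rows and the sample rows are likewise made up for illustration.

```python
from collections import defaultdict

# Hypothetical, incomplete set of region names. The real list has to be
# built by inspecting the CSV, since the ID column cannot separate them.
REGION_NAMES = {"Total, all countries or areas", "Africa", "Northern Africa"}


def split_rows(rows, region_names, key="Region/Country/Area"):
    """Return (data_by_region, data_by_country), each as {name: [row, ...]}."""
    data_by_region = defaultdict(list)
    data_by_country = defaultdict(list)
    for row in rows:
        name = row[key]
        # Anything not listed as a region is treated as a country or area.
        target = data_by_region if name in region_names else data_by_country
        target[name].append(row)
    return dict(data_by_region), dict(data_by_country)


# Made-up rows shaped like csv.DictReader output.
rows = [
    {"Region/Country/Area": "Africa", "Year": "2010"},
    {"Region/Country/Area": "Northern Africa", "Year": "2010"},
    {"Region/Country/Area": "Algeria", "Year": "2010"},
    {"Region/Country/Area": "Algeria", "Year": "2015"},
]
data_by_region, data_by_country = split_rows(rows, REGION_NAMES)
print(sorted(data_by_region))
print(sorted(data_by_country))
```

Treating every unlisted name as a country keeps the region list short, but it also means a region missing from the set is silently filed as a country, so the set is worth checking against the file.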