我已经尝试了pandas.melt和.stack和.pivot的每种组合,但是都没有取得任何进展。
我有一个常规格式的excel表:
^(\.\.\/(?:\.\.\/)*)?(?!.*?\/\/)(?!(?:.*\/)?\.+(?:\/|$)).+$
我正在尝试处理此数据,因此我可以创建以下条形图:
代码:
1997 1998 1999 2000
Total, N (%) 3350 (34) 3387 (33) 4778 (33) 3588 (33)
Age category N, (%)
A 231 (24) 227 (24) 222 (23) 211 (22)
B 492 (24) 481 (24) 487 (24) 405 (24)
C 759 (28) 759 (27) 746 (26) 746 (26)
D 1901 (45) 1873 (44) 1233 (44) 1903 (44)
Sex, N (%)
F 1650 (33) 1493 (33) 1673 (33) 1628 (32)
M 1734 (35) 1794 (34) 1705 (34) 1760 (34)
Diet
Vegan 1553 (32) 1442 (31) 1453 (31) 1422 (31)
Carnivore 1857 (36) 1063 (36) 1225 (35) 1926 (34)
Favorite movie
horror 1036 (24) 1033 (24) 1458 (24) 1742 (24)
romance 732 (41) 743 (40) 735 (40) 799 (38)
comedy 514 (34) 498 (32) 518 (32) 496 (32)
silent 1110 (47) 1933 (47) 1967 (46) 1751 (46)
* Percents are in relation to 100% of children who filled out survey
我有2个问题 我希望透视表,以便“年龄”,“性别”,“饮食”和“喜欢的电影”行是多索引列,每个类别下都有类别,而年份则作为行(观察) 因此最终产品看起来像:
import pandas as pd
categories = ['Age category N, (%)', 'Sex, N (%)', 'Diet', 'Favorite movie']
subcategories = ['A','B','C','D','F','M',"Vegan","Carnivore",'horror','romance','comedy','silent']
df = pd.DataFrame(
{'1997': [33850 (34), NaN ,231 (24), 492 (24), 759 (28), 1901 (45), NaN , 1650 (33), 1734 (35),NaN ,
1553 (32), 1857 (36),NaN , 1036 (24), 732 (41), 514 (34), 1110 (47)],
'1998': [33687 (33),NaN ,227 (24), 481 (24), 759 (27), 1873 (44),NaN ,1493 (33), 1794 (34), NaN ,1442 (31), 1063 (36),NaN , 1033 (24), 743 (40), 498 (32), 1933 (47)],
'1999': [3778 (33), NaN ,222 (23), 487 (24), 746 (26), 1233 (44),NaN , 1673 (33), 1705 (34),NaN , 1453 (31), 1225 (35),NaN , 1458 (24), 735 (40), 518 (32), 1967 (46)],
'2000' : [3588 (33),NaN ,211 (22), 405 (24), 746 (26), 1903 (44), NaN , 1628 (32), 1760 (34),NaN , 1422 (31), 1926 (34),NaN , 1742 (24), 799 (38), 496 (32), 1751 (46)]},
index = pd.MultiIndex.from_tuples(
[('Age category N, (%)','A'),('Age category N, (%)','B'),('Age category N, (%)','C'),
('Age category N, (%)', 'D'), ('Sex, N (%)', 'F'), ('Sex, N (%)', 'M'),
('Diet', 'Vegan'), ('Diet', 'Carnivore'),
('Favorite Movie','horror'),('Favorite Movie','romance'),('Favorite Movie','comedy'),('Favorite Movie','silent')],
names = [categories, subcategories]))
问题的关键是我必须将'(%)'与计数分开,同时使其与数字“关联”(以告知正确的y轴)
任何指导都值得赞赏!