Create a dataframe in Python containing all combinations of values from three lists

Date: 2017-09-28 15:41:20

Tags: python pandas dataframe

So I have two lists, gender = ['Male', 'Female'] and subject = ['Math3_Exam_Mark', 'Math6_Exam_Mark', 'Math9_Exam_Mark', 'ELA3_Exam_Mark', 'ELA6_Exam_Mark', 'ELA9_Exam_Mark'], plus an ndarray birthMonthYear holding a list of dates extracted from a CSV file.

I want to create a new dataframe with three columns: gender, subject, birthMonthYear. There should be one row for every combination of gender, subject and birthMonthYear.

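Is there a simple way to do this, maybe with pandas? I figure I could write nested for loops over each list to build the dataframe, but if there is something simpler I'd like to try it.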

Thanks for your help!
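For reference, the nested-loop approach the question mentions might look like the sketch below. The two sample dates are hypothetical stand-ins for the dates read from the CSV file; the loop order (gender, then subject, then date) is chosen so the rows come out in the same order as the answers that follow.

```python
import pandas as pd

gender = ['Male', 'Female']
subject = ['Math3_Exam_Mark', 'Math6_Exam_Mark', 'Math9_Exam_Mark',
           'ELA3_Exam_Mark', 'ELA6_Exam_Mark', 'ELA9_Exam_Mark']
# Hypothetical stand-in for the dates extracted from the CSV file
birthMonthYear = pd.to_datetime(['2010-01-31', '2010-02-28'])

# Nested loops: collect one (gender, subject, date) tuple per combination
rows = []
for g in gender:
    for s in subject:
        for d in birthMonthYear:
            rows.append((g, s, d))

df = pd.DataFrame(rows, columns=['Gender', 'Subject', 'BirthMonthYear'])
print(len(df))  # 2 * 6 * 2 = 24 rows
```

This works, but building the tuples by hand is exactly what `itertools.product` does for you.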

1 answer:

Answer 0 (score: 0)

Setup

import pandas as pd

gender = ['Male', 'Female']
subject = ['Math3_Exam_Mark', 'Math6_Exam_Mark', 'Math9_Exam_Mark',
           'ELA3_Exam_Mark', 'ELA6_Exam_Mark', 'ELA9_Exam_Mark']
# Two month-end sample dates; pandas >= 2.2 deprecates freq='M' in favor of 'ME'
birthMonthYear = pd.date_range('2010-01-31', periods=2, freq='M')

Option 1
itertools.product

from itertools import product

pd.DataFrame(
    list(product(gender, subject, birthMonthYear)),
    columns=['Gender', 'Subject', 'BirthMonthYear']
)

    Gender          Subject BirthMonthYear
0     Male  Math3_Exam_Mark     2010-01-31
1     Male  Math3_Exam_Mark     2010-02-28
2     Male  Math6_Exam_Mark     2010-01-31
3     Male  Math6_Exam_Mark     2010-02-28
4     Male  Math9_Exam_Mark     2010-01-31
5     Male  Math9_Exam_Mark     2010-02-28
6     Male   ELA3_Exam_Mark     2010-01-31
7     Male   ELA3_Exam_Mark     2010-02-28
8     Male   ELA6_Exam_Mark     2010-01-31
9     Male   ELA6_Exam_Mark     2010-02-28
10    Male   ELA9_Exam_Mark     2010-01-31
11    Male   ELA9_Exam_Mark     2010-02-28
12  Female  Math3_Exam_Mark     2010-01-31
13  Female  Math3_Exam_Mark     2010-02-28
14  Female  Math6_Exam_Mark     2010-01-31
15  Female  Math6_Exam_Mark     2010-02-28
16  Female  Math9_Exam_Mark     2010-01-31
17  Female  Math9_Exam_Mark     2010-02-28
18  Female   ELA3_Exam_Mark     2010-01-31
19  Female   ELA3_Exam_Mark     2010-02-28
20  Female   ELA6_Exam_Mark     2010-01-31
21  Female   ELA6_Exam_Mark     2010-02-28
22  Female   ELA9_Exam_Mark     2010-01-31
23  Female   ELA9_Exam_Mark     2010-02-28
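In the question the dates arrive as an ndarray rather than a `DatetimeIndex`. Since `itertools.product` accepts any iterable, Option 1 should work on such an array without conversion. A minimal sketch, assuming the dates are a NumPy `datetime64` array (the two values below are made-up sample data, not taken from the question's CSV):

```python
from itertools import product

import numpy as np
import pandas as pd

gender = ['Male', 'Female']
subject = ['Math3_Exam_Mark', 'Math6_Exam_Mark', 'Math9_Exam_Mark',
           'ELA3_Exam_Mark', 'ELA6_Exam_Mark', 'ELA9_Exam_Mark']
# Assumed form of the question's data: a NumPy datetime64 array
birthMonthYear = np.array(['2010-01-31', '2010-02-28'], dtype='datetime64[ns]')

# product() iterates the ndarray like any other sequence
df = pd.DataFrame(
    list(product(gender, subject, birthMonthYear)),
    columns=['Gender', 'Subject', 'BirthMonthYear'],
)

# One row per combination; the last iterable varies fastest
assert len(df) == len(gender) * len(subject) * len(birthMonthYear)
print(df.head())
```

The row count is always the product of the input lengths, so it is worth checking before feeding in a long date array.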

Option 2
pd.MultiIndex.from_product

idx = pd.MultiIndex.from_product(
    [gender, subject, birthMonthYear],
    names=['Gender', 'Subject', 'BirthMonthYear']
)

pd.DataFrame(index=idx).reset_index()

    Gender          Subject BirthMonthYear
0     Male  Math3_Exam_Mark     2010-01-31
1     Male  Math3_Exam_Mark     2010-02-28
2     Male  Math6_Exam_Mark     2010-01-31
3     Male  Math6_Exam_Mark     2010-02-28
4     Male  Math9_Exam_Mark     2010-01-31
5     Male  Math9_Exam_Mark     2010-02-28
6     Male   ELA3_Exam_Mark     2010-01-31
7     Male   ELA3_Exam_Mark     2010-02-28
8     Male   ELA6_Exam_Mark     2010-01-31
9     Male   ELA6_Exam_Mark     2010-02-28
10    Male   ELA9_Exam_Mark     2010-01-31
11    Male   ELA9_Exam_Mark     2010-02-28
12  Female  Math3_Exam_Mark     2010-01-31
13  Female  Math3_Exam_Mark     2010-02-28
14  Female  Math6_Exam_Mark     2010-01-31
15  Female  Math6_Exam_Mark     2010-02-28
16  Female  Math9_Exam_Mark     2010-01-31
17  Female  Math9_Exam_Mark     2010-02-28
18  Female   ELA3_Exam_Mark     2010-01-31
19  Female   ELA3_Exam_Mark     2010-02-28
20  Female   ELA6_Exam_Mark     2010-01-31
21  Female   ELA6_Exam_Mark     2010-02-28
22  Female   ELA9_Exam_Mark     2010-01-31
23  Female   ELA9_Exam_Mark     2010-02-28
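Two variations not used in the answer above may also help. `MultiIndex.to_frame(index=False)` turns the index levels into columns directly, skipping the empty-DataFrame-plus-`reset_index` step, and `DataFrame.merge(how='cross')` (which assumes pandas >= 1.2) builds the same Cartesian product by chaining cross joins. A sketch under those assumptions, reusing the answer's sample data:

```python
import pandas as pd

gender = ['Male', 'Female']
subject = ['Math3_Exam_Mark', 'Math6_Exam_Mark', 'Math9_Exam_Mark',
           'ELA3_Exam_Mark', 'ELA6_Exam_Mark', 'ELA9_Exam_Mark']
birthMonthYear = pd.to_datetime(['2010-01-31', '2010-02-28'])

# Variant A: convert the MultiIndex levels straight into columns
idx = pd.MultiIndex.from_product(
    [gender, subject, birthMonthYear],
    names=['Gender', 'Subject', 'BirthMonthYear'],
)
df_a = idx.to_frame(index=False)

# Variant B: chain cross joins, one single-column frame per input list
df_b = (
    pd.DataFrame({'Gender': gender})
    .merge(pd.DataFrame({'Subject': subject}), how='cross')
    .merge(pd.DataFrame({'BirthMonthYear': birthMonthYear}), how='cross')
)

print(df_a.shape, df_b.shape)  # (24, 3) (24, 3)
```

The cross-join form is handy when the inputs are already DataFrames with several columns each, since every column is carried along.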