Pandas: generate a dataframe with a range of values

Time: 2019-01-22 11:21:41

Tags: python pandas

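I have a dataframe:

speciality_id   speciality_name
1               Acupuncturist
2               Andrologist
3               Anaesthesiologist
4               Audiologist
5               Ayurvedic Doctor
6               Biochemist
7               Biophysicist

I want to replicate the above dataframe across a range of values for year and month.

For example:

year = [2018]
Month = [1,2]

I want to produce a dataframe in which every row above appears once for each year and month combination, with the year and month added as columns.

I can't think of an approach. What is the right way to do this?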

3 answers:

Answer 0 (score: 2)

Use product to build all combinations, create a DataFrame from them, and then left-join the original data with merge:

import pandas as pd
from itertools import product

year = [2018]
Month = [1, 2]

# Every (Year, Month, speciality_id) combination, one row each
df1 = pd.DataFrame(list(product(year, Month, df['speciality_id'])),
                   columns=['Year', 'Month', 'speciality_id'])
print(df1)
    Year  Month  speciality_id
0   2018      1              1
1   2018      1              2
2   2018      1              3
3   2018      1              4
4   2018      1              5
5   2018      1              6
6   2018      1              7
7   2018      2              1
8   2018      2              2
9   2018      2              3
10  2018      2              4
11  2018      2              5
12  2018      2              6
13  2018      2              7

# Left join brings speciality_name back in for every combination
df = df1.merge(df, on='speciality_id', how='left')
print(df)
    Year  Month  speciality_id    speciality_name
0   2018      1              1      Acupuncturist
1   2018      1              2        Andrologist
2   2018      1              3  Anaesthesiologist
3   2018      1              4        Audiologist
4   2018      1              5   Ayurvedic Doctor
5   2018      1              6         Biochemist
6   2018      1              7       Biophysicist
7   2018      2              1      Acupuncturist
8   2018      2              2        Andrologist
9   2018      2              3  Anaesthesiologist
10  2018      2              4        Audiologist
11  2018      2              5   Ayurvedic Doctor
12  2018      2              6         Biochemist
13  2018      2              7       Biophysicist
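
As a possible alternative to the product-and-merge steps above, a cross join can build the same combinations directly. This is only a sketch, assuming pandas 1.2 or later (where merge accepts how="cross"); the names periods and result are made up for illustration, and the sample data is retyped from the question.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "speciality_id": [1, 2, 3, 4, 5, 6, 7],
    "speciality_name": [
        "Acupuncturist", "Andrologist", "Anaesthesiologist", "Audiologist",
        "Ayurvedic Doctor", "Biochemist", "Biophysicist",
    ],
})

year = [2018]
Month = [1, 2]

# Small table holding every Year/Month pair (hypothetical helper name).
periods = pd.DataFrame({"Year": year}).merge(
    pd.DataFrame({"Month": Month}), how="cross"
)

# Cross join: each Year/Month pair is matched with every row of df.
result = periods.merge(df, how="cross")
print(result)
```

Putting periods on the left keeps the rows grouped by year and month, which matches the layout of the output shown above.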

Answer 1 (score: 0)

You can compute the Cartesian product with pd.MultiIndex.from_product and then attach it to a tiled copy of the dataframe:

import numpy as np
import pandas as pd

year = [2018]
month = [1, 2]

# Cartesian product of year and month, one entry per combination
cart_prod = pd.MultiIndex.from_product([year, month], names=['year', 'month'])

# Tile the dataframe once per combination, then repeat each combination
# once per original row so the two line up, and use it as the index
res = (df.loc[np.tile(df.index, len(year) * len(month))]
         .set_index(cart_prod.repeat(df.shape[0]))
         .reset_index())

print(res)

    year  month  speciality_id    speciality_name
0   2018      1              1      Acupuncturist
1   2018      1              2        Andrologist
2   2018      1              3  Anaesthesiologist
3   2018      1              4        Audiologist
4   2018      1              5   Ayurvedic Doctor
5   2018      1              6         Biochemist
6   2018      1              7       Biophysicist
7   2018      2              1      Acupuncturist
8   2018      2              2        Andrologist
9   2018      2              3  Anaesthesiologist
10  2018      2              4        Audiologist
11  2018      2              5   Ayurvedic Doctor
12  2018      2              6         Biochemist
13  2018      2              7       Biophysicist

Answer 2 (score: 0)

I hope this helps.

# Create the new columns (Month is filled in below)
df['Year'], df['Month'] = 2018, None

# Make one copy of the DataFrame per month
df1 = df.copy()
df2 = df.copy()

# Set the month in each copy
df1['Month'], df2['Month'] = 1, 2
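
The snippet above ends with two separate frames and does not show how they are combined. One way to finish the idea, as a sketch rather than the answerer's own code, is to build one copy per year and month pair and stack them with pd.concat. The names copies and result are illustrative, and the sample data is retyped from the question.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "speciality_id": [1, 2, 3, 4, 5, 6, 7],
    "speciality_name": [
        "Acupuncturist", "Andrologist", "Anaesthesiologist", "Audiologist",
        "Ayurvedic Doctor", "Biochemist", "Biophysicist",
    ],
})

year = [2018]
month = [1, 2]

# One copy of df per (year, month) pair; assign returns a new frame,
# so the original df is left unchanged.
copies = [df.assign(Year=y, Month=m) for y in year for m in month]

# Stack the copies and renumber the rows from 0.
result = pd.concat(copies, ignore_index=True)
print(result)
```

Unlike the hard-coded df1 and df2, the list comprehension scales to any number of years and months without adding variables.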