Multiplying two different DataFrames with the same dimensions and repeating rows

Asked: 2020-03-19 15:44:44

Tags: python-3.x pandas dataframe

I am trying to multiply two DataFrames.

Df1

Name|Key |100|101|102|103|104
Abb   AB   2   6  10   5   1
Bcc   BC   1   3   7   4   2
Abb   AB   5   1  11   3   1
Bcc   BC   7   1   4   5   0

Df2

Key_1|100|101|102|103|104
AB     10  2   1   5   1
BC     1  10   2   2   4

Expected output

Name|Key |100|101|102|103|104
Abb   AB   20  12  10  25   1
Bcc   BC    1  30  14   8   8
Abb   AB   50   2  11  15   1
Bcc   BC    7  10   8  10   0

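I tried grouping Df1 and then multiplying it by Df2, but that did not work. Please help me solve this problem.

3 answers:

Answer 0 (score: 4)

You can rename Key_1 to Key in df2 (to match df1), then set the index on both frames and call mul with level=1: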

df1.set_index(['Name','Key']).mul(df2.rename(columns={'Key_1':'Key'})
                              .set_index('Key'),level=1).reset_index()

Or similarly:

df1.set_index(['Name','Key']).mul(df2.set_index('Key_1')
   .rename_axis('Key'),level=1).reset_index()

As @QuangHoang correctly pointed out, you can also do this without renaming:

df1.set_index(['Name','Key']).mul(df2.set_index('Key_1'),level=1).reset_index()

  Name Key  100  101  102  103  104
0  Abb  AB   20   12   10   25    1
1  Bcc  BC    1   30   14    8    8
2  Abb  AB   50    2   11   15    1
3  Bcc  BC    7   10    8   10    0
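
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frames are rebuilt by hand from the question, and it assumes the column labels are strings ('100', '101', ...), which the question does not state; the names `cols` and `result` are only for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the question (string column labels are an assumption).
cols = ['100', '101', '102', '103', '104']
df1 = pd.DataFrame(
    [['Abb', 'AB', 2, 6, 10, 5, 1],
     ['Bcc', 'BC', 1, 3, 7, 4, 2],
     ['Abb', 'AB', 5, 1, 11, 3, 1],
     ['Bcc', 'BC', 7, 1, 4, 5, 0]],
    columns=['Name', 'Key'] + cols)
df2 = pd.DataFrame(
    [['AB', 10, 2, 1, 5, 1],
     ['BC', 1, 10, 2, 2, 4]],
    columns=['Key_1'] + cols)

# Align df2's index with the second level (position 1) of df1's MultiIndex,
# multiply, then move Name and Key back into regular columns.
result = (df1.set_index(['Name', 'Key'])
             .mul(df2.set_index('Key_1'), level=1)
             .reset_index())
print(result)
```

Because level=1 refers to the level by position, the index name on the df2 side (Key_1) does not have to match.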

Answer 1 (score: 3)

IIUC, you can use reindex_like:

df1.set_index('Key',inplace=True)
df1=df1.mul(df2.set_index('Key_1').reindex_like(df1).values).fillna(df1)
Out[235]: 
    Name   100   101   102   103  104
Key                                  
AB   Abb  20.0  12.0  10.0  25.0  1.0
BC   Bcc   1.0  30.0  14.0   8.0  8.0
AB   Abb  50.0   2.0  11.0  15.0  1.0
BC   Bcc   7.0  10.0   8.0  10.0  0.0
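
A related positional variant, sketched here rather than taken from the answer above: look up the factor row for each Key and multiply only the numeric columns, so the string column Name never enters the arithmetic and the integers stay integers. The sample data, the string column labels, and the names `cols`, `factors`, and `out` are assumptions for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the question (string column labels are an assumption).
cols = ['100', '101', '102', '103', '104']
df1 = pd.DataFrame(
    [['Abb', 'AB', 2, 6, 10, 5, 1],
     ['Bcc', 'BC', 1, 3, 7, 4, 2],
     ['Abb', 'AB', 5, 1, 11, 3, 1],
     ['Bcc', 'BC', 7, 1, 4, 5, 0]],
    columns=['Name', 'Key'] + cols)
df2 = pd.DataFrame(
    [['AB', 10, 2, 1, 5, 1],
     ['BC', 1, 10, 2, 2, 4]],
    columns=['Key_1'] + cols)

# One factor row per df1 row, in df1's row order (Key_1 must be unique in df2).
factors = df2.set_index('Key_1').reindex(df1['Key'])[cols].to_numpy()

# Multiply position by position and write back only the numeric columns.
out = df1.copy()
out[cols] = df1[cols].to_numpy() * factors
print(out)
```

This relies on every Key in df1 being present in df2; a missing key would produce NaN factors for that row.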

Answer 2 (score: 3)

We can also use DataFrame.merge together with pd.Index.difference to select the columns.

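# Columns to multiply: everything except the identifier columns.
mul_cols = df1.columns.difference(['Name','Key'])
# Merge from df1 with how='left' so the factor rows follow df1's row order
# (assumes df1 has a default RangeIndex and string column labels).
factors = df1[['Key']].merge(df2, left_on = 'Key', right_on = 'Key_1', how = 'left')[mul_cols]
df1.assign(**df1[mul_cols].mul(factors))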
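
A merge-based approach only gives the right numbers if the merged rows line up with df1's rows. Below is a minimal runnable sketch of that idea, assuming hand-built sample data, string column labels, and a default RangeIndex on df1; the names `inner_order`, `merged`, and `result` are illustrative. It prints the key order produced by an inner merge that starts from df2 next to a left merge that starts from df1, then multiplies using the latter.

```python
import pandas as pd

# Sample frames rebuilt from the question (string column labels are an assumption).
cols = ['100', '101', '102', '103', '104']
df1 = pd.DataFrame(
    [['Abb', 'AB', 2, 6, 10, 5, 1],
     ['Bcc', 'BC', 1, 3, 7, 4, 2],
     ['Abb', 'AB', 5, 1, 11, 3, 1],
     ['Bcc', 'BC', 7, 1, 4, 5, 0]],
    columns=['Name', 'Key'] + cols)
df2 = pd.DataFrame(
    [['AB', 10, 2, 1, 5, 1],
     ['BC', 1, 10, 2, 2, 4]],
    columns=['Key_1'] + cols)

mul_cols = df1.columns.difference(['Name', 'Key'])

# An inner merge starting from df2 follows df2's key order, not df1's row order.
inner_order = df2.merge(df1[['Key']], left_on='Key_1', right_on='Key')['Key'].tolist()
print(inner_order)

# A left merge starting from df1 keeps one factor row per df1 row, in df1's order.
merged = df1[['Key']].merge(df2, left_on='Key', right_on='Key_1', how='left')
print(merged['Key'].tolist())

# Index-aligned multiplication, then overwrite the numeric columns in a copy of df1.
result = df1.assign(**df1[mul_cols].mul(merged[mul_cols]))
print(result)
```

The `**` unpacking passes each column label as a keyword argument, so it only works when the labels are strings.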