How to intersect DataFrames in pandas

Asked: 2018-06-04 12:08:18

Tags: python pandas dataframe data-analysis

I have a DataFrame like the following:

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th></th>      <th>Title</th>      <th>ASIN</th>      <th>State</th>      <th>SellerSKU</th>      <th>Quantity</th>      <th>FBAStock</th>      <th>QuantityToShip</th>    </tr>  </thead>  <tbody>    <tr>      <th>1</th>      <td>Daedal crafters- Pack of Two Gajra (Orange and...</td>      <td>B075T64ZWJ</td>      <td>WEST BENGAL</td>      <td>DC216</td>      <td>1</td>      <td>0</td>      <td>1</td>    </tr>    <tr>      <th>2</th>      <td>Daedal Dream Catchers - Intricate Web Design(B...</td>      <td>B06XBRRYVK</td>      <td>KARNATAKA</td>      <td>DDC63BB</td>      <td>1</td>      <td>24</td>      <td>0</td>    </tr>    <tr>      <th>3</th>      <td>Daedal Dream Catchers- Blue and White Four Rin...</td>      <td>B07428QBJ9</td>      <td>MAHARASHTRA</td>      <td>12-16RT-1H8B</td>      <td>1</td>      <td>4</td>      <td>0</td>    </tr>    <tr>      <th>4</th>      <td>Daedal dream catchers- Crescent wine DDC21</td>      <td>B01DI70P9W</td>      <td>UTTAR PRADESH</td>      <td>70-PK4Z-6VSP</td>      <td>1</td>      <td>10</td>      <td>0</td>    </tr>  </tbody></table>

The columns are:

Title   ASIN    State   SellerSKU   Quantity    FBAStock    QuantityToShip 

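I have another DataFrame that contains a subset of the rows of the DataFrame above. Only the Quantity column has changed in it, and it has these columns: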

ASIN State Quantity

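How can I intersect or merge this smaller DataFrame with the first one so that, for rows matching on the ASIN and State columns, the Quantity from the smaller DataFrame overwrites the Quantity in the original DataFrame?

If this can be done with a merge, how? I am not familiar with SQL join terms such as 'inner', 'left', etc.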

Purpose:

I am modifying the original DF like this:

new = originalDF.groupby(['State' ,'ASIN' , 'Quantity']).size().reset_index().rename(columns= {0 : 'Count'})

new.Quantity = new[['Quantity' , 'Count']].apply(lambda tup : tup[0]*tup[1] , axis = 1)
new.drop(['Count'] , axis =1 , inplace=True)
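
To make the effect of these lines concrete, here is a minimal sketch on made-up data (the sample rows are an assumption, not the asker's actual data). It counts identical (State, ASIN, Quantity) rows and multiplies Quantity by that count, using a vectorized multiplication in place of the row-wise apply.

```python
import pandas as pd

# Hypothetical sample data standing in for originalDF
originalDF = pd.DataFrame({
    'State': ['KARNATAKA', 'KARNATAKA', 'MAHARASHTRA'],
    'ASIN': ['B06XBRRYVK', 'B06XBRRYVK', 'B07428QBJ9'],
    'Quantity': [1, 1, 1],
})

# Count identical (State, ASIN, Quantity) rows and name the count column
new = (originalDF.groupby(['State', 'ASIN', 'Quantity'])
                 .size()
                 .reset_index(name='Count'))

# Total quantity per group = unit quantity * number of identical rows
new['Quantity'] = new['Quantity'] * new['Count']
new = new.drop(columns='Count')
print(new)
```

The result has one row per (State, ASIN, Quantity) group, so the other columns of originalDF (Title, SellerSKU, and so on) are no longer present, which is why a merge back is needed.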

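Now I want to bring the remaining columns of originalDF into the new DF by matching on the new DF's ASIN and State columns (the Quantity column of the new DF is the one I want in the final DataFrame).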

1 answer:

Answer 0: (score: 1)

I think you need transform with size for each group, and then multiply the Quantity column in place with *=:

import pandas as pd

originalDF = pd.DataFrame({'State':list('aaabbb'),
                           'ASIN':list('cfcccc'),
                           'Quantity':[100] * 6})

originalDF['Quantity'] *= (originalDF.groupby(['State' ,'ASIN' , 'Quantity'])['State']
                                     .transform('size'))
print (originalDF)
  State ASIN  Quantity
0     a    c       200
1     a    f       100
2     a    c       200
3     b    c       300
4     b    c       300
5     b    c       300

Detail:

print ((originalDF.groupby(['State' ,'ASIN' , 'Quantity'])['State']
                  .transform('size')))
0    2
1    1
2    2
3    3
4    3
5    3
Name: State, dtype: int64
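
For the merge-based route the question asks about, the following is a minimal sketch rather than the answerer's method. The frames `original` and `updates` and their values are hypothetical, and it assumes each (ASIN, State) pair appears at most once in `updates`; otherwise the left merge would duplicate rows. A 'left' merge keeps every row of the left frame and fills unmatched rows with NaN, whereas 'inner' would keep only the rows present in both frames.

```python
import pandas as pd

# Hypothetical data: `original` mimics the full frame, `updates` the smaller one
original = pd.DataFrame({
    'ASIN': ['B075T64ZWJ', 'B06XBRRYVK', 'B07428QBJ9'],
    'State': ['WEST BENGAL', 'KARNATAKA', 'MAHARASHTRA'],
    'SellerSKU': ['DC216', 'DDC63BB', '12-16RT-1H8B'],
    'Quantity': [1, 1, 1],
})
updates = pd.DataFrame({
    'ASIN': ['B06XBRRYVK', 'B07428QBJ9'],
    'State': ['KARNATAKA', 'MAHARASHTRA'],
    'Quantity': [5, 3],
})

# how='left' keeps every row of `original`; unmatched rows get NaN in Quantity_new
merged = original.merge(updates, on=['ASIN', 'State'], how='left',
                        suffixes=('', '_new'))

# Use the updated quantity where one exists, otherwise keep the old value
merged['Quantity'] = merged['Quantity_new'].fillna(merged['Quantity']).astype(int)
merged = merged.drop(columns='Quantity_new')
print(merged)
```

With how='inner' in place of how='left', only the two rows that also appear in `updates` would remain in the result.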
