My DataFrame is "bus_rev". I want to subset it so that I have an equal number of records where good_reviews == True and good_reviews == False. Can anyone suggest a slick way to do this?
Sample Data:
print(bus_rev[1:3])
user_id business_id stars_x \
1 CxDOIDnH8gp9KXzpBHJYXw XSiqtcVEsP6dLOL7ZA9OxA 4
2 CxDOIDnH8gp9KXzpBHJYXw v95ot_TNwTk1iJ5n56dR0g 3
address attributes \
1 522 Yonge Street {u'BusinessParking': {u'garage': False, u'stre...
2 1661 Denison Street {u'BusinessParking': {u'garage': False, u'stre...
categories city \
1 [Restaurants, Ramen, Japanese] Toronto
2 [Chinese, Seafood, Restaurants] Markham
hours is_open latitude \
1 {u'Monday': u'11:00-22:00', u'Tuesday': u'11:0... 1 43.663689
2 {} 0 43.834295
longitude name neighborhood postal_code \
1 -79.384200 Kenzo Ramen Downtown Core M4Y 1X9
2 -79.305282 Vince Seafood Restaurant & BBQ Milliken L3R 6E4
review_count stars_y state good_reviews
1 76 3.5 ON True
2 23 3.5 ON False
Code:
bus_rev['good_reviews'].value_counts()
Output:
False 482
True 168
Name: good_reviews, dtype: int64
Answer 0 (score: 1)
To create a DataFrame with an equal number of each value, you can use:
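import pandas as pd
bus_revs_false = bus_rev[bus_rev['good_reviews'] == False]
bus_revs_false = bus_revs_false.iloc[:168, :]  # keep the first 168 rows to match the True count
bus_revs_true = bus_rev[bus_rev['good_reviews'] == True]
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
bus_revs_new = pd.concat([bus_revs_true, bus_revs_false])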
In this case, bus_revs_new will be your new DataFrame, with the same number of Trues and Falses.
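The cutoff of 168 is taken by hand from the value_counts() output, and iloc always keeps the first rows of the larger group. A minimal sketch that derives the count from the data and draws rows at random instead; the toy frame standing in for bus_rev, the name balanced, and random_state=0 are illustrative assumptions, not part of the question:

```python
import pandas as pd

# Toy stand-in for bus_rev (hypothetical data; only good_reviews matters here)
bus_rev = pd.DataFrame({
    "stars_x": [4, 3, 5, 1, 2, 4, 2, 1],
    "good_reviews": [True, False, True, False, False, True, False, False],
})

is_good = bus_rev["good_reviews"]
# Size of the smaller class, so no count is hardcoded
n = int(min(is_good.sum(), (~is_good).sum()))

# Randomly draw n rows from each class, then stack the two samples
balanced = pd.concat([
    bus_rev[is_good].sample(n=n, random_state=0),
    bus_rev[~is_good].sample(n=n, random_state=0),
])
print(balanced["good_reviews"].value_counts())
```

Dropping random_state gives a different sample on each run; keeping it makes the subset reproducible.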
Answer 1 (score: 1)
To get the same number of Trues and Falses, you can do this:
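One possible way to do it, sketched under assumptions and not necessarily what the answerer had in mind, is to group by good_reviews and sample the same number of rows from each group. The toy frame standing in for bus_rev, the name balanced, and random_state=0 are hypothetical; DataFrameGroupBy.sample needs pandas 1.1 or later:

```python
import pandas as pd

# Toy stand-in for bus_rev (hypothetical data; only good_reviews matters here)
bus_rev = pd.DataFrame({
    "stars_x": [4, 3, 5, 1, 2, 4, 2, 1],
    "good_reviews": [True, False, True, False, False, True, False, False],
})

# Number of rows in the smaller group
n = int(bus_rev["good_reviews"].value_counts().min())

# Draw n random rows from each good_reviews group
balanced = bus_rev.groupby("good_reviews").sample(n=n, random_state=0)
print(balanced["good_reviews"].value_counts())
```

Because the group size is computed from value_counts(), the same lines work unchanged if the True/False split in the data changes.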