I have two DataFrames:
RangeIndex: 47504 entries, 0 to 47503
Data columns (total 9 columns):
ID_TA 47504 non-null object
special_city_rank 35554 non-null object
city_rank 42614 non-null object
TA_num_rew 47366 non-null object
TA_price_range 43202 non-null object
TA_cusine_style 43202 non-null object
services 29250 non-null object
category_marks 47504 non-null object
en_rew_marks 47504 non-null object
dtypes: object(9)
memory usage: 3.3+ MB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 11 columns):
Restaurant_id 50000 non-null object
City 50000 non-null object
Cuisine Style 38410 non-null object
Ranking 50000 non-null float64
Price Range 32639 non-null object
Number of Reviews 46800 non-null float64
Reviews 49998 non-null object
URL_TA 50000 non-null object
ID_TA 50000 non-null object
sample 50000 non-null int64
Rating 50000 non-null float64
dtypes: float64(3), int64(1), object(7)
memory usage: 4.2+ MB
Both have the same column, ID_TA.
I run the command
data.merge(data_TA, how="left", on = "ID_TA")
and the resulting DataFrame has 73780 rows:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 73780 entries, 0 to 73779
Data columns (total 19 columns):
Restaurant_id 73780 non-null object
City 73780 non-null object
Cuisine Style 56721 non-null object
Ranking 73780 non-null float64
Price Range 48250 non-null object
Number of Reviews 69082 non-null float64
Reviews 73778 non-null object
URL_TA 73780 non-null object
ID_TA 73780 non-null object
sample 73780 non-null int64
Rating 73780 non-null float64
special_city_rank 35604 non-null object
city_rank 42668 non-null object
TA_num_rew 47422 non-null object
TA_price_range 43256 non-null object
TA_cusine_style 43256 non-null object
services 29296 non-null object
category_marks 47560 non-null object
en_rew_marks 47560 non-null object
dtypes: float64(3), int64(1), object(15)
memory usage: 11.3+ MB
I don't understand why this happens. What should I do?
The result should have 50000 rows.
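
A left merge returns more rows than the left DataFrame when a key value appears more than once in the right DataFrame: each matching left row is repeated once per match. The sketch below reproduces that on small made-up frames (the toy values and the names `dup_count`, `data_TA_unique` and `merged_unique` are illustrative, not from the real data) and shows one way to check for repeated ID_TA values. It assumes the duplicates are in data_TA and that keeping the first row per ID_TA is acceptable, which may not hold for the real data.

```python
import pandas as pd

# Toy stand-ins for the real frames (values are made up for illustration).
data = pd.DataFrame({"ID_TA": ["d1", "d2", "d3"], "Rating": [4.0, 3.5, 5.0]})
data_TA = pd.DataFrame({
    "ID_TA": ["d1", "d1", "d2"],   # "d1" appears twice in the right frame
    "city_rank": ["1", "1", "7"],
})

# Left merge: the "d1" row of data is repeated once per match in data_TA,
# so the result is longer than data.
merged = data.merge(data_TA, how="left", on="ID_TA")

# Count repeated keys in the right frame (0 would mean the key is unique).
dup_count = data_TA["ID_TA"].duplicated().sum()

# Assumption: keeping the first row per ID_TA is acceptable.
data_TA_unique = data_TA.drop_duplicates(subset="ID_TA", keep="first")

# validate="many_to_one" makes pandas raise MergeError if the right-hand
# keys are still not unique, so silent row multiplication cannot happen.
merged_unique = data.merge(
    data_TA_unique, how="left", on="ID_TA", validate="many_to_one"
)

print(len(data), len(merged), dup_count, len(merged_unique))
```

Once the right-hand keys are unique, a left merge keeps exactly one output row per row of data, so the result would have the same length as data. If the repeated rows in data_TA differ in their other columns, they would need to be aggregated or chosen deliberately instead of simply dropped.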