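How do I filter multiple DataFrames in a single loop?

Time: 2020-05-14 06:35:19

Tags: python pandas dataframe

I have many DataFrames and I want to apply the same filter to all of them without copying and pasting the filter condition every time.

This is my code so far: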

df_list_2019 = [df_spain_2019,df_amsterdam_2019, df_venice_2019, df_sicily_2019]

for data in df_list_2019:
    data = data[['host_since','host_response_time','host_response_rate',
             'host_acceptance_rate','host_is_superhost','host_total_listings_count',
              'host_has_profile_pic','host_identity_verified',
             'neighbourhood','neighbourhood_cleansed','zipcode','latitude','longitude','property_type','room_type',
             'accommodates','bathrooms','bedrooms','beds','amenities','price','weekly_price',
             'monthly_price','cleaning_fee','guests_included','extra_people','minimum_nights','maximum_nights',
             'minimum_nights_avg_ntm','has_availability','availability_30','availability_60','availability_90',
              'availability_365','number_of_reviews','number_of_reviews_ltm','review_scores_rating',
              'review_scores_checkin','review_scores_communication','review_scores_location', 'review_scores_value',
              'instant_bookable','is_business_travel_ready','cancellation_policy','reviews_per_month'
             ]]

However, this does not apply the filter to the DataFrames. How can I change the code so that it does?

Thanks

2 Answers:

Answer 0 (score: 1)

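The filter (column selection) is actually applied to every DataFrame; you just throw the result away by overwriting what the name data points to.

You need to store the results somewhere, for example in a list.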
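A minimal sketch of that idea, using two tiny made-up DataFrames and a shortened column list. The names cols and filtered and the sample data are illustrative assumptions, not taken from the question:

```python
import pandas as pd

# Hypothetical stand-ins for the real DataFrames in the question.
df_spain_2019 = pd.DataFrame({"price": [50, 80], "beds": [1, 2], "id": [10, 11]})
df_venice_2019 = pd.DataFrame({"price": [120], "beds": [3], "id": [12]})
df_list_2019 = [df_spain_2019, df_venice_2019]

# Shortened column list for illustration; use the full list from the question.
cols = ["price", "beds"]

# Build a new list holding the filtered results; the originals are untouched.
filtered = [df[cols] for df in df_list_2019]
```

Here filtered[0] corresponds to df_spain_2019 and so on, so the order of df_list_2019 tells you which result belongs to which city.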


Answer 1 (score: 0)

After var = new_value, you have not changed the original object; you simply have a variable that refers to a new object.
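The following small sketch illustrates this rebinding behaviour with made-up data (df_a, frames, and the column names are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sample data.
df_a = pd.DataFrame({"x": [1], "y": [2]})
frames = [df_a]

for data in frames:
    # Rebinds the loop variable `data` to a new DataFrame;
    # the object stored in `frames` is not modified.
    data = data[["x"]]

# The list element still has both columns.
still_has_y = "y" in frames[0].columns
# The loop variable now refers to a different, filtered object.
is_same_object = data is frames[0]
```

After the loop, still_has_y is True and is_same_object is False, which is why the loop in the question appears to do nothing.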

If you want to change the DataFrames in df_list_2019 themselves, you have to use a method that accepts inplace=True. Here, you could use drop:

keep = set(['host_since','host_response_time','host_response_rate',
             'host_acceptance_rate','host_is_superhost','host_total_listings_count',
              'host_has_profile_pic','host_identity_verified',
             'neighbourhood','neighbourhood_cleansed','zipcode','latitude','longitude','property_type','room_type',
             'accommodates','bathrooms','bedrooms','beds','amenities','price','weekly_price',
             'monthly_price','cleaning_fee','guests_included','extra_people','minimum_nights','maximum_nights',
             'minimum_nights_avg_ntm','has_availability','availability_30','availability_60','availability_90',
              'availability_365','number_of_reviews','number_of_reviews_ltm','review_scores_rating',
              'review_scores_checkin','review_scores_communication','review_scores_location', 'review_scores_value',
              'instant_bookable','is_business_travel_ready','cancellation_policy','reviews_per_month'
             ])

for data in df_list_2019:
    data.drop(columns=[col for col in data.columns if col not in keep], inplace=True)

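Note, however, that pandas experts recommend the df = df. ... idiom over df...(..., inplace=True), because it allows operations to be chained. So you should ask yourself whether you could use @timgeb's answer instead. In any case, this should do what you asked for.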
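If you prefer the reassignment idiom but still want the list itself to hold the filtered frames, one possible sketch is to write each result back into its slot. The sample data and the shortened keep list below are assumptions for illustration:

```python
import pandas as pd

# Hypothetical stand-in for one of the real DataFrames.
df_spain_2019 = pd.DataFrame({"price": [50], "beds": [1], "id": [10]})
df_list_2019 = [df_spain_2019]

# Shortened column list for illustration.
keep = ["price", "beds"]

for i, df in enumerate(df_list_2019):
    # Assign the filtered result back into the list slot.
    df_list_2019[i] = df[keep]
```

Note that this only updates the list: the name df_spain_2019 still refers to the unfiltered DataFrame, so later code should read from df_list_2019.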