Here is the data frame:
   CNSSSBDVSN     CNSSSBDVS1                CNMCRGNNM  \
0     5941833      Kluskus 1                  Cariboo
1     5949832        Iskut 6  North Coast / Cote-nord
2     5941016      Cariboo H                  Cariboo
3     5955040  Peace River B     Northeast / Nord-est
4     5941801  Alkali Lake 1                  Cariboo

                         CNSSSBDVS3  instagram_posts  airports  \
0                    Indian Reserve                0         0
1                    Indian Reserve                0         0
2  Regional District Electoral Area                0         0
3  Regional District Electoral Area                1        17
4                    Indian Reserve                0         0

   railway_stations  accommodations  visitor_centers  festivals  \
0                 0               0                0          0
1                 0               0                0          0
2                 0               5                0          0
3                11               0                0          0
4                 0               0                0          0

   ports_and_ferry_terminals  attractions
0                          0            0
1                          0            0
2                          0            0
3                          0            0
4                          0            0
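Here is the code. Before you read it, two notes: 1. I think something is wrong with the residuals or the index. 2. If needed, CNSSSBDVSN can be used as the index.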
# -*- coding: utf-8 -*-
import pandas as pd
import statsmodels.formula.api as sm
import matplotlib.pyplot as plt
import scipy.stats as stats
from tabulate import tabulate

if __name__ == "__main__":
    # Read data
    census_subdivision_without_lower_mainland_and_van_island = pd.read_csv('../data/augmented/census_subdivision_without_lower_mainland_and_van_island.csv')
    # Select data
    cities = census_subdivision_without_lower_mainland_and_van_island[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] == 'City']
    non_cities = census_subdivision_without_lower_mainland_and_van_island[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] != 'City']
    # Fit
    fit_cities = sm.ols(formula="instagram_posts ~ airports + railway_stations + ports_and_ferry_terminals + accommodations + visitor_centers + festivals + attractions", data=cities).fit()
    fit_non_cities = sm.ols(formula="instagram_posts ~ airports + railway_stations + ports_and_ferry_terminals + accommodations + visitor_centers + festivals + attractions", data=non_cities).fit()
    print(fit_cities.summary())
    print(fit_non_cities.summary())
    # Residual
    cities['residual'] = fit_cities.resid
    non_cities['residual'] = fit_non_cities.resid
This produces the following warnings:
/Users/Chu/Documents/dssg/done/linear_model_cities.py:27: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
cities['residual'] = fit_cities.resid
/Users/Chu/Documents/dssg/done/linear_model_cities.py:28: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
non_cities['residual'] = fit_non_cities.resid
Answer 0 (score: 0)
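Your problem is that cities is a slice of census_subdivision_without_lower_mainland_and_van_island, so pandas cannot tell whether assigning a new column is meant to change the original as well. If you want to use cities as its own data frame from here on, create a copy: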
cities = census_subdivision_without_lower_mainland_and_van_island[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] == 'City'].copy()
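To see the effect of the copy in isolation, here is a minimal sketch. The toy frame df and the fake_resid Series are made up for illustration; fake_resid only stands in for fit_cities.resid, which carries the same index as the rows it was fitted on.

```python
import pandas as pd

# Hypothetical toy data standing in for the census subdivision CSV.
df = pd.DataFrame({
    "CNSSSBDVS3": ["City", "Indian Reserve", "City", "Indian Reserve"],
    "instagram_posts": [10, 0, 4, 1],
})

# .copy() makes cities an independent frame, so adding a column is unambiguous.
cities = df[df["CNSSSBDVS3"] == "City"].copy()

# Stand-in for fit_cities.resid: a Series sharing the index of cities.
fake_resid = pd.Series([0.5, -0.5], index=cities.index)
cities["residual"] = fake_resid  # values are matched by index label

print(cities)
print("residual" in df.columns)  # the original frame is not modified
```

Because the assignment matches on index labels, the residuals land on the right rows even though cities keeps the original row labels rather than 0, 1, 2, ...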
Alternatively, if you want to modify the original data frame, you can use loc to insert the results, as the warning above suggests:
census_subdivision_without_lower_mainland_and_van_island.loc[census_subdivision_without_lower_mainland_and_van_island['CNSSSBDVS3'] == 'City','residuals'] = fit_cities.resid
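A small sketch of what the loc assignment does, again on made-up data (df, is_city and fake_resid are illustrative names, and fake_resid stands in for fit_cities.resid). Rows outside the mask are left as NaN in the new column.

```python
import pandas as pd

# Hypothetical toy data standing in for the census subdivision CSV.
df = pd.DataFrame({
    "CNSSSBDVS3": ["City", "Indian Reserve", "City", "Indian Reserve"],
    "instagram_posts": [10, 0, 4, 1],
})

is_city = df["CNSSSBDVS3"] == "City"

# Stand-in for fit_cities.resid: indexed like the city rows only.
fake_resid = pd.Series([0.5, -0.5], index=df[is_city].index)

# Write into the original frame; only the rows selected by the mask are filled.
df.loc[is_city, "residuals"] = fake_resid

print(df)
```

Running the same line with the non-city mask and fit_non_cities.resid would fill in the remaining rows of the same column.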
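The same applies to non_cities. As an aside, I would use a shorter data frame name to keep your code readable and within the line length recommended by PEP 8 (79 characters).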
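Putting the pieces together, a rough sketch with a shorter name and CNSSSBDVSN as the index might look like the following. Everything here is an assumption for illustration: the name subdivisions, the toy rows and IDs, and the helper ols_residuals, which uses numpy least squares on a single predictor purely as a stand-in for the statsmodels fit and its .resid attribute.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with made-up IDs; the real data comes from the CSV.
subdivisions = pd.DataFrame({
    "CNSSSBDVSN": [1001, 1002, 1003, 1004, 1005, 1006],
    "CNSSSBDVS3": ["City", "Indian Reserve", "City",
                   "Indian Reserve", "City", "Indian Reserve"],
    "airports": [0, 1, 2, 0, 1, 3],
    "instagram_posts": [1, 3, 6, 0, 2, 8],
}).set_index("CNSSSBDVSN")  # use the subdivision ID as the index


def ols_residuals(frame, y, x):
    """Residuals of y ~ 1 + x by least squares (stand-in for fit.resid)."""
    design = np.column_stack(
        [np.ones(len(frame)), frame[x].to_numpy(dtype=float)]
    )
    target = frame[y].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    # Keep the frame's index so assignment aligns on CNSSSBDVSN.
    return pd.Series(target - design @ coef, index=frame.index)


is_city = subdivisions["CNSSSBDVS3"] == "City"
cities = subdivisions[is_city].copy()
non_cities = subdivisions[~is_city].copy()

cities["residual"] = ols_residuals(cities, "instagram_posts", "airports")
non_cities["residual"] = ols_residuals(
    non_cities, "instagram_posts", "airports"
)

print(cities)
print(non_cities)
```

With the real model you would keep the sm.ols(...).fit() calls from the question and assign fit_cities.resid and fit_non_cities.resid instead of the helper's output.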