ValueError: Must pass DataFrame with boolean values only

Time: 2017-02-14 13:29:28

Tags: python database pandas data-science

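Question

In this data file, the United States is broken up into four regions using the "REGION" column.

Create a query that finds the counties that belong to regions 1 or 2, whose name starts with 'Washington', and whose POPESTIMATE2015 was greater than their POPESTIMATE2014.

This function should return a 5x2 DataFrame with columns = ['STNAME', 'CTYNAME'] and the same index ID as census_df (sorted ascending by index).

Code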

    def answer_eight():
        counties=census_df[census_df['SUMLEV']==50]
        regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]
        washingtons = regions[regions[regions['COUNTY']].str.startswith("Washington")]
        grew = washingtons[washingtons[washingtons['POPESTIMATE2015']]>washingtons[washingtons['POPESTIMATES2014']]]
        return grew[grew['STNAME'],grew['COUNTY']]

    outcome = answer_eight()
    assert outcome.shape == (5,2)
    assert list (outcome.columns)== ['STNAME','CTYNAME']
    print(tabulate(outcome, headers=["index"]+list(outcome.columns),tablefmt="orgtbl"))

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-77-546e58ae1c85> in <module>()
      6     return grew[grew['STNAME'],grew['COUNTY']]
      7 
----> 8 outcome = answer_eight()
      9 assert outcome.shape == (5,2)
     10 assert list (outcome.columns)== ['STNAME','CTYNAME']

<ipython-input-77-546e58ae1c85> in answer_eight()
      1 def answer_eight():
      2     counties=census_df[census_df['SUMLEV']==50]
----> 3     regions = counties[(counties[counties['REGION']==1]) | (counties[counties['REGION']==2])]
      4     washingtons = regions[regions[regions['COUNTY']].str.startswith("Washington")]
      5     grew = washingtons[washingtons[washingtons['POPESTIMATE2015']]>washingtons[washingtons['POPESTIMATES2014']]]

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   1991             return self._getitem_array(key)
   1992         elif isinstance(key, DataFrame):
-> 1993             return self._getitem_frame(key)
   1994         elif is_mi_columns:
   1995             return self._getitem_multilevel(key)

/opt/conda/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_frame(self, key)
   2066     def _getitem_frame(self, key):
   2067         if key.values.size and not com.is_bool_dtype(key.values):
-> 2068             raise ValueError('Must pass DataFrame with boolean values only')
   2069         return self.where(key)
   2070 

ValueError: Must pass DataFrame with boolean values only

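I'm stumped. Where did I go wrong?

Thanks

6 answers:

Answer 0 (score: 5)

You are trying to mask your df with a df of a different shape, which is wrong, and you are also passing the conditions incorrectly. When you compare a column or Series in a df with a scalar to produce a boolean mask, you should pass just that condition to the indexer, not index the df with it a second time.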

You want:

    regions = counties[(counties['REGION']==1) | (counties['REGION']==2)]

The same applies to the later filters: pass the comparison itself as the mask instead of wrapping it in another lookup.
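To make the difference concrete, here is a minimal sketch on a made-up three-row frame. The `toy` data and its values are hypothetical and are not taken from the census file. A comparison on a column gives a boolean Series, which is what the indexer expects; indexing with an already filtered DataFrame is what fails. The exact exception text depends on the pandas version, so the sketch only records that an exception was raised.

```python
import pandas as pd

# Hypothetical toy data standing in for census_df (values are made up).
toy = pd.DataFrame({
    "REGION": [1, 2, 3],
    "POPESTIMATE2015": [11, 21, 31],
})

# A comparison on a column gives a boolean Series: this is the mask.
mask = (toy["REGION"] == 1) | (toy["REGION"] == 2)

# Indexing with the mask keeps the rows where it is True.
filtered = toy[mask]

# Indexing with an already filtered DataFrame passes non-boolean values
# to the indexer, which is the mistake made in the question.
try:
    toy[toy[toy["REGION"] == 1]]
    error_message = None
except (ValueError, TypeError) as exc:
    error_message = str(exc)

print(mask.dtype)                # bool
print(filtered.index.tolist())   # [0, 1]
print(error_message)
```

Note that the filtered rows keep their original index labels, which is what the assignment asks for.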

Answer 1 (score: 0)

def answer_eight():
    df=census_df[census_df['SUMLEV']==50]
    #df=census_df
    df=df[(df['REGION']==1) | (df['REGION']==2)]
    df=df[df['CTYNAME'].str.startswith('Washington')]
    df=df[df['POPESTIMATE2015'] > df['POPESTIMATE2014']]
    df=df[['STNAME','CTYNAME']]
    print(df.shape)
    return df.head(5)

Answer 2 (score: 0)


def answer_eight():
    county = census_df[census_df['SUMLEV']==50]
    req_col = ['STNAME','CTYNAME']

    region = county[(county['REGION']<3) & (county['POPESTIMATE2015']>county['POPESTIMATE2014']) & (county['CTYNAME'].str.startswith('Washington'))]
    region = region[req_col]

    return region
answer_eight()

Answer 3 (score: 0)

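import pandas as pd

def answer_eight():
    df=census_df
    region1=df[ df['REGION']==1 ]
    region2=df[ df['REGION']==2 ]

    yes_1=region1[ region1['POPESTIMATE2015'] > region1['POPESTIMATE2014']]
    yes_2=region2[ region2['POPESTIMATE2015'] > region2['POPESTIMATE2014']]

    # Exact match: keeps only rows named 'Washington County',
    # which is stricter than "starts with 'Washington'".
    yes_1=yes_1[ yes_1['CTYNAME']=='Washington County' ]
    yes_2=yes_2[ yes_2['CTYNAME']=='Washington County' ]

    # DataFrame.append and DataFrame.sort no longer exist in current pandas;
    # pd.concat and sort_index do the same job here.
    ans=pd.concat([yes_1[ ['STNAME','CTYNAME'] ], yes_2[ ['STNAME','CTYNAME'] ]])
    return ans.sort_index()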

Answer 4 (score: 0)

This is how I solved it on Coursera.

def answer_eight():
    df8 = census_df.copy()
    # One boolean mask per condition
    washington = df8['CTYNAME'].str[0:10] == 'Washington'
    popincrease = df8['POPESTIMATE2015'] > df8['POPESTIMATE2014']
    region = (df8['REGION'] == 1) | (df8['REGION'] == 2)

    df8 = df8[region & popincrease & washington]

    # Select the columns with a list so their order is fixed
    return df8[['STNAME','CTYNAME']]

answer_eight()

I was a pandas beginner at the time, and this took me nearly 20 ... LOL.

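Answer 5 (score: 0)

I solved it this way, as a single expression that accesses census_df directly without any local variables. The solution is almost the same as the others here; the difference is that they use local variables and mine does not.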

def answer_eight(): 
    return census_df[
          (census_df['SUMLEV'] == 50)                                     &
          ((census_df["REGION"] == 1) | (census_df["REGION"] == 2))       &
          (census_df["CTYNAME"].str.lower()).str.startswith('washington') &
          (census_df["POPESTIMATE2015"] > census_df["POPESTIMATE2014"])        
         ][["STNAME","CTYNAME"]]
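One detail in the single-expression form is the parentheses around each comparison. In Python, `&` and `|` bind more tightly than `==` and `>`, so each comparison has to be parenthesized before the masks are combined. The sketch below runs the same kind of query on a made-up four-row frame (the `toy` data is hypothetical, not the census file) and also shows what happens when the parentheses are left out.

```python
import pandas as pd

# Hypothetical toy data with the column names used in the question.
toy = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 50],
    "REGION": [2, 1, 2, 4],
    "STNAME": ["Iowa", "Rhode Island", "Iowa", "Oregon"],
    "CTYNAME": ["Iowa", "Washington County",
                "Washington County", "Washington County"],
    "POPESTIMATE2014": [100, 10, 20, 30],
    "POPESTIMATE2015": [110, 11, 19, 31],
})

# Every comparison is wrapped in parentheses before combining with & and |.
result = toy[
    (toy["SUMLEV"] == 50)
    & ((toy["REGION"] == 1) | (toy["REGION"] == 2))
    & toy["CTYNAME"].str.startswith("Washington")
    & (toy["POPESTIMATE2015"] > toy["POPESTIMATE2014"])
][["STNAME", "CTYNAME"]]

# Without parentheses, this parses as a chained comparison
# REGION == (1 | REGION) == 2, which needs the truth value of a Series.
try:
    toy["REGION"] == 1 | toy["REGION"] == 2
    precedence_error = None
except ValueError as exc:
    precedence_error = str(exc)

print(result.index.tolist())   # [1]
print(precedence_error)
```

Only the row at index 1 passes all four conditions: index 0 is not a county row, index 2 lost population, and index 3 is in region 4.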