Applying a function to a DataFrame column

Time: 2019-01-06 01:41:24

Tags: python pandas

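I am trying to apply a function to a column of a DataFrame, and it keeps raising an error. I need your help.
The function is supposed to drop the rows that do not contain any of the items in the array keywordz.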

  

The function:

def get_restuarant_business(data):
    keywordz=['food','restuarant','bakery','deli','fast', 
                  'food','bars','coffee']

    data=data.lower()
    while((data != '' or pd.isnull(data)==False ) and isinstance(data, 
    str)):  
       flag= False
       for i in keywordz:
          if i in data:
             flag=True
             break
          else:
             continue
    return flag

rest_biz = business.copy().loc[business['categories'].head(1).apply(
                                     get_restuarant_business) == True]

This is the exception that is raised:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-13-8da5e44c6072> in <module>()
1 print(business.head(5))
----> 2 business['categories'].apply(get_restuarant_business)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
764         key = com._apply_if_callable(key, self)
765         try:
766             result = self.index.get_value(self, key)
767 
768             if not is_scalar(result):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
3101         try:
3102             return self._engine.get_value(s, k,
3103                                           tz=getattr(series.dtype, 'tz', None))
3104         except KeyError as e1:
3105             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

 pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: 'categories'
0    'tours, breweries, pizza, restaurants, food, h...
1    'chicken wings, burgers, caterers, street vend...
2    'breakfast & brunch, restaurants, french, sand...
3    'home & garden, nurseries & gardening, shoppin...
4                                 'coffee & tea, food'
 Name: categories, dtype: object

Can you help me?

2 answers:

Answer 0 (score: 0)

I think the function below will serve your purpose:

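def get_restuarant_business(data):
    keywordz = ['food', 'restaurant', 'bakery', 'deli', 'fast food', 'bars', 'coffee']

    # Missing values (NaN) are not strings, so treat them as "no match"
    if not isinstance(data, str):
        return False

    data = data.lower()
    # True if any keyword appears somewhere in the category text
    return any(k in data for k in keywordz)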

Call it like this:

business_df['food_cat'] = business_df['categories'].apply(
    get_restuarant_business)

Then filter for the rows where the new column is True.
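The filter step itself could look roughly like the sketch below. The sample rows and the pre-filled `food_cat` values are made up for illustration; in practice they would come from the `apply()` call.

```python
import pandas as pd

# Hypothetical data: 'food_cat' stands in for the flags produced by apply().
business_df = pd.DataFrame({
    "categories": ["coffee & tea, food", "home & garden, shopping", "bakery, deli"],
    "food_cat": [True, False, True],
})

# Boolean indexing keeps only the rows whose flag is True.
rest_biz = business_df[business_df["food_cat"]].copy()

# Optionally drop the helper column once the filter is done.
rest_biz = rest_biz.drop(columns="food_cat")
print(rest_biz)
```

The `.copy()` is there so that later changes to `rest_biz` do not trigger pandas' chained-assignment warning.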

Answer 1 (score: 0)

Try this!

import pandas as pd
import numpy as np
business = pd.DataFrame({'categories':['tours, breweries, pizza, restaurants, food',
                                        'chicken wings, burgers, caterers, street vend',
                                       'breakfast & brunch, restaurants, french, sand',
                                       'home & garden, nurseries & gardening, shopping']})

keywordz=['food','restaurants','bakery','deli','fast','food','bars','coffee']

rest_biz = business[business['categories'].apply(lambda x: np.any([True if w.lower() in keywordz else False for w in x.split(', ')]))]

# output (rows 0 and 2 both contain 'restaurants')
    categories
0   tours, breweries, pizza, restaurants, food
2   breakfast & brunch, restaurants, french, sand
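As a possible alternative to `apply()`, the same kind of filter can be written with the vectorized `Series.str.contains`. This is only a sketch: the extra `None` row is added here to show how missing values behave, and the keyword list is a de-duplicated variant of the one above, with `restaurant` in the singular.

```python
import re
import pandas as pd

# Sample data modelled on the rows above, plus a hypothetical missing value.
business = pd.DataFrame({"categories": [
    "tours, breweries, pizza, restaurants, food",
    "chicken wings, burgers, caterers, street vend",
    "breakfast & brunch, restaurants, french, sand",
    "home & garden, nurseries & gardening, shopping",
    None,
]})

# Assumed keyword list for this sketch.
keywordz = ["food", "restaurant", "bakery", "deli", "fast", "bars", "coffee"]

# Build one regex alternation; re.escape guards against special characters.
pattern = "|".join(re.escape(k) for k in keywordz)

# case=False ignores case, na=False treats missing values as "no match".
mask = business["categories"].str.contains(pattern, case=False, na=False)
rest_biz = business[mask]
print(rest_biz)
```

Note that this is substring matching, so `fast` also matches inside `breakfast`. If that matters, wrapping the pattern in word boundaries (`\b`) is one option.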