Pandas: replacing an iterrows loop

Time: 2017-12-24 19:49:44

Tags: python pandas numpy dataframe

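I have a small function that I run in pandas, and it throws a ValueError when it reaches the if x in y statement. I have seen similar questions that suggest Boolean indexing, .isin(), and where(), but I could not adapt any of the examples to my case. Any advice would be greatly appreciated.

Additional note: groups is a list of lists of strings that lives outside the DataFrame. My goal for the function is to find which list an item in the DataFrame belongs to and return that list's index. The first version in the notebook linked below uses iterrows to loop over the DataFrame, but I know that is suboptimal in most cases.

A Jupyter notebook with some fake data: https://github.com/amoebahlan61/sturdy-chainsaw/blob/master/Grouping%20Test_1.1.ipynb

Thanks!

Code: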

def groupFinder(item):
    for group in groups:
        if item in group:
            return groups.index(group)

df['groupID2'] = groupFinder(df['item'])


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-808ac3e51e1f> in <module>()
      4             return groups.index(group)
      5 
----> 6 df['groupID2'] = groupFinder(df['item'])

<ipython-input-16-808ac3e51e1f> in groupFinder(item)
      1 def groupFinder(item):
      2     for group in groups:
----> 3         if item in group:
      4             return groups.index(group)
      5 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    953         raise ValueError("The truth value of a {0} is ambiguous. "
    954                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955                          .format(self.__class__.__name__))
    956 
    957     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
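The traceback can be reproduced in isolation. The sketch below assumes small stand-in data (the contents of groups and items are hypothetical, not taken from the notebook). Passing the whole column makes `item in group` compare the Series with each list element using ==, which yields a Boolean Series, and Python then fails when it tries to reduce that Series to a single True or False.

```python
import pandas as pd

# Hypothetical stand-in data; the real groups and df live in the linked notebook.
groups = [["lentil", "pea"], ["basil", "mint"]]
items = pd.Series(["pea", "mint", "lentil"])

error_message = None
try:
    # list.__contains__ evaluates `element == items` for each element.
    # That comparison returns a Boolean Series, and converting a Series
    # to one bool raises ValueError.
    found = items in groups[0]
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The function itself is fine for a single string; the problem is only that it receives the entire Series at once.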

Solution: I came across a few pandas blog posts and got some feedback from Reddit users, which gave me a way to skip iterrows by using pandas' apply function.

df['groupID2'] = df.item.apply(groupFinder)
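For reference, here is a minimal self-contained sketch of the apply approach. The data is made up for illustration, and the explicit `return None` for items that match no group is an assumption added here; the original function simply falls through in that case. It uses `df["item"]` rather than `df.item` to avoid any clash with attribute names.

```python
import pandas as pd

# Hypothetical stand-in data for illustration only.
groups = [["lentil", "pea"], ["basil", "mint"], ["daikon"]]
df = pd.DataFrame({"item": ["pea", "daikon", "mint", "lentil"]})


def groupFinder(item):
    """Return the index of the first group that contains item, else None."""
    for position, group in enumerate(groups):
        if item in group:
            return position
    return None  # assumed behaviour for items that belong to no group


# apply calls groupFinder once per element, so `item` is a plain string.
df["groupID2"] = df["item"].apply(groupFinder)
print(df)
```

Because apply passes one value at a time, the `in` test works on a string instead of a Series, which avoids the ValueError.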

Thanks, everyone, for the help and the responses.

3 answers:

Answer 0: (score: 0)

The way to use Series.isin(...) is to first call isin to build a Boolean mask and then index with that mask. Alternatively, you can use your function on a list rather than on a Series.
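One way the mask-then-index idea might look in practice is sketched below. The data, the groupID2 column name reused from the question, and the -1 marker for "no group found" are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical stand-in data for illustration only.
groups = [["lentil", "pea"], ["basil", "mint"], ["daikon"]]
df = pd.DataFrame({"item": ["pea", "daikon", "kale", "mint"]})

# Start with -1, an assumed marker meaning "not found in any group".
df["groupID2"] = -1

for position, group in enumerate(groups):
    # isin returns a Boolean Series: True where the item is in this group.
    mask = df["item"].isin(group)
    # Index with the mask and write the group position into those rows.
    df.loc[mask, "groupID2"] = position

print(df)
```

This loops over the groups rather than over the rows, so it stays vectorized per group. If an item appears in more than one group, the last matching group wins here, whereas groupFinder returns the first match.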

Answer 1: (score: 0)

IIUC, you can accomplish what you want in a few lines with pandas:

import pandas as pd

# create master list of items
master = pd.Series(legumesGroup + herbGroup + radishGroup)
# assign group id as index
master.index = [0]*len(legumesGroup) + [1]*len(herbGroup) + [2]*len(radishGroup)
# sample from master with replacement to get itemList
itemList = master.sample(n=1000, replace=True)

Now every item in itemList carries its group ID as its index, so you can view the group ID together with the item, or just the group IDs.

itemList

Output:

itemList
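The same index-as-group-ID idea can also label an existing column. The sketch below is a variation rather than the answerer's exact code: it inverts the master Series into an item-to-group lookup and uses Series.map. The group contents and the df are hypothetical, and it assumes each item belongs to exactly one group.

```python
import pandas as pd

# Hypothetical group contents; only the variable names come from the answer.
legumesGroup = ["lentil", "pea"]
herbGroup = ["basil", "mint"]
radishGroup = ["daikon"]

# Master Series of items, indexed by group ID as in the answer.
master = pd.Series(legumesGroup + herbGroup + radishGroup)
master.index = [0] * len(legumesGroup) + [1] * len(herbGroup) + [2] * len(radishGroup)

# Invert it: items become the index, group IDs become the values.
lookup = pd.Series(master.index.to_numpy(), index=master.to_numpy())

# Hypothetical DataFrame standing in for the one in the question.
df = pd.DataFrame({"item": ["pea", "daikon", "mint", "lentil"]})

# map looks up each item in the lookup Series in one vectorized call.
df["groupID2"] = df["item"].map(lookup)
print(df)
```

Items missing from every group would come back as NaN, which also turns the column into floats.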

Answer 2: (score: 0)

Solution

I came across a few pandas blog posts and got some feedback from Reddit users, which gave me a way to skip iterrows by using pandas' apply function.

df['groupID2'] = df.item.apply(groupFinder)

Thanks, everyone, for the help and the responses.