Python error I can't solve: the truth value of a Series is ambiguous

Asked: 2017-07-26 14:39:42

Tags: python python-3.x pandas

I have a DataFrame named result that contains the columns date1 and date2. As you can see, all the other columns are created in code.

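What I want is to create three columns based on the information in the date_diff column. The first, called "less than 6 days", uses 1 or 0 to indicate whether the element in date_diff is between 0 and 6. The other columns follow the same logic and are named "7-21 days" and "22+ days".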

result['date_diff'] = result['date2'] - result['date1']
result['date_diff'] = result['date_diff'].dt.days
pd.to_numeric(result['date_diff'])


def menos_6dias(result):
    if 0 <= result['date_diff'] <= 6:
        return 1
    else:
        return 0

result['Pending < 6 days'] = result.apply(menos_6dias, axis=1)

def de_7_a_21dias(teste):
    if 7 <= result['date_diff'] <= 21:
        return 1
    else:
        return 0

result['7-21 days'] = result.apply(de_7_a_21dias, axis=1)

def mais_de_22dias(result):
    if result['date_diff'] >= 22:
        return 1
    else:
        return 0

result['22+ days'] = result.apply(mais_de_22dias, axis=1)

result.head()

I thought the error was caused by the data type of the date_diff column, so I tried using .dt.days and pd.to_numeric, but that did not help. The error is:

ValueError                                Traceback (most recent call last)
<ipython-input-34-78fa25211501> in <module>()
     18         return 0
     19 
---> 20 result['7-21 days'] = result.apply(de_7_a_21dias, axis=1)
     21 
     22 def mais_de_22dias(result):

/Users/elachmann/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4358                         f, axis,
   4359                         reduce=reduce,
-> 4360                         ignore_failures=ignore_failures)
   4361             else:
   4362                 return self._apply_broadcast(f, axis)

/Users/elachmann/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4454             try:
   4455                 for i, v in enumerate(series_gen):
-> 4456                     results[i] = func(v)
   4457                     keys.append(v.name)
   4458             except Exception as e:

<ipython-input-34-78fa25211501> in de_7_a_21dias(teste)
     13 
     14 def de_7_a_21dias(teste):
---> 15     if 7 <= result['dias pendentes na acao'] <= 21:
     16         return 1
     17     else:

/Users/elachmann/anaconda/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
    951         raise ValueError("The truth value of a {0} is ambiguous. "
    952                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 953                          .format(self.__class__.__name__))
    954 
    955     __bool__ = __nonzero__

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

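Here are the column headers of my DataFrame: contract ID || name || email || company || company state || user.active || contract.active || date1 || date2 || pending || answered || rejected || canceled || inactive || total requests || fb rq id || aux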

1 answer:

Answer 0 (score: 1)

Consider the following DataFrame, hd:

            beer_servings
country
Armenia                21
Bulgaria              231
Cuba                   93
France                127
Iran                    0
Libya                   0
Mozambique             47
Peru                  163
Serbia                283
Thailand               99
Vanuatu                21

As you probably know, a comparison against a pandas column gives you a column of booleans.

In [54]: pd.to_numeric(hd['beer_servings'] < 50)
Out[54]:
country
Armenia        True
Bulgaria      False
Cuba          False
France        False
Iran           True
Libya          True
Mozambique     True
Peru          False
Serbia        False
Thailand      False
Vanuatu        True
Name: beer_servings, dtype: bool

What you may not know is that a Series has an astype method, which can convert a boolean column to integers.

In [57]: (hd['beer_servings'] < 50).astype(int)
Out[57]:
country
Armenia       1
Bulgaria      0
Cuba          0
France        0
Iran          1
Libya         1
Mozambique    1
Peru          0
Serbia        0
Thailand      0
Vanuatu       1
Name: beer_servings, dtype: int64

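I think you've shown enough pandas knowledge to take it from there, but note that a chained comparison such as 0 < df['column'] < 12 does not work on a column; it has to be rewritten as (df['column'] > 0) & (df['column'] < 12) or similar.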
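
To make that concrete, here is a minimal sketch of how the three flag columns from the question could be built without apply. The column names come from the question; the sample dates are made up purely for illustration, so treat this as one possible approach under those assumptions rather than the definitive fix.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `result` frame.
result = pd.DataFrame({
    "date1": pd.to_datetime(["2017-07-01"] * 4),
    "date2": pd.to_datetime(["2017-07-04", "2017-07-08",
                             "2017-07-22", "2017-07-23"]),
})

# Whole days between the two dates; .dt.days already yields integers,
# so no extra pd.to_numeric call is needed here.
result["date_diff"] = (result["date2"] - result["date1"]).dt.days
diff = result["date_diff"]

# A chained comparison on a whole column is evaluated as
# (0 <= diff) and (diff <= 6), which asks a Series for a single
# truth value and raises the ValueError from the question.
chained_failed = False
try:
    0 <= diff <= 6
except ValueError:
    chained_failed = True

# Element-wise comparisons combined with &, then cast from bool to int.
result["Pending < 6 days"] = ((diff >= 0) & (diff <= 6)).astype(int)
result["7-21 days"] = ((diff >= 7) & (diff <= 21)).astype(int)
result["22+ days"] = (diff >= 22).astype(int)

print(result)
```

A side note on the traceback: the function de_7_a_21dias takes a parameter named teste but indexes the global result inside its body, so the comparison appears to run against the entire column instead of one row's value. That would explain why the first apply succeeds and the second one fails. Using the parameter name inside the function should also make the row-wise version work, although the column-wise form above avoids apply altogether.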