Updating DataFrame values based on multiple where conditions with numpy

Time: 2017-07-09 11:09:20

Tags: python numpy dataframe

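I want to transform the values in DateWork['Variable'] based on multiple conditions and write the result to DateWork['Date'].

If Frequency = 3 and len(Variable) = 6, replace M with "-0" and update DateWork['Date']. If Frequency = 3 and len(Variable) = 7, replace M with "-" and update DateWork['Date'].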

DataFrame: DateWork

Frequency Variable      Date
3         1950M2        1950-02-01
3         1950M3        1950-03-01
2         1950-07-01    1950-07-01
3         1950M9        1950-09-01
2         1950-10-01    1950-10-01
3         1950M10       1950-10-01

My code:

DateWork.loc[DateWork['Date']] = np.where(((DateWork['Frequency'] == 3) & (DateWork['variable'].str.len() == 6)), 'M', '-0',  DateWork['Date'])
DateWork.loc[DateWork['Date']] = np.where(((DateWork['Frequency'] == 3) & (DateWork['variable'].str.len() == 7)), 'M', '-',  DateWork['Date'])
DateWork.loc[DateWork['Frequency'] == 3, 'Date'] = DateWork.loc[DateWork['Frequency'] == 3, 'variable'] + '-01'

This raises an error:

TypeError: where() takes at most 3 arguments (4 given)

2 Answers:

Answer 0: (score: 2)

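The error occurs because you are passing an extra argument to np.where; the numpy.where documentation describes its signature (a condition, the values to use where it is true, and the values to use where it is false). Also, once that is fixed, the way your code is written means the last np.where call would overwrite the results of all the earlier calls, so the calls need to be "nested" to work correctly.

In case you want it, I have also included a solution that does not use np.where.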
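As a small illustration of the argument-count problem, here is a minimal sketch on toy arrays (the names cond, a and b are made up for this example and are not from the question). It only checks that a TypeError is raised; the exact wording of the message may differ between numpy versions.

```python
import numpy as np

# Toy inputs, purely illustrative.
cond = np.array([True, False, True])
a = np.array(["x", "y", "z"])
b = np.array(["p", "q", "r"])

# Three arguments: condition, value where True, value where False.
picked = np.where(cond, a, b)
print(picked)  # ['x' 'q' 'z']

# A fourth positional argument is rejected with a TypeError.
try:
    np.where(cond, a, b, a)
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```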

Solution with numpy.where:

# where Frequency == 3 and len(Variable) == 6, take Variable and replace M with -0;
# otherwise, where Frequency == 3 and len(Variable) == 7, take Variable and replace M with -;
# otherwise just take Variable unchanged
DateWork['Date'] = np.where((DateWork['Frequency'] == 3) & (DateWork['Variable'].str.len() == 6), DateWork['Variable'].str.replace('M','-0'),
                       np.where((DateWork['Frequency'] == 3) & (DateWork['Variable'].str.len() == 7), DateWork['Variable'].str.replace('M','-'), DateWork['Variable']))

# append the day ('-01') to Date where Frequency == 3
DateWork.loc[DateWork['Frequency'] == 3, 'Date'] = DateWork.loc[DateWork['Frequency'] == 3, 'Date'] + '-01'
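To try the nested np.where approach end to end, the following sketch rebuilds the sample frame from the question and applies the same logic. The helper names is_freq3 and length are introduced here only for readability, and regex=False is an added precaution so that 'M' is treated as a literal string.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (the Date column is rebuilt below).
DateWork = pd.DataFrame({
    "Frequency": [3, 3, 2, 3, 2, 3],
    "Variable": ["1950M2", "1950M3", "1950-07-01",
                 "1950M9", "1950-10-01", "1950M10"],
})

# Illustrative helper names, not part of the original answer.
is_freq3 = DateWork["Frequency"] == 3
length = DateWork["Variable"].str.len()

# Outer where handles the 6-character case, inner where the 7-character case,
# and everything else keeps Variable unchanged.
DateWork["Date"] = np.where(
    is_freq3 & (length == 6),
    DateWork["Variable"].str.replace("M", "-0", regex=False),
    np.where(
        is_freq3 & (length == 7),
        DateWork["Variable"].str.replace("M", "-", regex=False),
        DateWork["Variable"],
    ),
)

# Append the day only for the Frequency == 3 rows.
DateWork.loc[is_freq3, "Date"] = DateWork.loc[is_freq3, "Date"] + "-01"

result = DateWork["Date"].tolist()
print(result)
# ['1950-02-01', '1950-03-01', '1950-07-01', '1950-09-01', '1950-10-01', '1950-10-01']
```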

Solution with pandas.DataFrame.loc:

# where Frequency == 3 and len(Variable) == 6, in Date we put Variable and replace M with -0
DateWork.loc[(DateWork['Frequency'] == 3) & (DateWork['Variable'].str.len() == 6),'Date'] = DateWork['Variable'].str.replace('M','-0')

# where frequency == 3 and len(variable) == 7, in date we put variable and replace M with -
DateWork.loc[(DateWork['Frequency'] == 3) & (DateWork['Variable'].str.len() == 7),'Date'] = DateWork['Variable'].str.replace('M','-')

# where frequency == 2, in date we simply put variable
DateWork.loc[DateWork['Frequency'] == 2,'Date'] = DateWork['Variable']

# where frequency == 3, in date we add first day date.
DateWork.loc[DateWork['Frequency'] == 3, 'Date'] = DateWork.loc[DateWork['Frequency'] == 3, 'Date'] + '-01'
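One detail that makes the .loc version work: the right-hand side is a full-length Series, and pandas aligns it on the index, so only the rows selected by the mask receive new values. A minimal sketch on a made-up three-row frame (df and mask are illustrative names; the placeholder Date values are arbitrary):

```python
import pandas as pd

# Tiny illustrative frame with placeholder Date values.
df = pd.DataFrame({
    "Variable": ["1950M2", "1950-07-01", "1950M10"],
    "Date": ["a", "b", "c"],
})

mask = df["Variable"].str.len() == 6

# The right-hand side covers every row, but only rows where mask is True are written.
df.loc[mask, "Date"] = df["Variable"].str.replace("M", "-0", regex=False)

dates = df["Date"].tolist()
print(dates)  # ['1950-02', 'b', 'c']
```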

Answer 1: (score: 0)

If the nested np.where is hard to read, you can apply the conditions one step at a time:

DateWork
Out[32]: 
   Frequency    Variable        Date
0          3      1950M2  1950-02-01
1          3      1950M3  1950-03-01
2          2  1950-07-01  1950-07-01
3          3      1950M9  1950-09-01
4          2  1950-10-01  1950-10-01
5          3     1950M10  1950-10-01

First if:

The else branch is the original Date column itself


DateWork['Date'] = np.where((DateWork['Frequency'] == 3) & (DateWork['Variable'].str.len() == 6), DateWork['Variable'].str.replace('M','-0'), DateWork['Date'])

DateWork
Out[34]: 
   Frequency    Variable        Date
0          3      1950M2     1950-02
1          3      1950M3     1950-03
2          2  1950-07-01  1950-07-01
3          3      1950M9     1950-09
4          2  1950-10-01  1950-10-01
5          3     1950M10  1950-10-01

Second if:

Here, the else branch is the Date output of the previous step


DateWork['Date'] = np.where((DateWork['Frequency'] == 3) & (DateWork['Variable'].str.len() == 7), DateWork['Variable'].str.replace('M','-'), DateWork['Date'])

DateWork
Out[36]: 
   Frequency    Variable        Date
0          3      1950M2     1950-02
1          3      1950M3     1950-03
2          2  1950-07-01  1950-07-01
3          3      1950M9     1950-09
4          2  1950-10-01  1950-10-01
5          3     1950M10     1950-10
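The outputs above stop at year and month for the Frequency == 3 rows, so the '-01' suffix still has to be appended, as in the other answer. As a different way to avoid nesting, and not something either answer uses, np.select accepts a list of conditions and a matching list of choices. The sketch below assumes the sample data from the question; is_freq3 and length are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
DateWork = pd.DataFrame({
    "Frequency": [3, 3, 2, 3, 2, 3],
    "Variable": ["1950M2", "1950M3", "1950-07-01",
                 "1950M9", "1950-10-01", "1950M10"],
})

# Illustrative helper names.
is_freq3 = DateWork["Frequency"] == 3
length = DateWork["Variable"].str.len()

# Conditions are checked in order; the first one that is True picks the value.
conditions = [is_freq3 & (length == 6), is_freq3 & (length == 7)]
choices = [
    DateWork["Variable"].str.replace("M", "-0", regex=False),
    DateWork["Variable"].str.replace("M", "-", regex=False),
]

# Rows matching no condition fall back to Variable unchanged.
DateWork["Date"] = np.select(conditions, choices, default=DateWork["Variable"])

# Final step: append the day for the Frequency == 3 rows.
DateWork.loc[is_freq3, "Date"] = DateWork.loc[is_freq3, "Date"] + "-01"

final_dates = DateWork["Date"].tolist()
print(final_dates)
# ['1950-02-01', '1950-03-01', '1950-07-01', '1950-09-01', '1950-10-01', '1950-10-01']
```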