Question

我想使用groupby对最初看起来像这样的数据生成一些权重：

NSError *error;
NSString *string = [NSMutableString stringWithString:[[_postData valueForKey:@"content"] valueForKey:@"rendered"]];

NSScanner *theScanner;
NSString *text = nil;
theScanner = [NSScanner scannerWithString:string];
while ([theScanner isAtEnd] == NO) {
    // find start of tag
    [theScanner scanUpToString:@"[" intoString:NULL];
    // find end of tag
    [theScanner scanUpToString:@"]" intoString:&text];
    // replace the found tag with a space
    //(you can filter multi-spaces out later if you wish)
    string = [string stringByReplacingOccurrencesOfString:[NSString 
    stringWithFormat:@"%@]",text] withString:@""];
 }

基本上，当MONTH与M1不同时，我希望标记的选项的权重等于任何未标记的选项的两倍。
示例：如果您有（C1，C2，C3）并且C1是唯一标记的对象，则权重将为：0.5 / 0.25 / 0.25。

同时，对于第一个月，我希望权重完全集中在标记的选项上。上一个示例将变为（1/0/0）。

关于数据的精度：
对于给定的元组（V1，V2，MONTH），我们最多可以将两个选项标记为优先级（根本没有优先级）。

这就是我尝试过的：

V1   V2   MONTH  CHOICES  PRIORITY
X    T1   M1     C1       1
X    T1   M1     C2       0
X    T1   M1     C3       0
X    T2   M1     C1       1
X    T2   M1     C5       0
X    T2   M1     C6       0
X    T2   M1     C2       1
X    T1   M2     C1       1
X    T1   M2     C2       0
X    T1   M2     C3       0
X    T2   M2     C1       0
X    T2   M2     C5       1
X    T2   M2     C6       0
X    T2   M2     C2       1

问题在于，由于我对“ MONTH”进行了分组，因此似乎值不再出现在应用了“ weights_preferences”的数据中。

P.S：输出看起来像这样

def weights_preferences(data):
     if (data.MONTH.values != 'M1'):
         data['WEIGHTS'] = 1/(len(data)+data[data.PRIORITY==1].shape[0])
         data['WEIGHTS'] = data.apply(lambda x : 2*x.WEIGHTS if x.PRIORITY==1 else x.WEIGHTS, axis=1)
     elif data.MONTH.values == 'M1' & data[data.PRIORITY==1].shape[0]==0 :
         data['WEIGHTS'] = 1/(len(data))
     else :
         if data[data.PREFERENCE==1].shape[0]==1 :
             data['WEIGHTS'] = [1 if x[1].PRIORITY==1 else 0 for x in data.iterrows()]
         else :
             data['WEIGHTS'] = [0.5 if x[1].PRIORITY==1 else 0 for x in data.iterrows()]
     return data

tmp = tmp.groupby(['V1','V2','MONTH']).apply(weights_preferences)

任何建议都非常受欢迎！

谢谢。

自定义函数+通过变量分组的条件不同的groupby熊猫

0 个答案: