Efficiently computing and storing a similarity matrix

Time: 2015-11-13 23:29:26

Tags: python loops matrix pearson

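For a recommender-systems project in class, I am currently trying to build and store an item-based similarity matrix for a dataset with about 7000 users (rows) and 4000 movies (columns). So what I have is a pivot table with UserID as the index, MovieID as the columns, and ratings as the values. As you can imagine, there are a lot of 0 ratings.

At the moment I am using the pearsonr function from the scipy package. I figured that in order to store all the distances, I have to compute the Pearson coefficient between all pairs of columns and store them in a symmetric movie-movie matrix. My code so far (as you can see, I am new to Python/coding):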

import pandas as pd
import numpy as np
from scipy.stats import pearsonr

data = pd.read_csv('data.csv')
data = data.pivot(index = 'UserID', columns = 'MovieID', values = "Rating")

similarity_data = pd.DataFrame(index=data.columns, columns=data.columns)

for i in range(0,len(data.columns)):
    for j in range(0,len(data.columns)):
        similarity_data.iloc[i,j] =  pearsonr(data.iloc[:,i],data.iloc[:,j])[0]

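Well, as you can imagine, this takes forever, and I am eager to find out how to do it more efficiently. My first idea was to exploit the fact that the matrix is symmetric, but I couldn't figure out how.

My idea was something like this: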

for i in range(0,len(data.columns)):
    for j in range(0,len(data.columns)):
        similarity_data.iloc[i,j] =  pearsonr(data.iloc[:,i],data.iloc[:,j+i])[0]
        similarity_data[j,i] = similarity_data.iloc[i,j]

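However, even if I got this to work, I'm afraid the real problem is the two for loops. I tried to use map or a lambda approach somehow, but couldn't get anywhere.

Any ideas on how to improve this (probably by a lot)?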

2 answers:

Answer 0: (score: 3)

You'll definitely want to use np.corrcoef, which is about 1000 times faster than a naive loop over scipy.stats.pearsonr. For example:

from scipy.stats import pearsonr
import numpy as np
import pandas as pd

# make some small data
df = pd.DataFrame(np.random.rand(100, 40))

C1 = np.array([[pearsonr(df[i], df[j])[0] for i in df] for j in df])
C2 = np.corrcoef(df.values.T)
np.allclose(C1, C2)
# True

Here are the timings:

%timeit np.array([[pearsonr(df[i], df[j])[0] for i in df] for j in df])
10 loops, best of 3: 154 ms per loop

%timeit np.corrcoef(df.values.T)
10000 loops, best of 3: 116 µs per loop

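However, your result will be a dense matrix with about 16 million entries, so it's not going to be a fast computation. You might consider whether you really need to store all of those values, or whether you could use an algorithm that (for example) only computes the correlations of the nearest neighbors.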
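
One way to cut down on what gets stored is to keep only the k most correlated movies for each movie. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name top_k_neighbors, the argument names ratings and k, and the random test data are all hypothetical. Note that it still computes the full correlation matrix once and only reduces what is kept afterwards.

```python
import numpy as np

def top_k_neighbors(ratings, k):
    """Return (indices, correlations) of the k most correlated columns per column.

    ratings: 2-D array with users as rows and items as columns (assumed layout).
    """
    corr = np.corrcoef(ratings, rowvar=False)   # items x items matrix
    np.fill_diagonal(corr, -np.inf)             # exclude each item's match with itself
    # argpartition places the k largest entries of each row in the last k slots (unordered)
    idx = np.argpartition(corr, -k, axis=1)[:, -k:]
    vals = np.take_along_axis(corr, idx, axis=1)
    # order those k entries by descending correlation
    order = np.argsort(-vals, axis=1)
    return (np.take_along_axis(idx, order, axis=1),
            np.take_along_axis(vals, order, axis=1))

# Hypothetical small data: 100 users, 40 items
rng = np.random.default_rng(0)
ratings = rng.random((100, 40))
neighbor_idx, neighbor_corr = top_k_neighbors(ratings, k=5)
```

With k = 5 this stores two 40 x 5 arrays instead of a 40 x 40 matrix; for 4000 movies the same idea would store 4000 x k values rather than 16 million.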

Answer 1: (score: 1)

Wouldn't np.corrcoef(data) give you the same correlation matrix?
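
One detail worth checking here is orientation: by default np.corrcoef treats each row as a variable, so with users as rows it would return a user-user matrix. The sketch below uses a small random frame as a hypothetical stand-in for the pivot table and shows that rowvar=False (or a transpose) gives the movie-movie matrix.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-in for the pivot table: 100 users (rows) x 40 movies (columns)
data = pd.DataFrame(rng.random((100, 40)))

user_corr = np.corrcoef(data.values)                 # rows as variables: 100 x 100
movie_corr = np.corrcoef(data.values, rowvar=False)  # columns as variables: 40 x 40

# Equivalent to transposing first, and to pandas' column-wise correlation
same_as_transpose = np.allclose(movie_corr, np.corrcoef(data.values.T))
same_as_pandas = np.allclose(movie_corr, data.corr().values)
```

If the pivot table holds NaN for missing ratings, np.corrcoef propagates them into the result, whereas DataFrame.corr computes each pair over the rows where both values are present.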

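If not, you should be able to roughly double the performance by computing only half of the symmetric result matrix, and by not calling pearsonr() at all when i equals j.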
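
The half-matrix idea can be sketched roughly as follows. This is an illustration on small random data (a hypothetical stand-in for the real pivot table), filling a plain NumPy array rather than a DataFrame cell by cell and wrapping it in a DataFrame at the end.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
# Hypothetical small data: 50 users (rows) x 8 movies (columns)
data = pd.DataFrame(rng.random((50, 8)))

n = data.shape[1]
values = data.to_numpy()
sim = np.eye(n)  # the diagonal is 1 by definition, so pearsonr is never called for i == j

for i in range(n):
    for j in range(i + 1, n):  # strictly upper triangle only
        r = pearsonr(values[:, i], values[:, j])[0]
        sim[i, j] = r
        sim[j, i] = r          # mirror into the lower triangle

similarity_data = pd.DataFrame(sim, index=data.columns, columns=data.columns)
```

This makes n(n-1)/2 calls instead of n², which is where the rough factor of two comes from; it remains a Python-level loop, so it is still far slower than the vectorized np.corrcoef.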