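For a recommender system project for a class, I'm currently trying to build and store an item-based similarity matrix for a dataset with about 7000 users (rows) and 4000 movies (columns). What I have is a pivot table with UserID as the index, MovieID as the columns, and the ratings as values. As you can imagine, there are a lot of 0 ratings.
Right now I'm using the pearsonr function from the scipy package. I figured that to store all the distances, I have to compute the Pearson coefficient between every pair of columns and store them in a symmetric movie-by-movie matrix. My code so far (as you can see, I'm new to Python/coding):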
import pandas as pd
import numpy as np
from scipy.stats import pearsonr

data = pd.read_csv('data.csv')
# Users become rows and movies become columns; unrated movies are NaN after the pivot, so treat them as 0
data = data.pivot(index='UserID', columns='MovieID', values='Rating').fillna(0)
similarity_data = pd.DataFrame(index=data.columns, columns=data.columns, dtype=float)
for i in range(len(data.columns)):
    for j in range(len(data.columns)):
        similarity_data.iloc[i, j] = pearsonr(data.iloc[:, i], data.iloc[:, j])[0]
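Well, as you can imagine, this takes forever, and I'm eager to find out how to do it more efficiently. My first idea was to take advantage of the fact that the matrix is symmetric, but I couldn't figure out how.
My idea was something like this: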
n = len(data.columns)
for i in range(n):
    # only visit the upper triangle (j >= i), then mirror each value into the lower triangle
    for j in range(i, n):
        similarity_data.iloc[i, j] = pearsonr(data.iloc[:, i], data.iloc[:, j])[0]
        similarity_data.iloc[j, i] = similarity_data.iloc[i, j]
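However, even if I got that to work, I'm afraid the real problem here is the two for loops. I tried to use map or a lambda somehow, but couldn't get anywhere.
Any ideas on how to improve this (probably a lot)?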
Answer 0 (score: 3)
You definitely want to use np.corrcoef, which will be about 1000 times faster than a naive loop over scipy.stats.pearsonr. For example:
from scipy.stats import pearsonr
import numpy as np
import pandas as pd

# make some small data
df = pd.DataFrame(np.random.rand(100, 40))

C1 = np.array([[pearsonr(df[i], df[j])[0] for i in df] for j in df])
C2 = np.corrcoef(df.values.T)
np.allclose(C1, C2)
# True
Here are the timings:
%timeit np.array([[pearsonr(df[i], df[j])[0] for i in df] for j in df])
10 loops, best of 3: 154 ms per loop
%timeit np.corrcoef(df.values.T)
10000 loops, best of 3: 116 µs per loop
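Part of why np.corrcoef is so much faster: the Pearson coefficient of two columns is their covariance divided by the product of their standard deviations, so once every column is centered and scaled to unit length, the whole correlation matrix falls out of a single matrix product, computed in compiled code instead of one Python-level pearsonr call per pair. The sketch below is an illustrative reimplementation of that idea, not NumPy's actual source; the helper name corr_matrix and the random toy data are assumptions. It also shows one way to put the MovieID labels back on the result.

```python
import numpy as np
import pandas as pd


def corr_matrix(X):
    """Pearson correlation between the columns of a 2-D array X (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    # Center each column, then scale it to unit Euclidean norm.
    Xc = X - X.mean(axis=0)
    Xn = Xc / np.linalg.norm(Xc, axis=0)
    # Entry (i, j) is the dot product of normalized columns i and j,
    # which equals pearsonr(X[:, i], X[:, j])[0].
    return Xn.T @ Xn


# Toy stand-in for the users x movies pivot table (random integers, not real ratings).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 6, size=(100, 40)),
                    columns=[f"movie_{k}" for k in range(40)])

R = corr_matrix(data.values)
print(np.allclose(R, np.corrcoef(data.values.T)))  # True

# Keep the movie labels on both axes of the similarity matrix.
similarity_data = pd.DataFrame(R, index=data.columns, columns=data.columns)
```

A movie whose column is constant has zero variance, so its row and column come out as nan here, just as they do with np.corrcoef, and NumPy emits a RuntimeWarning.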
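However, your result will be a dense matrix with about 16 million entries (4000 x 4000), so it won't be a fast computation. You might consider whether you really need to store all of these values, or whether you could use an algorithm that (for example) computes correlations only for nearest neighbors.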
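One way to follow the nearest-neighbor suggestion: the full 4000 x 4000 matrix is about 128 MB as float64, but a recommender often needs only each movie's k most similar movies. A minimal sketch, assuming "nearest neighbors" means the k most correlated other movies and that no column is constant: it computes correlations for one block of columns at a time and keeps only the top k per movie, so the full matrix is never held in memory. The function name top_k_neighbors and the defaults k=20 and block_size=500 are hypothetical choices, not values from the answer.

```python
import numpy as np


def top_k_neighbors(X, k=20, block_size=500):
    """Return (indices, correlations) of the k most correlated other columns
    for every column of X, computed block by block (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    # Center and unit-normalize columns; assumes no constant columns.
    Xc = X - X.mean(axis=0)
    Xn = Xc / np.linalg.norm(Xc, axis=0)
    idx = np.empty((n, k), dtype=np.int64)
    sims = np.empty((n, k))
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # Correlations of columns start..stop-1 with every column: shape (stop - start, n).
        block = Xn[:, start:stop].T @ Xn
        # Exclude each column's correlation with itself.
        block[np.arange(stop - start), np.arange(start, stop)] = -np.inf
        # argpartition moves the k largest values of each row to the end without a full sort.
        part = np.argpartition(block, -k, axis=1)[:, -k:]
        part_vals = np.take_along_axis(block, part, axis=1)
        # Sort only those k candidates, most correlated first.
        order = np.argsort(-part_vals, axis=1)
        idx[start:stop] = np.take_along_axis(part, order, axis=1)
        sims[start:stop] = np.take_along_axis(part_vals, order, axis=1)
    return idx, sims


# Small random example (not real ratings): 200 users, 50 movies.
rng = np.random.default_rng(1)
X = rng.integers(0, 6, size=(200, 50))
idx, sims = top_k_neighbors(X, k=5, block_size=16)
```

With 4000 movies and block_size=500, each intermediate block holds 500 x 4000 float64 values (16 MB), and the stored result is two 4000 x 20 arrays instead of a 4000 x 4000 matrix.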
Answer 1 (score: 1)
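Wouldn't np.corrcoef(data) give you the same correlation matrix? (Note that np.corrcoef treats each row as a variable, so to correlate the movie columns you need np.corrcoef(data.values.T), or equivalently np.corrcoef(data, rowvar=False).)
If not, you should be able to roughly double performance by computing only half of the symmetric result matrix, and by not calling pearsonr() at all when i equals j.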
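The half-matrix idea can be sketched as follows, assuming scipy and pandas are installed; the function name half_matrix_pearson and the random test data are hypothetical. The inner loop starts at j = i + 1, so pearsonr is called only n(n-1)/2 times (7,998,000 for 4000 movies instead of 16,000,000), and the diagonal is filled with 1.0, which is the correlation of any non-constant column with itself.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def half_matrix_pearson(df):
    """Symmetric Pearson matrix over the columns of df, computing only the
    upper triangle with pearsonr and skipping the diagonal (hypothetical helper)."""
    cols = df.columns
    n = len(cols)
    values = df.to_numpy(dtype=float)
    out = np.eye(n)  # diagonal is 1.0 without calling pearsonr
    for i in range(n):
        for j in range(i + 1, n):  # j > i: upper triangle only
            r = pearsonr(values[:, i], values[:, j])[0]
            out[i, j] = out[j, i] = r  # mirror into the lower triangle
    return pd.DataFrame(out, index=cols, columns=cols)


rng = np.random.default_rng(2)
df = pd.DataFrame(rng.random((100, 40)))
S = half_matrix_pearson(df)
# Same result as the vectorized call; rowvar=False treats columns as variables.
print(np.allclose(S.to_numpy(), np.corrcoef(df.to_numpy(), rowvar=False)))  # True
```

This still makes one Python-level call per pair, so for the full 4000-movie dataset the single vectorized np.corrcoef call is still expected to be much faster.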