Modifying a numpy array after converting it from a pandas DataFrame

Time: 2017-08-30 19:45:12

Tags: python matlab pandas numpy

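I have the following code, part of a simple movie recommender I am writing in Python so that I can reproduce the results I got in Professor Andrew Ng's Machine Learning course.

I want to modify the numpy.ndarray I get after calling as_matrix() on a pandas DataFrame and prepend a column vector to it, as I would in MATLAB: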

Y = [ratings Y]

Here is my Python code:

import numpy as np
import pandas as pd

dataFile='/filepath/'

userItemRatings = pd.read_csv(dataFile, sep="\t", names=['userId', 'movieId', 'rating','timestamp'])
movieInfoFile = '/filepath/'
movieInfo = pd.read_csv(movieInfoFile, sep="|", names=['movieId','Title','Release Date','Video Release Date','IMDb URL','Unknown','Action','Adventure','Animation','Childrens','Comedy','Crime','Documentary','Drama','Fantasy','Film-Noir','Horror','Musical','Mystery','Romance','Sci-Fi','Thriller','War','Western'], encoding = "ISO-8859-1")

userMovieMatrix=pd.merge(userItemRatings, movieInfo, left_on='movieId', right_on='movieId')
userMovieSubMatrix = userMovieMatrix[['userId', 'movieId', 'rating','timestamp','Title']]


Y = pd.pivot_table(userMovieSubMatrix, values='rating', index=['movieId'], columns=['userId'])
Y.fillna(0,inplace=True)
movies = Y.shape[0]  
users = Y.shape[1] +1 



ratings = np.zeros((1682, 1))

ratings[0] = 4  
ratings[6] = 3  
ratings[11] = 5  
ratings[53] = 4  
ratings[63] = 5  
ratings[65] = 3  
ratings[68] = 5  
ratings[97] = 2  
ratings[182] = 4  
ratings[225] = 5  
ratings[354] = 5

features = 10

theta = pd.DataFrame(np.random.rand(users,features))# users x features (944 * 10)
X = pd.DataFrame(np.random.rand(movies,features))# movies x features (1682 * 10)


X = X.as_matrix()
theta = theta.as_matrix()

Y = Y.as_matrix()


"""want to insert a column vector into this Y to get a new Y of dimension 
   1682*944, but only seeing 1682*943 after the following statement

"""
np.insert(Y, 0, ratings, axis=1)

R = Y.copy()
R[R!=0] = 1





Ymean = np.zeros((movies, 1))  
Ynorm = np.zeros((movies, users))



for i in range(movies):  
    idx = np.where(R[i,:] == 1)[0]
    Ymean[i] = Y[i,idx].mean()
    Ynorm[i,idx] = Y[i,idx] - Ymean[i]

print(type(Ymean), type(Ynorm), type(Y), Y.shape)
Ynorm[np.isnan(Ynorm)] = 0.
Ymean[np.isnan(Ymean)] = 0.

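I have added an inline comment above, but my problem is this: when I create a fresh numpy array and call insert on it, it works fine. However, it does not work on the numpy array I get by calling pivot_table() on a pandas DataFrame and then as_matrix(). Are there any alternatives?

1 Answer:

Answer 0: (score: 1)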

np.insert does not operate in place; you need to assign the output to a variable. Try:

Y = np.insert(Y, 0, ratings.ravel(), axis=1)

(ratings is flattened to 1-D here so that exactly one column is inserted.)
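
To see that np.insert returns a new array rather than changing its argument, here is a minimal sketch on a tiny array. The 3 x 2 matrix and 3 x 1 vector are hypothetical stand-ins for the 1682 x 943 Y and the 1682 x 1 ratings from the question, and the names Y_new and Y_stacked are only illustrative.

```python
import numpy as np

# Hypothetical small stand-ins for the question's Y (1682 x 943)
# and ratings (1682 x 1).
Y = np.arange(6, dtype=float).reshape(3, 2)
ratings = np.array([[4.0], [0.0], [5.0]])

# Calling np.insert without using its return value leaves Y untouched.
np.insert(Y, 0, ratings.ravel(), axis=1)
print(Y.shape)  # still the original shape

# Assigning the result gives the widened array.
# ratings is flattened to 1-D so that a single column is inserted.
Y_new = np.insert(Y, 0, ratings.ravel(), axis=1)
print(Y_new.shape)

# np.hstack is arguably the closer analogue of MATLAB's [ratings Y]
# and accepts the column vector as it is.
Y_stacked = np.hstack([ratings, Y])
print(np.array_equal(Y_new, Y_stacked))
```

In the question's code this means writing Y = np.insert(...) (or Y = np.hstack([ratings, Y])) before R, Ymean and Ynorm are computed, so that they all see the extra column. As a side note, as_matrix() was deprecated and later removed from pandas; DataFrame.to_numpy() is the current equivalent.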