ELSEVIER

ICCS 2016. The International Conference on Computational Science

Recommendation System Based on Complete Personalization

Kourosh Modarresi

Adobe Inc., San Jose, U.S.

kouroshm@alumni.stanford.edu

Abstract

Current recommender systems are inefficient. Many metrics are used to measure their effectiveness, most commonly the "conversion rate" and the "click-through rate". At present these rates are in the low single digits (less than 10%). In other words, more than 90% of the time, the model underlying the targeting system produces noise. The premise of this work is that the main cause of such unsatisfactory outcomes is a modeling problem, much of which is exemplified by treating users and items as members of clusters (segments). In this work, we consider full personalization of recommendation systems, personalizing users and contents simultaneously. Recommendations based on the baseline approach are inaccurate, and targeting based on similarity (collaborative filtering) suffers from several disadvantages, such as neglecting interactive correlation. Here, similarity-based targeting is combined with the baseline approach and latent factor models and is treated with adaptive regularization, allowing complete personalization with respect to both users and items.

Keywords: Similarity Transformation, Collaborative Filtering, Singular Value Decomposition, Localized Regularization, Latent Factor, Baseline Model

1 Introduction

There are two major approaches to providing desirable contents to users. The first tool used for offering these contents to maximize some metric (such as user conversion, user delight, ...) is search. In search, a user initiates the process and indicates their interests through a keyword search of words or phrases.

The main task in search is to provide the contents that best match the user's query. However, the presence of users in the digital space is not limited to search queries; it takes many more diverse forms in which users do not "explicitly" indicate their interests or their desired contents. Recommendation systems address these types of digital experience and aim to recommend contents that users may be interested in. Recommender systems find these contents by looking at preferences the users have displayed and expressed implicitly. The recommendation (offering) process can be initiated directly (mail, email, text messages, ...) or indirectly (ads, directing users to new sites, and so on).

During the last two decades, search has often been the major method of matching users with their desired items in the digital domain. In practice, users may search online for many items that are not correlated with one another, moving from one category of items to a very different one. This discontinuity in the search categories is often not accounted for in search algorithms. Similarly, while browsing, users may change the class of items (contents) they have been viewing online. The result of ignoring the "class/category jump" in a user's online trajectory (for both viewing and searching) is that users are targeted based on the category of their past trajectories and not their most recent ones. One consequence of this incorrect assumption is that users may be recommended items they are no longer looking for.

As another challenge, the outcome of a search query is often too general to match a user's demand accurately, and this has been a main source of the shortcomings of search engines. In the most general form, recommender systems are similar to search, with the distinction that there is no query: users do not indicate any explicit or direct request. Instead of searching for an item (content), as is the case in search, recommender systems infer the users' desired contents indirectly. This unprompted offering is based on users' past interactions with the contents and also on the features of users and contents.

Examples of applications of recommender systems, with some of the companies that use them, include [2] movie recommendation (Netflix and Amazon), related product recommendation (Adobe, Amazon), web page ranking (Google, Yahoo), social recommendation (Facebook, Google), news content recommendation (Yahoo, Google), priority inbox and spam filtering (Google, Yahoo), online dating (OK Cupid, match.com, Yahoo), computational advertising (Yahoo, Facebook), and online course offering (Coursera, Udacity). Recommendation systems represent considerable value for many businesses. For example [2], for Netflix about 80% of the contents that are used are recommended contents, for Amazon 30-40% of the items sold are recommended contents, and for Google recommendation generates 38% more click-through. There are many challenges in designing and implementing a good recommender system, among them proper metrics to measure effectiveness, privacy of users, and scaling. The major difficulty, however, is in the modeling approach, in the sense of the need for accurate, stable, and efficient models. User targeting based on recommender systems could be modified to take into account the continuity of the user's view trajectory.

High dimensionality is a major concern for recommender systems. It arises from the fact that there is a practically infinite number of contents (choices) and also potentially infinite dimensions for users (since even a single user has many representations, as a function of the place they may go, the time, the job they may have next, and so on). Thus the matching problem of recommending the best content to users is an NP-hard problem. Inevitably, the matching has to take place in a lower-dimensional space whose dimension is only a fraction of that of the original space.

2 The Hybrid Model for Recommender System

Data: The data is represented in matrix form as the matrix X. The rows of the matrix are users (i) and its columns are contents or items (j). Each matrix entry $x_{ij}$ is the rating of user i on item j; thus each entry is the relation between a specific content j and a specific user i. In general, the entries may be explicit, such as opinions, ratings, or total purchases, or implicit, such as the amount of time spent on a web site or the time spent searching for an item or movie (as opposed to stating "I like it" or rating the web site on a 1-5 scale). Explicit data is not always available, and often not enough of it can be found [16]. Historical (logged) data, live (streaming) data, and quite often a combination of both are used to design and test a recommendation model.
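The data layout described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy and using `np.nan` as a marker for missing entries; the storage choice and the toy values are assumptions made for the example, not part of the original work.

```python
import numpy as np

# Toy ratings matrix X: rows are users i, columns are items j.
# np.nan marks an unobserved entry (a storage choice made for this sketch).
X = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])

observed = ~np.isnan(X)               # boolean mask of known entries
known_pairs = np.argwhere(observed)   # the (i, j) pairs that carry a rating
density = observed.mean()             # fraction of entries that are known
```

The mask `observed` plays the role of the set of available entries over which all later objectives are summed.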

Similarity Based Models:

Given the massive number of possible contents (movies, articles, web sites, items, ...) that are available, there is a clear need to narrow down the possible choices according to the likelihood that a user needs or desires those contents. One approach to narrowing down the options is the application of similarity-based models.

The idea of similarity-based recommendation is to compute an unknown user rating or propensity (an unknown entry in the matrix X) using some of the known entries or ratings in the matrix. There are two types of similarity-based recommender systems: content-to-content and user-to-user similarity-based recommendation.

Explicit Content and User Based Similarity Approaches:

In the content-based similarity model, one can find the rating of user i on an unrated content j, $x_{ij}$, by looking at the similar items (similar to content j) that have been rated by the same user i. These similarities are computed using explicit features of the contents. The unknown rating $x_{ij}$ is then computed as a weighted average over all similar contents. In user-based (demographic) similarity recommendation, the model uses the available ratings of all other similar users (similar to user i) on the same content j. The final rating of user i on content j, $x_{ij}$, is computed as a weighted average of the ratings of all similar users on the same content j. The weights correspond to the degree of similarity, so higher similarity leads to higher weight.

In the explicit similarity-based recommendation models, the similarities are computed using explicit features of contents or users. For example, in the user-movie recommendation problem, the content (movie) features could include the director, length, actors and actresses, and studio. The explicit features of the user could include the user's age, income, address, and marital status.

Collecting explicit user and content features is nontrivial and expensive [2]. In addition, analysis based on explicit features is biased and of limited use, since the explicit features are noisy, sparse, and highly correlated. This leads us to believe that, instead of the available explicit features, we need to use other, new features that produce accurate results. These new features are not observed, either because they are not observable or because of the high cost of observing and measuring them. These implicit features can explain the available user experience data (such as movie ratings, purchasing patterns, and so on) more effectively than explicit features can, in part because they can also explain complicated and unknown characteristics of the data that are difficult to measure or observe explicitly.

Another disadvantage of the explicit similarity-based recommendation models is that they ignore the interaction among features and compute the similarity of two contents (users) by implicitly assuming that they are uncorrelated with the rest of the contents (users) [65].

Implicit Content and User Based Similarity Approaches:

In the implicit similarity approach for recommendations, explicit profiles (features) of users and contents are not required and are indeed computed implicitly (indirectly) to be used for the computation of similarities between users and between contents.

In these methods (sometimes called neighborhood-based methods [65]), there are once more two plausible paths to finding a rating for an item the user has not seen: a user-based one and a content-based one. In the content-based approach, we look at all other contents the same user has rated and then use a weighted aggregation (weighted average) of all these ratings to compute the rating for the unseen content. The weights are computed from the similarities between these contents and the content with the unknown rating, with higher weights used for contents with higher similarity. These similarities are computed using the content vectors; the entries of each content vector are all the ratings that different users have given to that content. Alternatively, in the user-based approach, we compute the similarities between the user and the other users who have rated the content that the user has not seen, and then use a weighted average of these other users' ratings to compute the unknown rating.
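The content-based path described above can be sketched as follows. This is a minimal illustration, assuming cosine similarity over co-rating users as the similarity measure and `np.nan` for missing ratings; the paper does not fix a particular measure, and the function names are hypothetical.

```python
import numpy as np

def item_similarity(X, j, k):
    """Cosine similarity of item columns j and k over users who rated both."""
    both = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
    if not both.any():
        return 0.0
    a, b = X[both, j], X[both, k]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def predict_item_based(X, i, j):
    """Weighted average of user i's known ratings, weighted by similarity to item j."""
    rated = [k for k in range(X.shape[1]) if k != j and not np.isnan(X[i, k])]
    weights = np.array([item_similarity(X, j, k) for k in rated])
    if weights.sum() <= 0:
        return float(np.nanmean(X[i]))   # fallback: the user's own mean rating
    return float(weights @ X[i, rated] / weights.sum())

X = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])
prediction = predict_item_based(X, i=1, j=1)   # user 1 has not rated item 1
```

The user-based path is symmetric: apply the same two functions to the transpose of X.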

In the latent factor model (section 2.3) both of these approaches are used simultaneously.

The implicit models compute and use the content and user features implicitly, so we do not need to collect explicit features. Past activities and interactions are used to deduce similarities among users (user-to-user) and among contents (item-to-item). Hence, the similarities are based on actions (rating, like, number of visits a user made to a site, ...) and not on explicit user features (age, income) or content features (a book's author, price, content, title, ...). In other words, to compute the similarity of two users, we can compare their actions (ratings, for example) on the same contents (items). If the ratings (on all contents) are similar, then the users are similar, and vice versa.

Singular value decomposition (SVD) is one of the major tools in the development of implicit similarity-based recommender systems. Instead of accessing and using the explicit features of contents and users, SVD uses the ratings of all users on the items they have rated to implicitly discover (compute) the item and user features. These discovered features, however, are not the original explicit features (age, income, address, ... for users and director, length, studio, ... for the contents in the above example). The implicit features discovered are, in general, a linear or non-linear combination of the original features and may not have any physical or explicit interpretation.

2.1 Singular Value Decomposition (SVD)

The singular value decomposition is defined for every m×n matrix X (taking m ≥ n) as

$$X = U D V^T$$

where U, the matrix of left singular vectors, is an m×n matrix with orthonormal columns, $U^T U = I$; V, the matrix of right singular vectors, is an n×n orthogonal matrix, $V V^T = V^T V = I$; and $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$ holds the singular values, ordered so that $d_1 \ge d_2 \ge \cdots \ge d_n \ge 0$.
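The properties of the decomposition can be checked numerically. The sketch below assumes NumPy and a random test matrix; `np.linalg.svd` with `full_matrices=False` returns the thin factorization in which U is m×n.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))          # m = 6, n = 4

# Thin SVD: U is m x n, d holds the n singular values, Vt is V transposed.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

reconstruction_ok = np.allclose(X, U @ np.diag(d) @ Vt)    # X = U D V^T
columns_orthonormal = np.allclose(U.T @ U, np.eye(4))      # U^T U = I
v_orthogonal = np.allclose(Vt @ Vt.T, np.eye(4))           # V^T V = I
sorted_nonneg = bool(np.all(d[:-1] >= d[1:]) and np.all(d >= 0))
```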

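The SVD can be computed by minimizing the reconstruction error. The best rank-q approximation solves

$$\min_{U_q, D_q, V_q} \left\| X - U_q D_q V_q^T \right\|_F$$

which is equivalent to

$$\underset{U, D, V}{\arg\min} \left\| X - U D V^T \right\|_F^2 = \underset{U, D, V}{\arg\min} \sum_{i,j} \left( x_{ij} - u_i D v_j^T \right)^2$$

where $u_i$ denotes row i of U and $v_j$ denotes row j of V.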
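As an illustration of the minimum reconstruction error, the sketch below truncates the SVD to q components and compares the Frobenius error with the discarded singular values (the Eckart-Young-Mirsky result). It assumes NumPy and a fully observed random matrix; `truncated_svd` is a hypothetical helper name.

```python
import numpy as np

def truncated_svd(X, q):
    """Rank-q reconstruction U_q D_q V_q^T from the leading q singular triplets."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :q] @ np.diag(d[:q]) @ Vt[:q, :]

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 5))
d = np.linalg.svd(X, compute_uv=False)       # singular values only

X2 = truncated_svd(X, 2)
error = np.linalg.norm(X - X2, "fro")        # reconstruction error
expected = np.sqrt(np.sum(d[2:] ** 2))       # norm of the discarded singular values
```

The two quantities `error` and `expected` agree up to floating-point rounding, which is what makes the truncated SVD the optimal rank-q approximation in this norm.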

However, the matrix X has many missing entries, which cannot be part of this computation. Using only the available entries of X,

$$\underset{U, D, V}{\arg\min} \sum_{(i,j) \in N(i,j)} \left( x_{ij} - u_i D v_j^T \right)^2$$

where $u_i$ and $v_j$ are rows of U and V, and N(i, j) is the set of all pairs (i, j) whose corresponding entries in X are not missing.

2.2 The Collaborative Filtering (CF)

Implicit similarity-based recommender systems are often called collaborative filtering (CF). CF methods need neither domain-specific knowledge nor features of contents or users. They use the interactions (feedback) of users with contents to compute the unknown interactions (ratings). For CF results to be accurate, a large amount of data is needed.

One simple way to compute the unknown entries of the matrix X is the baseline model. The basic idea is that an unknown rating of user i on content j, $x_{ij}$, can be computed from the average rating that user i gives to all contents the user has rated, the average rating that content j has received, and the average of all ratings (by all users on all contents). The baseline estimate for $x_{ij}$ is

$$b_{ij} = \mu + t_i + t_j$$

where

$\mu$ = mean rating of all users over all items

$t_i$ = rating bias of user i = mean of all ratings by user i - $\mu$

$t_j$ = rating bias of item j = mean of all ratings on item j - $\mu$
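The baseline estimate can be computed directly from the observed entries. The sketch below assumes NumPy, `np.nan` for missing ratings, and toy values; `baseline` is a hypothetical helper name.

```python
import numpy as np

def baseline(X):
    """Return the global mean, user biases, item biases, and the matrix of b_ij."""
    mu = np.nanmean(X)                       # mean of all observed ratings
    t_user = np.nanmean(X, axis=1) - mu      # bias of each user i
    t_item = np.nanmean(X, axis=0) - mu      # bias of each item j
    b = mu + t_user[:, None] + t_item[None, :]
    return mu, t_user, t_item, b

X = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 3.0]])
mu, t_user, t_item, b = baseline(X)
# b[0, 2] is the baseline estimate for the missing rating of user 0 on item 2.
```

For this toy matrix the global mean is 3, user 0 rates one point above it, and item 2 is rated one point below it, so the baseline estimate for the missing entry (0, 2) is 3.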

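This is the most basic model. To refine it, the unknown rating $\hat{x}_{ij}$ is computed by adding to the baseline a weighted sum of the deviations of similar items from their baseline scores,

$$\hat{x}_{ij} = b_{ij} + \sum_{j' \in S(i,j)} a_{ij'} \left( x_{ij'} - b_{ij'} \right)$$

where S(i, j) includes all items rated by user i that are similar to item j in the sense of some similarity measure, such as k-nearest neighbors (KNN) or a threshold on the correlation between those items and item j. The summation index is written j' to distinguish the neighbor items from the target item j, and $a_{ij'}$ is the weight given to neighbor j'.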
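The neighborhood-corrected estimate can be sketched as follows. The weights here are fixed at an arbitrary value of 0.5 purely for illustration (in the text they are learned), the neighbor set is passed in by hand, and the function names are hypothetical.

```python
import numpy as np

def baseline_matrix(X):
    """Matrix of baseline estimates b_ij = mu + t_i + t_j from observed entries."""
    mu = np.nanmean(X)
    t_user = np.nanmean(X, axis=1) - mu
    t_item = np.nanmean(X, axis=0) - mu
    return mu + t_user[:, None] + t_item[None, :]

def predict_with_neighbors(X, b, a, i, j, neighbors):
    """b[i, j] plus the weighted baseline deviations of user i on the neighbor items."""
    correction = 0.0
    for k in neighbors:                      # k plays the role of j' in S(i, j)
        if not np.isnan(X[i, k]):            # only rated neighbors contribute
            correction += a[i, k] * (X[i, k] - b[i, k])
    return float(b[i, j] + correction)

X = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 3.0]])
b = baseline_matrix(X)
a = np.full(X.shape, 0.5)                    # hypothetical, not learned, weights
x_hat = predict_with_neighbors(X, b, a, i=0, j=2, neighbors=[0, 1])
```

Here user 0 rated both neighbor items half a point below their baselines, so the estimate for item 2 drops from the baseline 3.0 to 2.5.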

The weights $a_{ij'}$ are estimated by minimizing the root mean square error (RMSE),

$$\mathrm{RMSE} = \sqrt{ \frac{1}{M} \sum_{(i,j)} \left( x_{ij} - \hat{x}_{ij} \right)^2 }$$

where M is the number of available (non-zero) entries in the matrix X. Minimization of the RMSE leads to

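$$\min_{a} \sqrt{ \frac{1}{M} \sum_{(i,j) \in N(i,j)} \Big[ x_{ij} - \Big( b_{ij} + \sum_{j' \in S(i,j)} a_{ij'} \left( x_{ij'} - b_{ij'} \right) \Big) \Big]^2 }$$

Since M is a constant, this is equivalent to

$$\min_{a} \sqrt{ \sum_{(i,j) \in N(i,j)} \Big[ x_{ij} - \Big( b_{ij} + \sum_{j' \in S(i,j)} a_{ij'} \left( x_{ij'} - b_{ij'} \right) \Big) \Big]^2 }$$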

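To simplify this further, we drop the square root and minimize the squared error (MSE instead of RMSE),

$$\min_{a} \sum_{(i,j) \in N(i,j)} \Big[ x_{ij} - \Big( b_{ij} + \sum_{j' \in S(i,j)} a_{ij'} \left( x_{ij'} - b_{ij'} \right) \Big) \Big]^2$$

Substituting for $b_{ij}$,

$$\min_{a} \sum_{(i,j) \in N(i,j)} \Big[ x_{ij} - \Big( \mu + t_i + t_j + \sum_{j' \in S(i,j)} a_{ij'} \left( x_{ij'} - b_{ij'} \right) \Big) \Big]^2$$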
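The step from RMSE to the plain sum of squared errors relies on the two measures ranking candidate predictions identically. The sketch below assumes NumPy and `np.nan` for missing ratings; both function names are hypothetical.

```python
import numpy as np

def sse(X, X_hat):
    """Sum of squared errors over the observed entries of X."""
    mask = ~np.isnan(X)
    return float(np.sum((X[mask] - X_hat[mask]) ** 2))

def rmse(X, X_hat):
    """Root mean square error over the M observed entries of X."""
    M = int(np.sum(~np.isnan(X)))
    return float(np.sqrt(sse(X, X_hat) / M))

X = np.array([[1.0, np.nan],
              [3.0, 4.0]])
candidate_a = np.array([[2.0, 0.0], [3.0, 2.0]])   # errors 1, 0, 2 on observed entries
candidate_b = np.array([[1.0, 0.0], [3.0, 3.0]])   # errors 0, 0, 1 on observed entries
```

Because M is fixed and the square root is monotone, whichever candidate has the smaller `sse` also has the smaller `rmse`.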

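However, we need to apply regularization [102, 103, 105] to prevent overfitting. Thus,

$$\min_{a} \Big\{ \sum_{(i,j) \in N(i,j)} \Big[ x_{ij} - \Big( \mu + t_i + t_j + \sum_{j' \in S(i,j)} a_{ij'} \left( x_{ij'} - b_{ij'} \right) \Big) \Big]^2 + \gamma \Big( \sum_{i \in N(i,j)} t_i^2 + \sum_{j \in N(i,j)} t_j^2 \Big) \Big\}$$

where $\gamma$ is the regularization parameter.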
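To make the role of the penalty concrete, the sketch below evaluates a simplified version of the regularized objective. It is an assumption-laden illustration: the neighborhood term is omitted for brevity, the penalty sums run over all users and all items, and `regularized_loss` is a hypothetical helper name.

```python
import numpy as np

def regularized_loss(X, mu, t_user, t_item, gamma):
    """Squared baseline error on observed entries plus gamma times the squared biases."""
    mask = ~np.isnan(X)
    b = mu + t_user[:, None] + t_item[None, :]
    fit = np.sum((X[mask] - b[mask]) ** 2)
    penalty = gamma * (np.sum(t_user ** 2) + np.sum(t_item ** 2))
    return float(fit + penalty)

X = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 3.0]])
mu = 3.0
no_bias = regularized_loss(X, mu, np.zeros(3), np.zeros(3), gamma=0.1)
with_bias = regularized_loss(X, mu,
                             np.array([1.0, -0.5, -0.5]),
                             np.array([1.5, -0.5, -1.0]),
                             gamma=0.1)
```

With zero biases the loss is the pure fit error of the global mean; with the mean-based biases the fit error falls while the penalty term charges for the size of the biases.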

But this is a global regularization that ignores the specific characteristics of users and items and treats both as if they had the same features. To regularize items and users separately,

$$\min_{a} \Big\{ \sum_{(i,j) \in N(i,j)} \Big[ x_{ij} - \Big( \mu + t_i + t_j + \sum_{j' \in S(i,j)} a_{ij'} \left( x_{ij'} - b_{ij'} \right) \Big) \Big]^2 + \Big( \gamma_1 \sum_{i \in N(i,j)} t_i^2 + \gamma_2 \sum_{j \in N(i,j)} t_j^2 \Big) \Big\}$$

This way, we recognize that items and users are two different variables that need to be penalized (regularized) separately, according to their own features.

Still, this treats all users alike by penalizing them all equally. The same is true for the items, which are all treated as if they exhibited the same features. To achieve full personalization, considering each user as a unique individual and each item as a unique item, we use adaptive personalization [72, 76, 77]:

$$\min_{a} \Big\{ \sum_{(i,j) \in N(i,j)} \Big[ x_{ij} - \Big( \mu + t_i + t_j + \sum_{j' \in S(i,j)} a_{ij'} \left( x_{ij'} - b_{ij'} \right) \Big) \Big]^2 + \Big( \sum_{i \in N(i,j)} \gamma_i t_i^2 + \sum_{j \in N(i,j)} \gamma_j t_j^2 \Big) \Big\}$$

2.3 The Model for the Latent Factor

The latent factor model [70, 78] is a generalization and extension of CF. The goal is to find the underlying (latent) factors that can explain all the interactions (rating, like/not like, purchasing, ...). Since these underlying factors often have not been observed or are not observable (not measurable), they are called latent factors. Singular value decomposition is the major technique used in the latent factor model.

SVD matrix factorization models rely on the correlations among users (user habits such as rating, ranking, liking, purchasing, ...) and also on the correlations among contents (similar patterns such as movie genre, movie length, director, ...). However, unlike the explicit similarity-based models, the latent factor model does not need any explicit user or content features and discovers the underpinning variables (latent factors) that explain those similarities.

The latent factor model is ill-posed (ill-conditioned): there is not enough information to solve it. From the point of view of linear systems of equations, we have in effect an underdetermined system, where the number of constraints (equations, or rows in the matrix representation of the data) is less than the number of degrees of freedom of the system (variables, or columns), and therefore there are infinitely many solutions. To make the problem solvable, we have to add more constraints in the form of regularization, which penalizes overfitting.

To compute the unknown entries of the data matrix X, we use an inverse version of SVD factorization [78]. Using the known (non-missing) entries, we can find the right-hand side of the SVD decomposition, i.e., the singular vectors and singular values. Then, using the right-hand side, we can compute the missing entries by reconstructing the original matrix. By renaming the right-hand side as

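$$R = U, \qquad Q^T = D V^T$$

and using the components of these matrices,

$$x_{ij} = r_i q_j^T = \sum_{p} r_{ip} \, q_{jp}$$

where $r_i$ is row i of R and $q_j$ is row j of Q, we compute the best reconstruction of the matrix X from its available entries,

$$\min_{R, Q} \sum_{(i,j) \in N(i,j)} \left( x_{ij} - r_i q_j^T \right)^2$$

But we have missing entries, and thus, to prevent overfitting, we have to add regularization:

$$\min_{R, Q} \sum_{(i,j) \in N(i,j)} \left( x_{ij} - r_i q_j^T \right)^2 + \lambda \Big( \sum_{i=1}^{m} \|r_i\|^2 + \sum_{j=1}^{n} \|q_j\|^2 \Big)$$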
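One common way to minimize this regularized objective is stochastic gradient descent over the observed entries. The sketch below is an illustration under stated assumptions, not the author's implementation: the number of factors `k`, the learning rate `lr`, the penalty `lam`, the epoch count, and the initialization scale are all hypothetical choices.

```python
import numpy as np

def factorize(X, k=2, lam=0.05, lr=0.02, epochs=500, seed=0):
    """Fit X ~ R Q^T on observed entries with an L2 penalty on rows of R and Q."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    R = 0.1 * rng.standard_normal((m, k))    # user factors r_i
    Q = 0.1 * rng.standard_normal((n, k))    # item factors q_j
    pairs = np.argwhere(~np.isnan(X))        # observed (i, j) pairs
    for _ in range(epochs):
        rng.shuffle(pairs)                   # visit observations in random order
        for i, j in pairs:
            err = X[i, j] - R[i] @ Q[j]
            r_old = R[i].copy()
            R[i] += lr * (err * Q[j] - lam * R[i])
            Q[j] += lr * (err * r_old - lam * Q[j])
    return R, Q

def observed_rmse(X, R, Q):
    """RMSE of the reconstruction R Q^T on the observed entries of X."""
    mask = ~np.isnan(X)
    return float(np.sqrt(np.mean((X - R @ Q.T)[mask] ** 2)))

# Rank-one toy data with two entries hidden.
X = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
X[0, 2] = np.nan
X[3, 0] = np.nan

R0, Q0 = factorize(X, epochs=0)              # untrained starting point
R, Q = factorize(X)
X_filled = R @ Q.T                           # includes estimates for the hidden entries
```

The reconstructed matrix `X_filled` supplies values for the missing entries, which is the "inverse" use of the factorization described above.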

This form of regularization suggests that users and contents have the same characteristics and thus should be penalized equally. To treat users and contents distinctly,

$$\min_{R, Q} \sum_{(i,j) \in N(i,j)} \left( x_{ij} - r_i q_j^T \right)^2 + \Big( \lambda_1 \sum_{i=1}^{m} \|r_i\|^2 + \lambda_2 \sum_{j=1}^{n} \|q_j\|^2 \Big)$$

However, this assumes that all users have the same features and thus that the same regularization should be applied to all of them. To personalize the recommender system, we use a localized regularization [78],

$$\min_{R, Q} \sum_{(i,j) \in N(i,j)} \left( x_{ij} - r_i q_j^T \right)^2 + \Big( \sum_{i=1}^{m} \lambda_i \|r_i\|^2 + \lambda_2 \sum_{j=1}^{n} \|q_j\|^2 \Big)$$

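Extending the localized regularization to the contents as well, we get [70, 71, 72]:

$$\min_{R, Q} \sum_{(i,j) \in N(i,j)} \left( x_{ij} - r_i q_j^T \right)^2 + \Big( \sum_{i=1}^{m} \lambda_i \|r_i\|^2 + \sum_{j=1}^{n} \lambda_j \|q_j\|^2 \Big)$$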
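The difference between global and localized regularization is only in how the penalty is weighted. The sketch below evaluates the localized objective with one penalty per user and one per item; the values of the penalties are arbitrary, since the text does not specify how they are chosen, and `localized_loss` is a hypothetical helper name.

```python
import numpy as np

def localized_loss(X, R, Q, lam_user, lam_item):
    """Squared error on observed entries plus per-user and per-item L2 penalties."""
    mask = ~np.isnan(X)
    fit = np.sum((X - R @ Q.T)[mask] ** 2)
    penalty = (np.sum(lam_user * np.sum(R ** 2, axis=1))
               + np.sum(lam_item * np.sum(Q ** 2, axis=1)))
    return float(fit + penalty)

X = np.array([[2.0, np.nan],
              [1.0, 3.0]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Q = np.array([[1.0, 1.0],
              [2.0, 0.0]])

personalized = localized_loss(X, R, Q,
                              lam_user=np.array([0.1, 0.2]),
                              lam_item=np.array([0.5, 0.25]))
uniform = localized_loss(X, R, Q,
                         lam_user=np.full(2, 0.1),
                         lam_item=np.full(2, 0.1))
```

Passing constant arrays, as in `uniform`, recovers the single-parameter regularization as a special case.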

2.4 The Final Hybrid Recommender System Model

The assumption in this work is that the missing data is missing completely at random (MCAR) [100]. This means that the probability that a data point is missing does not depend on its value. We also use the concentration of measure assumption [16], which means that the information in the data is concentrated in (lies in) a lower-dimensional space, or equivalently that the rank k of the data matrix is very small compared to min(m, n). Under this assumption, the truncated SVD is the solution of the low-rank approximation problem, a result referred to as the matrix approximation lemma or the Eckart-Young-Mirsky theorem [36].

Also, with respect to the sparseness of the data matrix, we assume that

$$a > C \, n^{1.2} \, r \log n$$

where r = rank(X) and a is the number of available entries in X, for some positive numerical constant C. Under these circumstances, we can accurately recover the missing entries in the data matrix [105].
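To get a feel for the scale of this condition, the sketch below evaluates the right-hand side for one set of values. Everything numeric here is an assumption: the constant C is not given in the text and is set to 1 purely for illustration, the logarithm is taken as natural, and n and r are arbitrary example values.

```python
import math

def required_entries(n, r, C=1.0):
    """Right-hand side C * n^1.2 * r * log(n) of the sampling condition."""
    return C * n ** 1.2 * r * math.log(n)

n, r = 10_000, 5                   # hypothetical dimension and rank
needed = required_entries(n, r)    # entries needed under the assumed C = 1
fraction = needed / n ** 2         # share of an n x n matrix that must be observed
```

Under these assumed values, only a few percent of the entries of an n×n matrix would need to be observed, which is why recovery from sparse data is plausible when the rank is small.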

In this work, we may know neither the specific (explicit) features of the contents nor those of the users.

The model is domain-neutral: the ratings could be for movies, books, websites, or any other content. In computing or recovering the unknown entries of the matrix, overfitting may occur because of the lack of sufficient information, and thus some penalization of the objective function in the form of regularization becomes necessary. This model is based on a different view of regularization, namely a localized regularization technique, which leads to improved estimation of the missing values.

Often, latent factor models are better at generalizing the complete structure of the data, while similarity-based methods do a better job when the data is dominated by a small group of highly correlated data points [1-3]. The mixed model in this work is based on the understanding that no single model works equally well on all data and applications. Thus, we combine the baseline and similarity-based models of section 2.2 with the latent factor model of section 2.3 and then apply localized regularization to achieve fully personalized recommendations.

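Thus, using the similarity-based CF of section 2.2 and the latent factor approach of section 2.3,

$$\hat{x}_{ij} = b_{ij} + r_i q_j^T$$

Combining with the baseline model in 2.2,

$$\hat{x}_{ij} = \mu + t_i + t_j + r_i q_j^T$$

We get the final hybrid model,

$$\min_{R, Q, a} \Big\{ \sum_{(i,j) \in N(i,j)} \Big[ x_{ij} - \Big( b_{ij} + r_i q_j^T + \sum_{j' \in S(i,j)} a_{ij'} \left( x_{ij'} - b_{ij'} \right) \Big) \Big]^2 + \Big( \sum_{i \in N(i,j)} \gamma_i t_i^2 + \sum_{j \in N(i,j)} \gamma_j t_j^2 \Big) + \Big( \sum_{i=1}^{m} \lambda_i \|r_i\|^2 + \sum_{j=1}^{n} \lambda_j \|q_j\|^2 \Big) \Big\}$$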

The implementation is based on stochastic gradient descent (SGD), in which we fix all of the variables except the one being optimized. This process is repeated iteratively for each variable until all variables converge.
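A reduced version of the hybrid model can be trained with plain SGD as sketched below. This is an illustration under stated assumptions rather than the implementation used in this work: it fits only the baseline-plus-latent-factor part of the prediction (global mean, user bias, item bias, and factor product), omits the neighborhood term for brevity, updates all parameters touched by a rating in one step, and uses hypothetical default penalties, learning rate, and epoch count.

```python
import numpy as np

def fit_hybrid(X, k=2, gamma_user=None, gamma_item=None,
               lam_user=None, lam_item=None, lr=0.02, epochs=300, seed=0):
    """SGD for x_ij ~ mu + t_i + t_j + r_i q_j^T with per-user and per-item penalties."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Hypothetical defaults: the same small penalty for every user and item.
    gamma_user = np.full(m, 0.05) if gamma_user is None else np.asarray(gamma_user, float)
    gamma_item = np.full(n, 0.05) if gamma_item is None else np.asarray(gamma_item, float)
    lam_user = np.full(m, 0.05) if lam_user is None else np.asarray(lam_user, float)
    lam_item = np.full(n, 0.05) if lam_item is None else np.asarray(lam_item, float)

    mu = np.nanmean(X)
    t_user, t_item = np.zeros(m), np.zeros(n)
    R = 0.1 * rng.standard_normal((m, k))
    Q = 0.1 * rng.standard_normal((n, k))
    pairs = np.argwhere(~np.isnan(X))
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j in pairs:
            err = X[i, j] - (mu + t_user[i] + t_item[j] + R[i] @ Q[j])
            t_user[i] += lr * (err - gamma_user[i] * t_user[i])
            t_item[j] += lr * (err - gamma_item[j] * t_item[j])
            r_old = R[i].copy()
            R[i] += lr * (err * Q[j] - lam_user[i] * R[i])
            Q[j] += lr * (err * r_old - lam_item[j] * Q[j])
    return mu, t_user, t_item, R, Q

def predict(mu, t_user, t_item, R, Q):
    """Full matrix of predictions, including the originally missing entries."""
    return mu + t_user[:, None] + t_item[None, :] + R @ Q.T

X = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])
mask = ~np.isnan(X)
X_hat = predict(*fit_hybrid(X))
rmse_fit = float(np.sqrt(np.mean((X - X_hat)[mask] ** 2)))
rmse_mean = float(np.sqrt(np.mean((X[mask] - np.nanmean(X)) ** 2)))
```

Per-user and per-item penalties can be supplied as arrays through `gamma_user`, `gamma_item`, `lam_user`, and `lam_item`, which is where the personalization of the regularization enters.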

The model was applied to two data sets, each of dimension 10000×20. The first data matrix contained the conversions of different ad campaigns, and the second contained movie ratings (on a 1-5 scale). We compared the results of our model with those of the similarity-based model (section 2.2) and the latent factor model (section 2.3). The average improvement in the accuracy (RMSE) of the recommendations was 16.4% and 12.8%, respectively.

References

[1] G. Adomavicius and A. Tuzhilin, "Towards the next generation of recommender systems: a survey of the state-of-the-art and possible extensions," IEEE Trans. on Knowledge and Data Engineering 17:6, pp. 734-749, 2005.

[2] Xavier Amatriain, "Recommender Systems", MLS 2014.

[3] C. Anderson, "The Long Tail: Why the Future of Business is Selling Less of More," Hyperion Books, New York, 2006.

[4] L. Backstrom, J. Leskovec, "Supervised Random Walks: Predicting and Recommending Links in Social Networks," ACM International Conference on Web Search and Data Mining (WSDM), 2011.

[5] J. Baumeister, "Stable Solution of Inverse Problems", Vieweg, Braunschweig, Germany, 1987.

[6] S. Becker, J. Bobin, and E. J. Candes, "NESTA: a fast and accurate first-order method for sparse recovery," SIAM J. on Imaging Sciences 4(1), 1-39, 2009.

[7] A. Bjorck, "Numerical Methods for Least Squares Problems", SIAM, Philadelphia, 1996.

[8] S. Boyd and L. Vandenberghe, "Convex Optimization", Cambridge University Press, 2004.

[9] J.S. Breese, D. Heckerman, and C. Kadie, "Empirical analysis of predictive algorithms for collaborative filtering," Proceedings of Fourteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, 1998.

[10] P. A. Businger, G. H. Golub, "Singular value decomposition of a complex Matrix", Algorithm 358, Comm. Acm, No. 12, pp. 564-565, 1969.

[11] J. Cadima and I. T. Jolliffe, " Loadings and correlations in the interpretation of principal components", Journal of Applied Statistics, 22:203-214, 1995.

[12] J-F Cai, E. J. Candes and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM J. on Optimization 20(4), 1956-1982, 2008.

[13] E. J. Candès and Y. Plan, "Matrix completion with noise," Proceedings of the IEEE 98(6), 925-936, 2009.

[14] E. J. Candès and Y. Plan, "Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements," IEEE Transactions on Information Theory 57(4), 2342-2359, 2009.

[15] E. J. Candes and T. Tao, "Decoding by linear programming", IEEE Transactions on Information Theory, 51(12):4203-4215, 2005.

[16] E. J. Candès and B. Recht, "Exact matrix completion via convex optimization," Found. of Comput. Math., 9 717-772, 2008.

[17] E. J. Candès, "Compressive sampling," Proceedings of the International Congress of Mathematicians, Madrid, Spain, 2006.

[18] E. J. Candès and T. Tao, "Near-optimal signal recovery from random projections: universal encoding strategies," IEEE Trans. Inform. Theory, 52 5406-5425, 2004.

[19] P.-Y. Chen, S.-Y. Wu, J. Yoon, "The Impact of Online Recommendations and Consumer Feedback on Sales, " Proceedings of the 25th International Conference on Information Systems, 711-724, 2004.

[20] Y H Cho, Jae Kyeong Kim and Soung Hie Kim, "A personalized recommender system based on web usage mining and decision tree Induction," Expert System Applications, Vol.23, pp.329-342, 2002.

[21] Claypool, M., Gokhale, A., Miranda, T., Murnikov, P., Netes, D., and Sartin M., "Combining content-based and collaborative filters in an online newspaper," Proceedings of the ACM SIGIR'99 Workshop on Recommender Systems, 1999.

[22] R. Courant and D. Hilbert, "Methods of Mathematical Physics", Vol. II, Interscience, New York, 1953.

[23] A. d'Aspremont, L. El Ghaoui, M.I. Jordan, and G. R. G. Lanckriet, "A direct formulation for sparse PCA using semidefinite programming", SIAM Review, 49(3):434-448, 2007.

[24] A. R. Davies and M. F. Hassan, "Optimality in the regularization of ill-posed inverse problems", in P. C. Sabatier (Ed.), Inverse Problems: An interdisciplinary study, Academic Press, London, UK, 1987.

[25] B. DeMoor, G. H. Golub, "The restricted singular value decomposition: properties and applications", SIAM J. Matrix Anal. Appl., 12, No. 3, pp. 401-425, 1991.

[26] D. L. Donoho and J. Tanner, "Sparse nonnegative solutions of underdetermined linear equations by linear programming", Proc. of the National Academy of Sciences, 102(27):9446-9451, 2005.

[27] Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R., "Least Angle Regression," The Annals of Statistics, 32, 407-499, 2004.

[28] Lars Elden, "Algorithms for the Regularization of Ill-Conditioned Least Squares Problems", BIT 17, pp. 134-145, 1977.

[29] Lars Elden, "A Note on the Computation of the Generalized Cross-Validation Function for Ill-Conditioned Least Squares Problems", BIT 24, pp. 467-472, 1984.

[30] Heinz. W. Engl, M. Hanke, and A. Neubauer, "Regularization methods for the stable solution of inverse problems", Surv. Math. Ind., No. 3, pp. 71-143, 1993.

[31] H. W. Engl, M. Hanke, and A. Neubauer, "Regularization of Inverse Problems", Kluwer, Dordrecht, 1996.

[32] H. W. Engl, K. Kunisch, and A. Neubauer, "Convergence rates for Tikhonov regularisation of non-linear ill-posed problems", Inverse Problems, (5), pp. 523-540, 1998.

[33] H. W. Engl, C. W. Groetsch (Eds), "Inverse and Ill-Posed Problems", Academic Press, London, 1987.

[34] M. Fazel, H. Hindi, and S. Boyd, "A rank minimization heuristic with application to minimum order system approximation", Proceedings American Control Conference, 6:4734-4739, 2001.

[35] W. Gander, "On the linear least squares problem with a quadratic Constraint", Technical report STAN-CS-78-697, Stanford University, 1978.

[36] G. H. Golub, C. F. Van Loan, "Matrix Computations", 4th Ed., Computer Assisted Mechanics and Engineering Sciences, Johns Hopkins University Press, US, 2013.

[37] Gene H. Golub, Charles F. Van Loan, "An Analysis of the Total Least Squares Problem", Siam J. Numer. Anal., No. 17, pp. 883-893, 1980.

[38] Gene H. Golub, W. Kahan, "Calculating the Singular Values and Pseudo-Inverse of a Matrix", SIAM J. Numer. Anal. Ser. B 2, pp. 205-224, 1965.

[39] Gene H. Golub, Michael Heath, Grace Wahba, "Generalized Cross-Validation as a Method for Choosing a Good Ridge Parameter", Technometrics 21, pp. 215-223, 1979.

[40] S. Guo, M. Wang, J. Leskovec, "The Role of Social Networks in Online Shopping: Information Passing, Price of Trust, and Consumer Choice," ACM Conference on Electronic Commerce (EC), 2011.

[41] Häubl, G., Trifts, V., "Consumer decision making in online shopping environments: The effects of interactive decision aids," Marketing Science, 4-21, 2000.

[42] Hastie, T., Tibshirani, R., and Friedman, J., "The Elements of Statistical Learning; Data mining, Inference and Prediction", New York: Springer Verlag, 2001.

[43] Hastie, T.J and Tibshirani, R. "Handwritten Digit Recognition via Deformable Prototypes", AT&T Bell Laboratories Technical Report, 1994.

[44] Hastie, T., Tibshirani, R., Eisen, M., Brown, P., Ross, D., Scherf, U., Weinstein, J., Alizadeh, A., Staudt, L., and Botstein, D., " 'Gene Shaving' as a Method for Identifying Distinct Sets of Genes With Similar Expression Patterns," Genome Biology, 1, 1-21, 2000.

[45] David Heckerman, David Maxwell Chickering, Christopher Meek, Robert Rounthwaite, and Carl Kadie, "Dependency networks for inference, collaborative filtering, and data visualization," Journal of Machine Learning Research, 1:49-75, 2000.

[46] T. Hein, "Some analysis of Tikhonov regularization for the inverse problem of option pricing in the price-dependent case," SIAM Review, (21) No. 1, pp. 100-111, 1979.

[47] T. Hein and B. Hofmann, "On the nature of ill-posedness of an inverse problem in option pricing," Inverse Problems, (19), pp. 1319-1338, 2003.

[48] Herlocker, J.L., and Konstan, J.A., "Content-Independent Task-Focused Recommendation," IEEE Internet Computing, Vol. 5, pp. 40-47, 2001.

[49] Herlocker, J., Konstan, J., Riedl, J., "Explaining collaborative filtering recommendations," Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, pp. 241-250. ACM, 2000.

[50] B. Hofmann, "Regularization for Applied Inverse and Ill-Posed problems ," Teubner, Stuttgart, Germany, 1986.

[51] B. Hofmann, "Regularization of nonlinear problems and the degree of illposedness," in G. Anger, R. Goreno, H. Jochmann, H. Moritz, and W. Webers (Eds.), inverse Problems: principles and Applications in Geophysics,Technology, and Medicine, Akademic Verlag, Berlin, 1993.

[52] Roger A. Horn; Charles R. Johnson, "Matrix Analysis", Second Edition, Cambridge university Press, 2012

[53] T. A. Hua and R. F. Gunst, "Generalized ridge regression: A note on negative ridge

parameters," Comm. Statist. Theory Methods, 12, pp. 37-45, 1983.

[54] V.S. Iyengar and T. Zhang, "Empirical study of recommender systems using linear classifiers," Proceedings of the Fifth Pacific-Asia Conference on Knowledge Discovery and

Data Mining, pages 1б-27, 2001.

[55] V. K. Ivankov, "On linear problems which are not well-posed ," Dokl. Akad. Nauk SSSR, 145, pp. 270-272, 19б2.

[56] Jeffers, J., "Two Case Studies in the Application of Principal Component," Applied Statistics, 1б, 225-23б, 19б7.

[57] Jolliffe, I. , Principal Component Analysis, New York: Springer Verlag, 19S6.

[58] I. T. Jolliffe, "Rotation of principal components: choice of normalization constraints," Journal of Applied Statistics, 22:29-35, 1995.

[59] I. T. Jolliffe, N.T. Trendafilov, and M. Uddin, "A modified principal component technique based on the LASSO," Journal of Computational and Graphical Statistics, 12:531-547, 2003.

[60] M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre, "Generalized power method for sparse principal component analysis," arXiv:0811.4724, 2008.

[61] Kim, D., Ferrin, D., Rao, H., "A trust-based consumer decision-making model in electronic commerce: The role of trust, perceived risk, and their antecedents," Decision Support Systems 44(2), 544-564, 2008.

[62] Jae Kyeong Kim, Yoon Ho Cho, Woo Ju Kim, Je Ran Kim and Ji Hae Suh, "A personalized recommendation procedure for Internet shopping," Electronic Commerce Research and Applications, Vol. 1, pp. 301-313, 2002.

[63] Jae Kyeong Kim, Hyea Kyeong Kim, Hee Young Oh and Young U. Ryu, "A group recommendation system for online communities," International Journal of Information Management, Vol. 30, pp. 212-219, 2010.

[64] Y. Koren, "The BellKor Solution to the Netflix Grand Prize, Report from the Netflix Prize Winners," 2009.

[65] Yehuda Koren, "Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model".

[66] Misha E. Kilmer and Dianne P. O'Leary, "Choosing regularization parameters in iterative methods for ill-posed problems," SIAM J. Matrix Anal. Appl., Vol. 22, No. 4, pp. 1204-1221, 2001.

[67] Andreas Kirsch, "An Introduction to the Mathematical Theory of Inverse Problems," Springer Verlag, New York, 1996.

[68] Mardia, K., Kent, J., and Bibby, J., "Multivariate Analysis," New York: Academic Press, 1979.

[70] J. Leskovec, A. Rajaraman and J. Ullman, Mining Massive Datasets, Palo Alto, CA March, 2014.

[71] G. Linden, B. Smith, and J. York, "Amazon.com recommendations: item-to-item collaborative filtering," Internet Computing 7:1, pp. 76-80, 2003.

[72] Linyuan Lu, Matus Medo, Chi Ho Yeung, Yi-Cheng Zhang, Zi-Ke Zhang, Tao Zhou, "Recommender Systems", Feb. 7, 2012.

[73] Maida, M., Maier, K., Obwegeser, N., Stix, V., "Explaining MCDM acceptance: a conceptual model of influencing factors," Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 297-303. IEEE, 2011.

[74] Rahul Mazumder, Trevor Hastie and Rob Tibshirani, "Spectral Regularization Algorithms for Learning Large Incomplete Matrices," Journal of Machine Learning Research, 11, pp. 2287-2322, 2010.

[75] McCabe, G., "Principal Variables," Technometrics, 26, 137-144, 1984.

[76] Kourosh Modarresi and Gene H Golub, "An Adaptive Solution of Linear Inverse Problems", Proceedings of Inverse Problems Design and Optimization Symposium (IPDO2007), April 16-18, Miami Beach, Florida, pp. 333-340, 2007.

[77] Kourosh Modarresi, "A Local Regularization Method Using Multiple Regularization Levels", Stanford, CA, April 2007.

[78] Kourosh Modarresi, "Computation of Recommender System Using Localized Regularization", Procedia Computer Science, Volume 51, 2015, Pages 2407-2416.

[79] Kourosh Modarresi and Gene H Golub, "An Efficient Algorithm for the Determination of Multiple Regularization Parameters," Proceedings of Inverse Problems Design and Optimization Symposium (IPDO), April 16-18, 2007, Miami Beach, Florida, pp. 395-402, 2007.

[80] D. W. Marquardt, "Generalized inverses, ridge regression, biased linear estimation, and nonlinear estimation," Technometrics, 12, pp. 591-612, 1970.

[81] K. Miller, "Least Squares Methods for Ill-Posed Problems with a Prescribed Bound," SIAM J. Math. Anal., No. 1, pp. 52-74, 1970.

[82] B. Moghaddam, Y. Weiss, and S. Avidan, "Spectral bounds for sparse PCA: exact and greedy algorithms," Advances in Neural Information Processing Systems, 18, 2006.

[83] V. A. Morozov, "On the solution of functional equations by the method of regularization," Sov. Math. Dokl., 7, pp. 414-417, 1966.

[84] V. A. Morozov, "Methods for Solving Incorrectly Posed Problems," Springer-Verlag, New York, 1984.

[85] A. Narayanan, V. Shmatikov, "Robust de-anonymization of large sparse datasets," IEEE Symposium on Security and Privacy, 2008, 111-125.

[86] B. K. Natarajan, "Sparse approximate solutions to linear systems," SIAM J. Comput., 24(2):227-234, 1995.

[87] R. Otazo, E. J. Candès and D. Sodickson, "Low-rank and sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components," To appear in Magnetic Resonance in Medicine, 2013.

[88] R. L. Parker, "Understanding inverse theory," Ann. Rev. Earth Planet. Sci., No. 5, pp. 35-64, 1977.

[89] T. Raus, "The principle of the residual in the solution of ill-posed problems with nonselfadjoint operator," Urchin. Zap. Tartu Gos. Univ., 75, pp. 12-20, 1985.

[90] T. Reginska, "A Regularization Parameter in Discrete Ill-Posed Problems," SIAM J. Sci. Comput., No. 17, pp. 740-749, 1996.

[91] F. Ricci, L. Rokach, B. Shapira, P. B. Kantor (Eds.), "Recommender Systems Handbook," Springer, New York, NY, USA, 2011.

[92] E. Sadikov, M. Medina, J. Leskovec, H. Garcia-Molina, "Correcting for Missing Data in Information Cascades," ACM International Conference on Web Search and Data Mining (WSDM), 2011.

[93] Sarwar, B., "Sparsity, scalability, and distribution in recommender systems," PhD thesis, University of Minnesota, 2001.

[94] Sarwar, B., Karypis, G., Konstan, J. A., and Riedl, J. "Application of dimensionality reduction in recommender system—A case study", Proceedings of the ACM WebKDD-2000 Workshop, 2000.

[95] Sarwar, B., Karypis, G., Konstan, J. A., and Riedl, J., "Analysis of recommendation algorithms for e-commerce", Proceedings of the ACM E-Commerce, pp. 158-167, 2000.

[96] Shardanand, U., Maes, P., "Social Information Filtering: Algorithms for Automating 'Word of Mouth'," Conf. Human Factors in Computing Systems, 1995.

[97] Sinha, R., Swearingen, K., "The role of transparency in recommender systems," CHI 2002 Extended Abstracts on Human Factors in Computing Systems, pp. 830-831, ACM, 2002.

[98] A. Tarantola and B. Valette, "Generalized nonlinear inverse problems solved using the least squares criterion," Reviews of Geophysics and Space Physics, No. 20, pp. 219-232, 1993.

[99] A. Tarantola, "Inverse Problem Theory," Elsevier, Amsterdam, 1987.

[100] Tibshirani, R., "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B, 58, 267-288, 1996.

[101] R. Tibshirani, "Regression shrinkage and selection via the LASSO," Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.

[102] A. N. Tikhonov, "Solution of Incorrectly Formulated Problems and the Regularization Method," Soviet Math. Dokl., 4(1963), pp. 1035-1038; English translation of Dokl. Akad. Nauk. SSSR, 151(1963), pp. 501-504, 1963.

[103] A. N. Tikhonov, "Regularization of incorrectly posed problems," Dokl. Akad. Nauk. SSSR, 153 (1963), pp. 49-52; Soviet Math. Dokl., 4, 1963.

[104] A. N. Tikhonov, V. Y. Arsenin, "Solutions of Ill-Posed Problems," Winston, Washington, D.C., 1977.

[105] A. N. Tikhonov, A. V. Goncharsky (Eds.), "Ill-Posed Problems in the Natural Sciences," MIR, Moscow, 1987.

[106] A. N. Tikhonov, A. V. Goncharsky, V. V. Stepanov, A. G. Yagola, "Numerical Methods for the Solution of Ill-Posed Problems," Kluwer, Dordrecht, the Netherlands, 1995.

[107] L. Ungar and D. Foster, "A formal statistical approach to collaborative filtering," CONALD'98, 1998.

[108] L. Ungar and D. Foster, "Clustering methods for collaborative filtering," in Workshop on Recommendation Systems, Fifteenth National Conference on AI, 1998.

[109] Wang, W., Benbasat, I., "Recommendation agents for electronic commerce: Effects of explanation facilities on trusting beliefs," Journal of Management Information Systems 23(4), 217-246, 2007.

[110] Wang, W., Benbasat, I., "Attributions of trust in decision support technologies: A study of recommendation agents for e-commerce," Journal of Management Information Systems 24(4), 249-273, 2008.

[111] Wei, K., Huang, J., Fu, S., "A survey of e-commerce recommender systems," In: International Conference on Service Systems and Service Management, pp. 1-5. IEEE, 2007.

[112] R. Witten and E. J. Candès, "Randomized algorithms for low-rank matrix factorizations: sharp performance bounds," To appear in Algorithmica, 2013.

[113] Z. Zhang, H. Zha, and H. Simon, "Low rank approximations with sparse factors I: basic algorithms and error analysis," SIAM Journal on Matrix Analysis and Applications, 23(3):706-727, 2002.

[114] Z. Zhou, J. Wright, X. Li, E. J. Candès and Y. Ma, "Stable Principal Component Pursuit," Proceedings of International Symposium on Information Theory, June 2010.

[115] H. Zou, T. Hastie, and R. Tibshirani, "Sparse Principal Component Analysis," Journal of Computational & Graphical Statistics, 15(2):265-286, 2006.