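Over the past few days I have been reading Machine Learning with Spark by Nick Pentreath (the Chinese edition, 《Spark机器学习》, published by Posts & Telecom Press), and it has been a frustrating read. The book keeps switching between Python and Scala. Since I much prefer Python, I reimplemented the book's Scala code in Python. It was well worth the effort: I understood far more than I would have by typing in the Scala from the book, and I now have a deeper grasp of the whole recommendation process. Of course, compared with a recommender system in real-world use, this is only a toy.
Below is my Python code.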
u'196\t242\t3\t881250949'
[u'196', u'242', u'3']
Rating(user=196, product=242, rating=3.0)
3.00704909501
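The first three lines above look like the stages of parsing one record of the MovieLens `u.data` file: the raw tab-separated line, its first three fields, and a `Rating`. A minimal plain-Python sketch of that step follows. It is not the code from the original post: the `Rating` namedtuple is a stand-in for `pyspark.mllib.recommendation.Rating`, and `parse_rating` is a hypothetical helper name.

```python
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating (assumption: same field names).
Rating = namedtuple("Rating", ["user", "product", "rating"])

def parse_rating(line):
    """Split a tab-separated record and keep user id, item id and rating."""
    user, product, rating = line.split("\t")[:3]  # the trailing timestamp is dropped
    return Rating(int(user), int(product), float(rating))

first = parse_rating("196\t242\t3\t881250949")
print(first)  # Rating(user=196, product=242, rating=3.0)
```

In PySpark a function like this would typically be mapped over the RDD returned by `sc.textFile(...)` and the resulting ratings passed to `ALS.train`. The last value above (3.007...) is presumably a single predicted rating from the trained model.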
[Rating(user=789, product=708, rating=5.645492560638761),
Rating(user=789, product=482, rating=5.632324542725622),
Rating(user=789, product=502, rating=5.621572563993868),
Rating(user=789, product=603, rating=5.5589276689720055),
Rating(user=789, product=23, rating=5.552174994200555),
Rating(user=789, product=182, rating=5.429093418196553),
Rating(user=789, product=484, rating=5.424060744309293),
Rating(user=789, product=479, rating=5.416303074919905),
Rating(user=789, product=1020, rating=5.350687998038177),
Rating(user=789, product=494, rating=5.341673625528914)]
u'1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0'
u'Frighteners, The (1996)'
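The record above is pipe-separated, with the movie id in the first field and the title in the second. Assuming the goal is a lookup table from id to title, a small sketch could look like this (`parse_titles` is a hypothetical name, and the single sample line stands in for the full file):

```python
def parse_titles(lines):
    """Build a dict mapping movie id -> title from pipe-separated records."""
    titles = {}
    for line in lines:
        fields = line.split("|")
        titles[int(fields[0])] = fields[1]
    return titles

# One sample record, copied from the output shown above.
sample = [
    "1|Toy Story (1995)|01-Jan-1995||"
    "http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)"
    "|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0"
]
titles = parse_titles(sample)
print(titles[1])  # Toy Story (1995)
```

With such a table, the numeric product ids in the recommendation lists can be replaced by readable titles.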
[Rating(user=789, product=1012, rating=4.0),
Rating(user=789, product=127, rating=5.0),
Rating(user=789, product=475, rating=5.0),
Rating(user=789, product=93, rating=4.0),
Rating(user=789, product=1161, rating=3.0),
Rating(user=789, product=286, rating=1.0),
Rating(user=789, product=293, rating=4.0),
Rating(user=789, product=9, rating=5.0),
Rating(user=789, product=50, rating=5.0),
Rating(user=789, product=294, rating=3.0)]
[(u'Godfather, The (1972)', 5.0),
(u'Trainspotting (1996)', 5.0),
(u'Dead Man Walking (1995)', 5.0),
(u'Star Wars (1977)', 5.0),
(u'Swingers (1996)', 5.0),
(u'Leaving Las Vegas (1995)', 5.0),
(u'Bound (1996)', 5.0),
(u'Fargo (1996)', 5.0),
(u'Last Supper, The (1995)', 5.0),
(u'Private Parts (1997)', 4.0)]
[(u'Sex, Lies, and Videotape (1989)', 5.645492560638761),
(u'Some Like It Hot (1959)', 5.632324542725622),
(u'Bananas (1971)', 5.621572563993868),
(u'Rear Window (1954)', 5.5589276689720055),
(u'Taxi Driver (1976)', 5.552174994200555),
(u'GoodFellas (1990)', 5.429093418196553),
(u'Maltese Falcon, The (1941)', 5.424060744309293),
(u'Vertigo (1958)', 5.416303074919905),
(u'Gaslight (1944)', 5.350687998038177),
(u'His Girl Friday (1940)', 5.341673625528914)]
(16, 0.48041293748773517)
[(567, 0.0),
(413, 0.27774090532944073),
(24, 0.28293701045346031),
(184, 0.29302471333565439),
(352, 0.29389332298954041),
(1376, 0.30311195834153437),
(201, 0.30636099292020724),
(741, 0.31300444242427816),
(685, 0.31398875037205187),
(686, 0.31444575073270353)]
Wes Craven's New Nightmare (1994)
[(u"Wes Craven's New Nightmare (1994)", 0.0),
(u'Tales from the Crypt Presents: Bordello of Blood (1996)',
0.27774090532944073),
(u'Rumble in the Bronx (1995)', 0.28293701045346031),
(u'Army of Darkness (1993)', 0.29302471333565439),
(u'Spice World (1997)', 0.29389332298954041),
(u'Meet Wally Sparks (1997)', 0.30311195834153437),
(u'Evil Dead II (1987)', 0.30636099292020724),
(u'Last Supper, The (1995)', 0.31300444242427816),
(u'Executive Decision (1996)', 0.31398875037205187),
(u'Perfect World, A (1993)', 0.31444575073270353)]
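In the list above the query movie scores 0.0 against itself and the other values grow from there, which suggests the numbers are cosine distances (1 minus cosine similarity) between item factor vectors, sorted ascending. That reading is an inference from the output, not something the post states. Under that assumption, a plain-Python sketch with made-up two-dimensional factors:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """1 - cosine similarity, so 0.0 means 'same direction'."""
    return 1.0 - cosine_similarity(a, b)

def most_similar(item_id, factors, n=10):
    """factors: dict item id -> vector. Returns n (id, distance) pairs, closest first."""
    target = factors[item_id]
    dists = [(other, cosine_distance(target, vec)) for other, vec in factors.items()]
    return sorted(dists, key=lambda pair: pair[1])[:n]

# Toy factors (hypothetical values, not taken from the trained model).
toy = {1: [1.0, 0.0], 2: [2.0, 0.1], 3: [0.0, 1.0]}
result = most_similar(1, toy, n=3)
print(result)
```

The query item always comes first with distance 0.0, which matches the shape of the output above; dropping the first element gives the actual neighbours.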
Actual rating: 5.000000, predicted rating: 5.042951, squared error: 0.001845.
((368, 320), 4.883538186965982)
((506, 568), (5.0, 4.512168968112584))
Mean Squared Error = 0.0839340560353
Root Mean Squared Error = 0.28971374844
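The single-rating figure above is just (5.0 - 5.042951)^2 ≈ 0.001845, and the two summary numbers extend that to every (actual, predicted) pair: the mean of the squared errors, and its square root. A minimal sketch of the calculation, using toy pairs rather than the real predictions (`mean_squared_error` is a hypothetical helper name):

```python
import math

def mean_squared_error(pairs):
    """pairs: iterable of (actual, predicted) ratings."""
    pairs = list(pairs)
    return sum((actual - predicted) ** 2 for actual, predicted in pairs) / len(pairs)

# Toy data for illustration only.
pairs = [(5.0, 4.5), (3.0, 3.5), (4.0, 4.0)]
mse = mean_squared_error(pairs)
rmse = math.sqrt(mse)
print("Mean Squared Error = %s" % mse)
print("Root Mean Squared Error = %s" % rmse)
```

Note that errors this low on the MovieLens data are most likely measured on the same ratings the model was trained on, so they say little about how well it generalizes.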
[127, 475, 9, 50, 150, 276, 129, 100, 741, 1012]
[708, 482, 502, 603, 23, 182, 484, 479, 1020, 494]
0.0
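The 0.0 above is what average precision at K gives when none of the predicted items appear among the user's actual items, which is the case for the two lists shown. A Python sketch of the metric, modelled on the Scala `avgPrecisionK` function in the book; returning 1.0 for a user with no actual items is a convention carried over from there, and the function name is my own:

```python
def avg_precision_k(actual, predicted, k):
    """Average precision of the top-k predicted items against the actual items."""
    score = 0.0
    num_hits = 0.0
    for i, p in enumerate(predicted[:k]):
        if p in actual:
            num_hits += 1.0
            score += num_hits / (i + 1.0)  # precision at this rank
    if not actual:
        return 1.0  # convention: nothing to find counts as a perfect score
    return score / min(len(actual), k)

# The two lists printed above.
actual = [127, 475, 9, 50, 150, 276, 129, 100, 741, 1012]
predicted = [708, 482, 502, 603, 23, 182, 484, 479, 1020, 494]
apk10 = avg_precision_k(actual, predicted, 10)
print(apk10)  # 0.0, because the lists share no item
```

Mean average precision at K is then the mean of this value over all users.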
(1682, 50)
16
[1274 453 135 …, 631 412 96]
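The shape `(1682, 50)` is consistent with an item-factor matrix of 1682 movies by 50 latent factors, and the pair that follows looks like a user id with all item ids ranked for that user. Assuming that is what is going on, the ranking amounts to taking the dot product of the user's factor vector with every item vector and sorting by score. A plain-Python sketch with toy factors (`rank_items` is a hypothetical name; row `i` is assumed to belong to item id `i + 1`):

```python
def rank_items(user_vector, item_factors):
    """Return item ids ordered by predicted score, highest first.

    item_factors is a list of vectors; row i is assumed to be item id i + 1.
    """
    scores = [sum(u * v for u, v in zip(user_vector, row)) for row in item_factors]
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return [i + 1 for i in order]

# Toy 3 x 2 item-factor matrix and one user vector (made-up values).
item_factors = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
ranked = rank_items([0.2, 0.9], item_factors)
print(ranked)  # [2, 3, 1]
```

With NumPy the same scores come from a single matrix-vector product followed by `argsort`, which would explain the array-style printout above.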
2
[237, 300, 100, 127, 285, 289, 304, 272, 278, 288, 286, 275, 302, 296, 292, 251, 50, 314, 297, 290, 312, 281, 13, 280, 303, 308, 307, 257, 316, 315, 301, 313, 279, 299, 298, 19, 277, 282, 111, 258, 295, 242, 283, 276, 1, 305, 14, 287, 291, 293, 294, 310, 309, 306, 25, 273, 10, 311, 269, 255, 284, 274]
Mean Average Precision at K = 0.024641131815
Mean Squared Error = 0.0839340560353
Root Mean Squared Error = 0.28971374844
Mean Average Precision = 0.0668399759999
Mean Average Precision at 2000 = 0.0668399759999