Notes on "Python for Data Analysis": Chapter 2, the MovieLens 1M Dataset

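A few words up front:

All the data used in the examples comes from GitHub; downloading the repository as a single archive is enough.
The address is: [http://github.com/pydata/pydata-book](http://github.com/pydata/pydata-book)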

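One more thing worth stating:

I am using Python 2.7. Some of the code in the book contains errors, so what follows is the version I got working on my own 2.7 setup.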

    # coding: utf-8
    import pandas as pd

    # Raw strings keep the backslashes in the Windows paths from being read as escapes.
    # engine='python' is needed because '::' is a multi-character separator.
    unames = ['user_id','gender','age','occupation','zip']
    users = pd.read_table(r'D:\Source Code\pydata-book-master\ch02\movielens\users.dat', sep='::', header=None, names=unames, engine='python')
    rnames = ['user_id','movie_id','rating','timestamp']
    ratings = pd.read_table(r'D:\Source Code\pydata-book-master\ch02\movielens\ratings.dat', sep='::', header=None, names=rnames, engine='python')
    mnames = ['movie_id','title','genres']
    movies = pd.read_table(r'D:\Source Code\pydata-book-master\ch02\movielens\movies.dat', sep='::', header=None, names=mnames, engine='python')
    
    users[:5]
    ratings[:5]
    movies[:5]
    
    ratings
    
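    # Merge the three tables; pandas infers the join keys from the shared column names.
    data = pd.merge(pd.merge(ratings, users), movies)
    data.iloc[0]
    # Mean rating per title, split by gender.
    mean_rating = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
    mean_rating[:5]
    # Number of ratings each title received.
    ratings_by_title = data.groupby('title').size()
    ratings_by_title[:10]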
    
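    # Keep only titles with at least 250 ratings.
    active_titles = ratings_by_title.index[ratings_by_title >= 250]
    active_titles
    
    # .loc stands in for the .ix indexer, which current pandas no longer provides.
    mean_rating = mean_rating.loc[active_titles]
    mean_rating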
    
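    # Titles rated highest by female viewers (sort_values stands in for sort_index(by=...)).
    top_female_rating = mean_rating.sort_values(by='F', ascending=False)
    top_female_rating[:10]
    
    # Rating gap between genders; ascending order puts the titles women preferred first.
    mean_rating['diff'] = mean_rating['M'] - mean_rating['F']
    sorted_by_diff = mean_rating.sort_values(by='diff')
    sorted_by_diff[:15]
    
    # Reversed order: the titles men preferred.
    sorted_by_diff[::-1][:15]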
    
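    # Standard deviation of ratings per title, restricted to the active titles;
    # a high value marks the titles viewers disagreed on most.
    ratings_std_by_title = data.groupby('title')['rating'].std()
    ratings_std_by_title = ratings_std_by_title.loc[active_titles]
    ratings_std_by_title.sort_values(ascending=False)[:10]
    ratings_std_by_title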
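
For anyone who wants to try the steps above without downloading the data files first, here is a minimal sketch of the same pipeline on a tiny in-memory sample. The sample rows, the `read_dat` helper, and the threshold of 3 (in place of 250) are my own illustrative assumptions, not taken from the book, and the sketch assumes Python 3 with a reasonably recent pandas.

```python
import io
import pandas as pd

# Hypothetical miniature data in the same '::'-separated layout as the MovieLens files.
USERS = "1::F::1::10::48067\n2::M::56::16::70072\n3::M::25::15::55117\n"
MOVIES = (
    "10::Toy Story (1995)::Animation|Children's|Comedy\n"
    "20::Heat (1995)::Action|Crime|Thriller\n"
)
RATINGS = (
    "1::10::5::978300760\n"
    "2::10::3::978300761\n"
    "3::10::4::978300762\n"
    "1::20::2::978300763\n"
    "2::20::5::978300764\n"
)

def read_dat(text, names):
    # Hypothetical helper: a multi-character separator needs the Python parsing engine.
    return pd.read_csv(io.StringIO(text), sep="::", header=None,
                       names=names, engine="python")

users = read_dat(USERS, ['user_id', 'gender', 'age', 'occupation', 'zip'])
movies = read_dat(MOVIES, ['movie_id', 'title', 'genres'])
ratings = read_dat(RATINGS, ['user_id', 'movie_id', 'rating', 'timestamp'])

# Join on the shared column names: user_id first, then movie_id.
data = pd.merge(pd.merge(ratings, users), movies)

# Mean rating per title and gender.
mean_rating = data.pivot_table(values='rating', index='title',
                               columns='gender', aggfunc='mean')

# Keep titles with at least 3 ratings (the post uses 250 on the full dataset).
ratings_by_title = data.groupby('title').size()
active_titles = ratings_by_title.index[ratings_by_title >= 3]
mean_rating = mean_rating.loc[active_titles].copy()

# Gender gap and per-title rating spread for the remaining titles.
mean_rating['diff'] = mean_rating['M'] - mean_rating['F']
rating_std = data.groupby('title')['rating'].std().loc[active_titles]

print(mean_rating)
print(rating_std)
```

Only one title survives the filter in this sample, which makes it easy to check the mean, the difference, and the standard deviation by hand.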