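How do I create a dictionary whose values expand according to how many duplicates are found in the main dataset?
The dataset I am working with can be found here: https://www.kaggle.com/lava18/google-play-store-apps This dataset has two columns that classify the type of each app (column 1 and column 9, counting the first column as 0). Maybe the picture below helps:
The data in column 1 is less granular than the data in column 9, so the dictionary keys would come from column 1 and the values from column 9. I already have a function that shows the percentage of each category in a given column, such as column 1 or column 9.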
def freq_table(dataset, index_category):
    table = {}
    total = 0
    for row in dataset:
        total += 1
        category = row[index_category]
        if category in table:
            table[category] += 1
        else:
            table[category] = 1

    table_percentages = {}
    cat_num = 0
    for key in table:
        cat_num += 1
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
    print(f'Total Number of Categories: {cat_num}')
    return table_percentages
# Convert the dictionary to a list and print it in descending order
def display_table(dataset, index_category):
    table = freq_table(dataset, index_category)
    table_display = []
    for key in table:
        # Order is (percentage, category) because sorted() compares the first
        # element of each tuple first, and we want to sort by percentage.
        # A tuple is used since the values will not change and it packs them together.
        key_val_as_tuple = (table[key], key)
        # Collect everything in one list (tuples have no append method)
        table_display.append(key_val_as_tuple)

    # reverse=True gives descending order by percentage
    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        # The tuples are (percentage, category); print category first to be more user friendly
        print(entry[1], ':', entry[0], '%')
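The result I am after would look something like this, for Family (column 1) with its Genres (column 9): "Casual;Brain Games" at 35%, "Education;Creativity" at 20%, "Education;Education" at 45%.
Let me know if you need any further information, and thank you very much for your help.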
Solution
For the given file, it is better to use pandas. pandas reads the file into a tabular structure and offers many grouping options:
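import pandas as pd

x = pd.read_csv("googleplaystore.csv")
# Count the apps in each (Category, Genres) pair
x = x.groupby(["Category","Genres"],as_index=False)[['App']].count().rename(columns={'App' : 'cnt'})
# Total number of apps in each Category
x_tot = x.groupby(["Category"],as_index=False)[['cnt']].sum().rename(columns={'cnt' : 'tot'})
# Attach the category total to every (Category, Genres) row
x = x.merge(x_tot,on = ['Category'])
# Share of each genre within its category, as a fraction (multiply by 100 for a percentage)
x['pct'] = x['cnt']/x['tot']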
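If a plain nested dictionary is what you need, as the question describes, the same grouping can also be done without pandas. The following is only a minimal sketch in the style of freq_table from the question: the function name nested_freq_table and the two-column sample rows are made up for illustration and are not part of the dataset or of the answer above.

```python
from collections import Counter, defaultdict


def nested_freq_table(dataset, key_index, value_index):
    """Return {key: {value: percentage}} for two columns of a list of rows.

    Hypothetical helper: each percentage is relative to the number of rows
    that share the same key, not to the whole dataset.
    """
    # counts[key][value] = number of rows with that (key, value) pair
    counts = defaultdict(Counter)
    for row in dataset:
        counts[row[key_index]][row[value_index]] += 1

    percentages = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        percentages[key] = {
            value: count / total * 100 for value, count in counter.items()
        }
    return percentages


# Made-up rows holding only a category and a genre, for illustration
sample = [
    ["FAMILY", "Education;Education"],
    ["FAMILY", "Education;Education"],
    ["FAMILY", "Casual;Brain Games"],
    ["FAMILY", "Education;Creativity"],
    ["GAME", "Action"],
]

result = nested_freq_table(sample, 0, 1)
for genre, pct in result["FAMILY"].items():
    print(genre, ':', pct, '%')
```

With the real file, and assuming dataset is a list of rows without the header row (as freq_table expects), the call would be nested_freq_table(dataset, 1, 9). The inner dictionary for each category grows by one entry for every distinct genre found under it.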