How do I count how often items occur together in Excel data?

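I have a huge CSV spreadsheet with thousands of entries. I want to build a table that holds, for each pair of elements, the number of times the two appear together divided by the number of times the first element appears.

[Image: CSV data1]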

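Here, Bitcoin appears 8 times and API appears 2 times. API always appears together with Bitcoin, so the value for API with Bitcoin is 1 (2/2), while the value for Bitcoin with API is 1/4 (2/8). I would like the final result to look like this:

[Image: desired output table]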
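
Read as a formula, the example above is a conditional frequency. Writing $N(A)$ for the number of times item $A$ appears and $N(A, B)$ for the number of times $A$ and $B$ appear together (both symbols, and $v$, are notation introduced here for illustration and are not part of the original question), the arithmetic works out as:

```latex
\[
  v(A \to B) = \frac{N(A, B)}{N(A)}, \qquad
  v(\text{API} \to \text{Bitcoin}) = \frac{2}{2} = 1, \qquad
  v(\text{Bitcoin} \to \text{API}) = \frac{2}{8} = \frac{1}{4}.
\]
```

The value is not symmetric: the numerator is shared, but each direction divides by a different item's count.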

How can I do this with Python or any other tool?

Here is a sample of the file:

[Link: sample of the file]

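Solution

I think this does the job. I typed your spreadsheet into a CSV by hand (it would have been nicer to be able to cut and paste), and the results look reasonable.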

import csv
import itertools

import numpy as np

# Map each item to the list of items it was paired with (one entry per pairing).
words = {}
with open('input.csv', newline='') as fin:
    for parts in csv.reader(fin):
        for a, b in itertools.combinations(parts, 2):
            words.setdefault(a, []).append(b)
            words.setdefault(b, []).append(a)

print(words)
keys = list(words)
index = {k: i for i, k in enumerate(keys)}  # item -> row/column position
size = len(keys)
track = np.zeros((size, size))

for i, k in enumerate(keys):
    # Diagonal: total number of pairings recorded for k
    # (not the number of rows that contain k).
    track[i, i] = len(words[k])
    for j in words[k]:
        # Each pairing is visited once from each of its two items,
        # so an off-diagonal cell is incremented twice per pairing.
        track[i, index[j]] += 1
        track[index[j], i] += 1

print(keys)

# Divide each row by its diagonal entry.
for row in range(size):
    track[row, :] /= track[row, row]

# Write the matrix to a CSV file, labelled with the item names.
with open('corresp.csv', 'w', newline='') as fout:
    writer = csv.writer(fout)
    writer.writerow([' '] + keys)
    for row, key in enumerate(keys):
        writer.writerow([key] + [f"{value}" for value in track[row]])

Here are the first few lines of the result:

,API,Backend Development,Bitcoin,Docker,Article Rewriting,Article writing,Blockchain,Content Writing,Ghostwriting,Android,Ethereum,PHP,React.js,C Programming,C++ Programming,ASIC,Digital ASIC Coding,Embedded Software,Article Writing,Blog,Copy Typing,Affiliate Marketing,Brand Marketing,Bulk Marketing,Sales,BlockChain,Business Strategy,Non-fungible Tokens,Technical Writing,.NET,Arduino,Software Architecture,Bluetooth Low Energy (BLE),C# Programming,Ada programming,Programming,Haskell,Rust,Algorithm,Java,Mathematics,Machine Learning (ML),Matlab and Mathematica,Data Entry,HTML,Circuit Designs,Embedded Systems,Electronics,Microcontroller,Python
API,1.0,0.14285714285714285,0.5714285714285714,0.0,0.2857142857142857,0.0
Backend Development,0.6666666666666666,0.0
Bitcoin,0.21052631578947367,0.05263157894736842,0.2631578947368421,0.10526315789473684,0.15789473684210525,0.0
Docker,0.0
,
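
The ratio described in the question can also be computed directly from per-row counts. The following is a minimal sketch, not the answer author's method. It assumes that an item counts once per row, that blank cells are ignored, and that "appearing together" means appearing in the same row. The function name `cooccurrence_ratios` and the sample data are hypothetical; the sample is shaped only to match the numbers quoted in the question (Bitcoin 8 times, API 2 times).

```python
import csv
import io
from collections import Counter
from itertools import permutations


def cooccurrence_ratios(rows):
    """Return {(a, b): rows containing both a and b / rows containing a}."""
    item_rows = Counter()  # number of rows in which each item appears
    pair_rows = Counter()  # number of rows in which each ordered pair appears
    for row in rows:
        # Assumption: ignore blank cells and count an item once per row.
        items = {cell.strip() for cell in row if cell.strip()}
        item_rows.update(items)
        pair_rows.update(permutations(items, 2))
    return {(a, b): n / item_rows[a] for (a, b), n in pair_rows.items()}


# Hypothetical sample: Bitcoin on 8 rows, API on 2 of those rows.
sample = "\n".join(
    ["API,Bitcoin"] * 2 + ["Bitcoin,Blockchain"] * 3 + ["Bitcoin"] * 3
)
ratios = cooccurrence_ratios(csv.reader(io.StringIO(sample)))
print(ratios[("API", "Bitcoin")])  # 1.0
print(ratios[("Bitcoin", "API")])  # 0.25
```

To run it on a real file, pass `csv.reader(open('input.csv', newline=''))` instead of the in-memory sample. Pairs that never share a row are simply absent from the returned dictionary, which corresponds to a value of 0.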

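I looked at this by creating a pivot table in Excel for every combination of columns: AB, AC, AD, BC, BD, CD. I put the unique entries from the first column (for example A) in the rows and the unique entries from the second (for example B) in the columns, then put column A in the values area. That gives me every match and the count of each match.

It is a clumsy approach, but looking at the Python-based approach already submitted, I noticed that my answer is basically no more and no less clumsy than that one!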
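
The per-column-pair pivot tables described above can be approximated in Python with one counter per pair of column positions. This is a rough sketch under stated assumptions, not a reproduction of the Excel workbook: the function name `pivot_counts` and the three-column rows are hypothetical, columns are identified by position (0 for A, 1 for B, and so on), and empty cells are skipped.

```python
from collections import Counter
from itertools import combinations


def pivot_counts(rows):
    """For every pair of column positions, count each (left, right) value match."""
    tables = {}
    for row in rows:
        # combinations() yields column pairs with i < j: AB, AC, BC, ...
        for (i, left), (j, right) in combinations(enumerate(row), 2):
            if left and right:  # skip empty cells
                tables.setdefault((i, j), Counter())[(left, right)] += 1
    return tables


# Hypothetical rows; columns A, B, C are positions 0, 1, 2.
rows = [
    ["API", "Bitcoin", "Blockchain"],
    ["API", "Bitcoin", "Ethereum"],
    ["Docker", "Bitcoin", ""],
]
tables = pivot_counts(rows)
print(tables[(0, 1)])  # counts for the "AB" pivot
```

Each entry of `tables` plays the role of one pivot table: the key `(0, 1)` is the AB table, and its counter maps a (row label, column label) pair to the number of matches.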
