Benchmarking a cache dictionary causes "Unexpected EOF while reading bytes"

I installed ClickHouse version 20.8.3.18 and python3 on a VM to stress-test cache dictionaries. After issuing a certain number of queries through clickhouse_driver, I get the error

Unexpected EOF while reading bytes

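Is this caused by a bug in the driver or in Python, or by the cache on the system reaching its maximum size? For example, it happens on a machine with 32 GB of RAM and a 256 GB SSD when the file has 203 columns and 10000 rows. A CSV file of about 66 MB seems small for an error like this. The query I am running is: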

SELECT
    dictGet('CacheDictionary','date',toUInt64(number)) AS date,
    SUM(dictGet('CacheDictionary','filterColumn',toUInt64(number))) AS val,
    AVG(dictGet('CacheDictionary',toUInt64(number))) AS avg
FROM numbers(1,10000)
GROUP BY date

A sample entry from the CSV file is:

20000,2021-02-05,6867,0.5314826651111791,OA9SMRN54LC3MTDW,D6S8AYXZ3JVSHPCY,12UQV1JR87MT00EP,3WBT23MA2QN6URA7,YGKJR5577BP6S3AD,2T90WPW1REOZA0L9,JQG8Z6FXXIX2788M,OAOVV1YX3A6HKQV8,FISBMOAHEXHAAKEY,XAULW5F90T3VEMUL,RAAZ5TM5XL7GRC1F,B16JEGDHXUXFI2R9,DETSZ7BR45CRAIA7,Z2X53PAQYCSBHPU3,SRISC0ZLWXC2DP34,KO2M3044JX5JCB74,ML776REFIX3Z1L78,ND6PXBOR135SWFSB,ZF4K45N2AIGFAK0L,RFE3EHCKC5EPYE2V,NJKM5T8UUD5NRDPX,O57IQW0670LP00I9,F0EBZ3BXHPETCFSY,RUZ7VH2IM0DIZ4UC,08BP467WG7ROEHTJ,9LSTNLUA240T2K4D,5L4PIRKMK746QW5Q,2VX3SER8ULU93NZG,Z0MZ9C3TTPR6WFDV,KB32XWCR67AWGSIB,PDM8QJ34X4EOTVN1,P7TUVP8Q1YF9S746,YDfdbCG6S2EXYPNW,55RN0F4UMGF3ABQZ,RRF895J8LQSLI48U,54OQWCJODIEQLRQF,D5ZJPGAG7CCO4LWA,UQDWEXPI184UUJQD,3QF6QAS32ITRL8JH,FPQ324RO04LNVAMO,ZJ6QCWNQCBQOE7F5,6OWVEVWHNSZILC6E,GIUD29OIFF3LUCCX,VGBJHKW32BUNUSDH,908TDRODVZIIC5O8,UCIU38BXEREJMO4M,5LKJ23ER4CKUZ88J,A1GBKPPM10L8X5RM,BB3SAVWF3CNBDXHO,279MIC1OXTDS2PFP,J6UVFJE8RGFK4LDN,3CE12GT27GX0WVWU,PNNTRLDFVJQ0TCRK,MI7XOHWUQX3W938H,LKZPV4K0BA6OE3R0,YJMLI82UBLSZWP7U,JORNKD1MsveCXBRF,CO5KKJIL1fheYA11,GXVXWDOI538WCLC0,OPODB2R2ITSX0E6J,3VE7SOJZL3DKIES7,5LPXB17GJ94S86HL,UQ0DZVUDMBD39LC3,KSSVOBUKMZC7T89M,P6YL0WW22NOM5A36,RA46SZF4ZLO5YWUM,TUTMJ34X4040USXX,09HPKJAD58P3FVMP,DM0NJVFYKR2653HH,HP869NM4Y2EBE3ND,RVKP40RPBOPB6RPQ,WI3QXYA5XIWJUFUK,770L6U5KAEPKKJC1,2H0xnuDM41QBAZWB,8AWJ2Y7RB9F2WTT0,Y6T3PIPLU3FCBZCU,CY8SCO15RNUWQU2B,DRC88XH21J9ADT6Z,MLZ2JN7F8MXVBHBI,2YSUVHRL4V0EVHXF,Y0U12EBQSEVE6W6X,A6RRJY191S0JOXJH,4F12P4K0SJ6EDKSD,THCRJ2ZEXGM1RUM4,PF0OUAULUNIW0W9X,EK1249WXC0C2KKY8,11WEDAAJL7BL4T4U,4K8OP1WXSN1MIXPF,8D0WNN1672A6WK07,5RLYH7K00ZSR1LL2,EKEXBG87U1X6UOLL,YWK3V1F7MTAF9T19,XZ8ZF0XO5V8TCBPS,A3RX8X8A8I11Z8X3,77P2Q5WRSTL4ERAI,00BGNPDYFSVG5F81,5KTUM76C42VTP4I7,TA933GZZN8OQ20QJ,612WNQ74RDHMBWX3,D41HNOBPX11GFYWO,OGR4A0EPCSS00XL6,QIOH165Y5JGKJMFC,TF2R9TFC5TJN2PER,TYNXWI46H7I83O77,JMD5DOEV4U628SDK,D7ECJH43FEC77UCJ,FKA9AT5J20QI3MQP,7QSU0I8VRRLUMD7R,6OJ1O2XI2QJXP6W2,UD2QVJxnuFRCAO43,GS3TZUW8U6Z8EWWQ,QD79GBSO6D6GCAZ1,GQ5TUY2FMJSNMTRK,OGOYL2PD64E2DOOQ,Q733OU5P7J7SAFS1,GBS7
MV5QOMQ4E89N,SB8MIQ1P37HMQZBJ,Z6G96BM7FL4150H3,05PS81HW528971RM,6F3KFLYT0345GI43,G65CDWEORNH3OUCY,12F43L99AZ84PDWR,GQQVWMTMS471WAWD,F1DFWRJ1F9M9MUTT,1M734H07IQAW49Q3,OPSRG5J7370227XE,BIPNR22KFF71MKQN,PV7DWGCQF5551FKT,YPGQVGUP37MRJY2B,RILKP96QV69WBW2D,4RXDCJURAVCQEGLX,XGIPC0AK1K0I6KDP,HMSE306L5NAK62LC,YAZHMS2UHGMWIB44,RZCAVUM45YTNV23T,3B7K07XPRTE8OMW1,FTP48ED5DQ4K3DM8,WW419RRJ2WU1F15L,85FWD49J0ARSUGI9,4U4768ANPCJ46K5P,EJ24BNUA6OZMUDEL,6Z27W6BN36GO8QWU,5AMZ4UU819GSI454,KMNIEJ2V5PI83KGP,APT4CYG8M5FM0BSW,IME5VRP08W468DZE,6BT4W0ZAW6C7993L,DRD6Q4P8BZVDG37U,2R1OEWQFV5J597AF,CKS41A6PXKVYICAG,OQYZ9UOQRVS3LLTF,JA3PZSAXFCJVZVLB,J23BP73T6GNC0Z08,GWOJXMXDVHCRE51Y,I826DE6KEVQK2PFC,6FF5LWM61KCM4C9K,P16P80EIX2X87OZO,O5GEOEO72CDV4GAX,UMKFUKMV6U0L5PM5,U64YI4G53LR3SC6J,CLML8KPAL697KYYJ,LMH2W0STEJ5H2J2S,AL61EP61ZR3GOPN3,Z3AEUMZSX4MQJ6M6,IS5RFEWIJ8XHYNK0,TNE1BS4JYN280PIF,67IER2YS6N2XHEW1,63P3O4X42T2INRT4,XYV043108XRK7Y4S,RW0HN600K0GQXF4Y,BZ1ZE6IBB4B72A81,QHAINYDIZX7838YI,7FFCKG3XJSZ2DIHJ,DF6C1OMPC1ETFPDZ,1EJ3EW0TXKVBC88R,WX6HG8FD021VFZ2S,W4OB9NZRODSTM96M,6GDA3L5CLBPVTPWQ,1Y4U7BL9UHPBJvix,Y31SUUZ0JF2AXZWO,PL2I18PA0SVXG85E,TEY1HC97QMZ5YXMI,T49EVLLM43AI4OG3,0SDNMLWY85Z7NENX,4446QKGO8UL6RERT,IMEAM22I51GT4ZHY,HUCLC93NIUG0C5R0,5VPBRUUVMBXP7HJY,XCOOPM3JU5VHQ94T,3LRZGAF451G9XDIN,Y6VIN1E31NYRLA2N,RAROO2EM5Q9NJRG9,NUQ2QJ9M6T5KRCHK,WQKKQK8UBB30GRWI,20SOMMKD08FYAENW,1G9K4UFWAI8Q7Z8K,XLG898A4MQXZHVYR,FPT67A7VDLVZEWYH,6DQ6417FF07FORXZ,10RUAPY5KGAYBZZD

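I have posted part of the code below. It tries to find the maximum number of cache items that can be stored and runs the benchmark queries for each cache size. In selectBenchmark, string corresponds to the query above. The parameters of each function are self-explanatory (xmlFile is the dictionary created in /etc/lib/clickhouse-server).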

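import csv
import os
import uuid

import numpy as np
from clickhouse_driver import Client

# createCSV, clickhouseDictionary, stringGen and transformedMLE are helper
# functions that were not included in the posted code.


def cacheMaxItems(csvRead,xmlFile,benchmarkType,columnStepSize,rowStepSize):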
    maxCache = []
    os.system('rm -f ' + csvRead)
    os.system('bash /root/restartCH.sh')
    for j in range(1,13):
        outputCSV = '/root/results' + benchmarkType + '/cacheResults' + str(j*columnStepSize) + '.csv'  
        with open(outputCSV,'w') as fp:
            wr = csv.writer(fp)  
            wr.writerow([benchmarkType + ': Number of rows','Loading time','Mean','Variance','Skewness','Number of Columns: ' + str(j*columnStepSize)])
        for i in range(1,10000):
            if i%5 == 0:
                os.system('bash /root/restartCH.sh')
            createCSV(10000,j*columnStepSize,csvRead)
            try:
                clickhouseDictionary(rowStepSize*i*j*columnStepSize,csvRead,'Cache')
                if benchmarkType == 'Random':
                    results = selectBenchmark(i*rowStepSize,'Random','Cache')
                elif benchmarkType == 'Consecutive':
                    results = selectBenchmark(i*rowStepSize,'Consecutive','Cache')
                elif benchmarkType == 'cpu':
                    results = selectBenchmark(i*rowStepSize,'cpu','Cache')
                results.insert(0,i*rowStepSize)
                with open(outputCSV,'a') as fp:
                    wr = csv.writer(fp)  
                    wr.writerow(results)

                print('Successfully loaded and queried cache of size ' + str(rowStepSize*i*j*columnStepSize) + '.')
            except Exception as ex:
                print(ex)
                os.system('rm -f ' + csvRead)
                os.system('bash /root/restartCH.sh')
                maxCache.append([j*columnStepSize,(i-1)*rowStepSize])
                print(maxCache)
                break
    return maxCache


def selectBenchmark(numberOfRows,numberOfColumns,dictType):
    client = Client('localhost',port=9000,database='system')
    client.execute('SYSTEM RELOAD DICTIONARY ' + dictType + 'Dictionary')
    loadingTime = client.last_query.elapsed
    client.execute('SELECT dictGet(\'' + dictType + 'Dictionary\',\'random0\',toUInt64(1))',query_id=str(uuid.uuid4()))
    loadingTime += client.last_query.elapsed
    loop = True
    counter = 0
    j=0
    while loop:
        times = []
        for i in range(0,31):
            query_id = str(uuid.uuid4())
            string = stringGen(numberOfRows,dictType)
            client.execute(string,query_id = query_id)
            times.append(client.last_query.elapsed)  
        if max(times) > loadingTime:
            loadingTime = max(times)
        stats = transformedMLE(times)
        redactedTimes = [x for x in times if (stats[0]-3*np.sqrt(stats[1])) < x < (stats[0]+3*np.sqrt(stats[1]))]
        if len(times) - len(redactedTimes) <= 3:
            loop = False
        elif j > 15:
            print('High variance query')
            loop = False
        j+=1
    result = transformedMLE(redactedTimes)
    loadingTime = loadingTime - result[0]
    result.insert(0,loadingTime)
    client.disconnect()
    return result
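
The outlier-rejection step in selectBenchmark keeps only timings within three standard deviations of the mean. Since transformedMLE is not shown, the following is only a minimal sketch of that filtering idea, using the plain sample mean and population standard deviation from the standard library instead. The name three_sigma_filter is hypothetical, and the bounds are inclusive here so that a run of identical timings is not discarded entirely.

```python
import statistics


def three_sigma_filter(times):
    """Keep the samples that lie within mean +/- 3 standard deviations."""
    mean = statistics.fmean(times)
    std = statistics.pstdev(times)  # population standard deviation
    low, high = mean - 3 * std, mean + 3 * std
    # Inclusive bounds: if all samples are equal (std == 0), keep them all.
    return [t for t in times if low <= t <= high]
```

With 31 samples per round, as in the loop above, a single extreme timing can fall outside the three-sigma band and be dropped, while the remaining timings are kept.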

The restartCH.sh file:

service clickhouse-server forcerestart

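forcerestart is used because a cache overflow often blocks the restart command. The server error log shows no output, which suggests that the problem lies with the Python driver, possibly because it is reading a large amount of returned data. I also get 'Killed' as Python output, which likewise points to a cache problem. That is to be expected when I am benchmarking cache dictionaries.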
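
A 'Killed' message usually means the process received SIGKILL, and on Linux a common source of that is the kernel's out-of-memory killer. Whether that is what happened here is an assumption, not something the question confirms. One way to check whether the Python side is the process growing is to record its peak resident memory after each benchmark round. The helper name peak_rss_mib is hypothetical. The sketch relies on the standard resource module, which is available on Unix only.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss / divisor
```

Printing peak_rss_mib() next to the "Successfully loaded and queried" message would show whether memory use of the benchmark script climbs from round to round.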

Solution

"Unexpected EOF while reading bytes" is an error raised by the Python driver.
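
Because the message comes from the client side, the benchmark can catch it separately from other failures. The sketch below assumes the driver surfaces this condition as Python's built-in EOFError. It also assumes the client object offers execute and disconnect, as the Client used in the question does. The function name execute_with_reconnect and its make_client parameter are hypothetical.

```python
def execute_with_reconnect(make_client, query, retries=1):
    """Run a query; on EOFError, retry on a fresh connection.

    make_client is any zero-argument callable that returns an object
    with execute(query) and disconnect() methods.
    """
    last_error = None
    for _ in range(retries + 1):
        client = make_client()
        try:
            return client.execute(query)
        except EOFError as err:
            # The connection was closed mid-response; remember and retry.
            last_error = err
        finally:
            client.disconnect()
    raise last_error
```

With clickhouse_driver this could be called as execute_with_reconnect(lambda: Client('localhost'), 'SELECT 1'). A retry only hides the symptom, so the server log is still the place to look for the cause.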

Check clickhouse-server.log for the real error.
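
A small filter can pull the relevant entries out of a large log. This sketch assumes the server writes the severity in angle brackets, for example <Error> or <Fatal>. The function name error_lines is hypothetical.

```python
import re

# Assumed severity markers in ClickHouse server log lines.
LEVEL_PATTERN = re.compile(r"<(Error|Fatal)>")


def error_lines(log_text):
    """Return the log lines that carry an Error or Fatal severity marker."""
    return [line for line in log_text.splitlines() if LEVEL_PATTERN.search(line)]
```

It takes the log content as a string, so it can be fed from open(path).read(), where path points at clickhouse-server.log on the server (commonly under /var/log/clickhouse-server/, though the location depends on the installation).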

20.8.3.18 is no longer supported; upgrade to 20.8.12.2.