
Reshaping datasets with many columns to long format: notebook crashes

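I am trying to use pandas.wide_to_long to convert several datasets from wide to long format. Each dataset can be up to 1 GB and can have hundreds of columns. In the code below I loop over datasets 1 to 25, select only the relevant columns, and reshape them.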

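import pandas as pd

persons_related = pd.DataFrame()
for i in range(1, 26):
    # on_bad_lines='skip' replaces error_bad_lines=False in pandas >= 1.3
    df = pd.read_csv('persons_' + str(i) + '.csv', sep=',', on_bad_lines='skip', index_col=False, dtype=str)
    # Keep the columns from 'id' through 'uri' plus every relatedCompanies* column
    df_ = df[
            list(df.loc[:, 'id':'uri']) +
            list(df.loc[:, df.columns.str.startswith("relatedCompanies")])
            ]

    persons_related_ = {}
    name = str(i)
    # Raw string so that \d and \w reach the regex engine unchanged
    persons_related_[name] = pd.wide_to_long(df_, ['relatedCompanies_'], i='id', j='suffix', suffix=r'\d+_\w+')
    # Drop rows where the reshaped value is missing
    persons_related_[name] = persons_related_[name][persons_related_[name]['relatedCompanies_'].notna()]
    # Note: DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
    #option 1: persons_related = persons_related.append(persons_related_[name])
    #option 2: persons_related_[name].to_csv('persons_related_' + name + '.csv')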

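I am running this code in a Jupyter notebook. I would like either to build one large DataFrame, persons_related, or to save a separate long-format CSV file per dataset and append them to each other later.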
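The second option (writing long-format CSV output and appending to it) could be sketched roughly as follows. This is only an illustration, not a tested fix for the crash: the helper names `to_long`, `append_to_csv` and `reshape_all` are hypothetical, and the `usecols` filter assumes that only `id`, `uri` and the `relatedCompanies_*` columns are needed.

```python
import os
import pandas as pd

STUB = "relatedCompanies_"  # stub name taken from the code in the question

def to_long(df):
    """Reshape one wide frame and drop rows whose value is missing."""
    long_df = pd.wide_to_long(df, [STUB], i="id", j="suffix", suffix=r"\d+_\w+")
    return long_df[long_df[STUB].notna()]

def append_to_csv(long_df, out_path):
    """Append to out_path, writing the header only when the file is new."""
    is_new = not os.path.exists(out_path)
    # The (id, suffix) MultiIndex is written as the first two columns.
    long_df.to_csv(out_path, mode="a", header=is_new)

def reshape_all(paths, out_path):
    """Hypothetical driver: handle one file at a time so only one is in memory."""
    for path in paths:
        # A callable usecols lets read_csv skip unwanted columns while parsing.
        df = pd.read_csv(path, dtype=str,
                         usecols=lambda c: c in ("id", "uri") or c.startswith(STUB))
        append_to_csv(to_long(df), out_path)
```

Called as `reshape_all(['persons_%d.csv' % i for i in range(1, 26)], 'persons_related.csv')`, this would keep only the current file in memory and never build the combined DataFrame in the notebook.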

Unfortunately, the notebook keeps crashing (the kernel dies). So far I have managed to run the code on only two files.

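Given the number of columns and rows, I think the files may simply be too large. I don't know how to optimize the code so that it runs on all files in a reasonable amount of time. Any ideas?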
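One direction that might reduce peak memory is to read each file in chunks and reshape each chunk separately. The sketch below swaps pandas.wide_to_long for DataFrame.melt, which is a different technique from the one in the question; it should give the same id / uri / suffix / value layout provided every `relatedCompanies_*` column matches the original suffix pattern. The function names and the chunk size of 50,000 rows are assumptions, not measured recommendations.

```python
import pandas as pd

PREFIX = "relatedCompanies_"  # stub name taken from the code in the question

def melt_chunk(chunk):
    """Long-format reshape of one chunk using DataFrame.melt."""
    value_cols = [c for c in chunk.columns if c.startswith(PREFIX)]
    long_df = chunk.melt(id_vars=["id", "uri"], value_vars=value_cols,
                         var_name="suffix", value_name=PREFIX)
    long_df = long_df.dropna(subset=[PREFIX])
    # Strip the stub so the suffix reads "0_name", "0_mandates_0_uri", ...
    return long_df.assign(suffix=long_df["suffix"].str[len(PREFIX):])

def iter_long_chunks(path, chunksize=50_000):
    """Yield reshaped pieces of one file; chunksize is an arbitrary starting value."""
    reader = pd.read_csv(path, dtype=str, chunksize=chunksize,
                         usecols=lambda c: c in ("id", "uri") or c.startswith(PREFIX))
    for chunk in reader:
        yield melt_chunk(chunk)
```

Each yielded piece is small enough to be written out or appended before the next chunk is read, so the full wide table never has to sit in memory at once.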

The columns in files 1 to 25 are as follows:

df.columns.values

array(['Unnamed: 0','id','uri','crifId','currentName_firstName','currentName_lastName','currentName_name','currentName_title','currentName_validFrom','currentName_invalidFrom','currentName_language','names_0_firstName','names_0_lastName','names_0_name','names_0_title','names_0_validFrom','names_0_invalidFrom','names_0_language','hometowns','currentDomicile_city','currentDomicile_zip','currentDomicile_validFrom','currentDomicile_invalidFrom','domiciles_0_city','domiciles_0_zip','domiciles_0_validFrom','domiciles_0_invalidFrom','relatedCompanies_0_name','relatedCompanies_0_uri','relatedCompanies_0_mandates_0_name','relatedCompanies_0_mandates_0_uri','relatedCompanies_0_sectors','gender','validFrom','activeMandate','lastMandateInvalidFrom','deleted','noIndex','lastmod','domiciles_1_city','domiciles_1_zip','domiciles_1_validFrom','domiciles_1_invalidFrom','relatedCompanies','relatedCompanies_0_mandates','relatedCompanies_0_mandates_1_name','relatedCompanies_0_mandates_1_uri','relatedCompanies_0_mandates_2_name','relatedCompanies_0_mandates_2_uri','relatedCompanies_0_mandates_3_name','relatedCompanies_0_mandates_3_uri','relatedCompanies_0_mandates_4_name','relatedCompanies_0_mandates_4_uri','relatedCompanies_0_mandates_5_name','relatedCompanies_0_mandates_5_uri','relatedCompanies_1_name','relatedCompanies_1_uri','relatedCompanies_1_mandates','relatedCompanies_1_sectors','names_1_firstName','names_1_lastName','names_1_name','names_1_title','names_1_validFrom','names_1_invalidFrom','names_1_language','relatedCompanies_0_mandates_6_name','relatedCompanies_0_mandates_6_uri','relatedCompanies_0_mandates_7_name','relatedCompanies_0_mandates_7_uri','relatedCompanies_1_mandates_0_name','relatedCompanies_1_mandates_0_uri','relatedCompanies_1_mandates_1_name','relatedCompanies_1_mandates_1_uri','relatedCompanies_1_mandates_2_name','relatedCompanies_1_mandates_2_uri','relatedCompanies_1_mandates_3_name','relatedCompanies_1_mandates_3_uri','relatedCompanies_1_mandates_4_name','relatedCompanies_1_mandates_4_uri',
'noIndex_value','noIndex_modifiedUserId','relatedCompanies_0_mandates_8_name','relatedCompanies_0_mandates_8_uri','relatedCompanies_0_mandates_9_name','relatedCompanies_0_mandates_9_uri','relatedCompanies_2_name','relatedCompanies_2_uri','relatedCompanies_2_mandates','relatedCompanies_2_sectors','domiciles_2_city','domiciles_2_zip','domiciles_2_validFrom','domiciles_2_invalidFrom','relatedCompanies_1_mandates_5_name','relatedCompanies_1_mandates_5_uri','relatedCompanies_1_mandates_6_name','relatedCompanies_1_mandates_6_uri','relatedCompanies_1_mandates_7_name','relatedCompanies_1_mandates_7_uri','relatedCompanies_1_mandates_8_name','relatedCompanies_1_mandates_8_uri','relatedCompanies_1_mandates_9_name','relatedCompanies_1_mandates_9_uri','domiciles_3_city','domiciles_3_zip','domiciles_3_validFrom','domiciles_3_invalidFrom','relatedCompanies_3_name','relatedCompanies_3_uri','relatedCompanies_3_mandates_0_name','relatedCompanies_3_mandates_0_uri','relatedCompanies_3_sectors','relatedCompanies_2_mandates_0_name','relatedCompanies_2_mandates_0_uri','relatedCompanies_2_mandates_1_name','relatedCompanies_2_mandates_1_uri','names_2_firstName','names_2_lastName','names_2_name','names_2_title','names_2_validFrom','names_2_invalidFrom','names_2_language','domiciles','relatedCompanies_2_mandates_2_name','relatedCompanies_2_mandates_2_uri','relatedCompanies_2_mandates_3_name','relatedCompanies_2_mandates_3_uri','relatedCompanies_3_mandates_1_name','relatedCompanies_3_mandates_1_uri','relatedCompanies_3_mandates_2_name','relatedCompanies_3_mandates_2_uri','relatedCompanies_3_mandates_3_name','relatedCompanies_3_mandates_3_uri','relatedCompanies_2_mandates_4_name','relatedCompanies_2_mandates_4_uri','relatedCompanies_2_mandates_5_name','relatedCompanies_2_mandates_5_uri','relatedCompanies_2_mandates_6_name','relatedCompanies_2_mandates_6_uri','names_3_firstName','names_3_lastName','names_3_name','names_3_title','names_3_validFrom','names_3_invalidFrom',
'names_3_language','relatedCompanies_3_mandates','relatedCompanies_4_name','relatedCompanies_4_uri','relatedCompanies_4_mandates','relatedCompanies_4_sectors','relatedCompanies_4_mandates_0_name','relatedCompanies_4_mandates_0_uri','relatedCompanies_5_name','relatedCompanies_5_uri','relatedCompanies_5_mandates','relatedCompanies_5_sectors','relatedCompanies_6_name','relatedCompanies_6_uri','relatedCompanies_6_mandates_0_name','relatedCompanies_6_mandates_0_uri','relatedCompanies_6_sectors','domiciles_4_city','domiciles_4_zip','domiciles_4_validFrom','domiciles_4_invalidFrom','relatedCompanies_3_mandates_4_name','relatedCompanies_3_mandates_4_uri','relatedCompanies_3_mandates_5_name','relatedCompanies_3_mandates_5_uri','relatedCompanies_3_mandates_6_name','relatedCompanies_3_mandates_6_uri','relatedCompanies_3_mandates_7_name','relatedCompanies_3_mandates_7_uri','relatedCompanies_2_mandates_7_name','relatedCompanies_2_mandates_7_uri','relatedCompanies_2_mandates_8_name','relatedCompanies_2_mandates_8_uri','relatedCompanies_2_mandates_9_name','relatedCompanies_2_mandates_9_uri','relatedCompanies_3_mandates_8_name','relatedCompanies_3_mandates_8_uri','relatedCompanies_3_mandates_9_name','relatedCompanies_3_mandates_9_uri','relatedCompanies_4_mandates_1_name','relatedCompanies_4_mandates_1_uri','relatedCompanies_4_mandates_2_name','relatedCompanies_4_mandates_2_uri','relatedCompanies_4_mandates_3_name','relatedCompanies_4_mandates_3_uri','relatedCompanies_4_mandates_4_name','relatedCompanies_4_mandates_4_uri','relatedCompanies_4_mandates_5_name','relatedCompanies_4_mandates_5_uri','relatedCompanies_5_mandates_0_name','relatedCompanies_5_mandates_0_uri','relatedCompanies_5_mandates_1_name','relatedCompanies_5_mandates_1_uri','relatedCompanies_5_mandates_2_name','relatedCompanies_5_mandates_2_uri','relatedCompanies_5_mandates_3_name','relatedCompanies_5_mandates_3_uri','relatedCompanies_6_mandates_1_name','relatedCompanies_6_mandates_1_uri','relatedCompanies_6_mandates_2_name',
'relatedCompanies_6_mandates_2_uri','relatedCompanies_6_mandates_3_name','relatedCompanies_6_mandates_3_uri','relatedCompanies_6_mandates_4_name','relatedCompanies_6_mandates_4_uri','relatedCompanies_6_mandates_5_name','relatedCompanies_6_mandates_5_uri','relatedCompanies_6_mandates_6_name','relatedCompanies_6_mandates_6_uri','relatedCompanies_7_name','relatedCompanies_7_uri','relatedCompanies_7_mandates_0_name','relatedCompanies_7_mandates_0_uri','relatedCompanies_7_mandates_1_name','relatedCompanies_7_mandates_1_uri','relatedCompanies_7_sectors','relatedCompanies_6_mandates','names_4_firstName','names_4_lastName','names_4_name','names_4_title','names_4_validFrom','names_4_invalidFrom','names_4_language','relatedCompanies_5_mandates_4_name','relatedCompanies_5_mandates_4_uri','names_5_firstName','names_5_lastName','names_5_name','names_5_title','names_5_validFrom','names_5_invalidFrom','names_5_language','names_6_firstName','names_6_lastName','names_6_name','names_6_title','names_6_validFrom','names_6_invalidFrom','names_6_language','relatedCompanies_5_mandates_5_name','relatedCompanies_5_mandates_5_uri','relatedCompanies_5_mandates_6_name','relatedCompanies_5_mandates_6_uri','relatedCompanies_5_mandates_7_name','relatedCompanies_5_mandates_7_uri','relatedCompanies_7_mandates','relatedCompanies_7_mandates_2_name','relatedCompanies_7_mandates_2_uri','relatedCompanies_7_mandates_3_name','relatedCompanies_7_mandates_3_uri','relatedCompanies_4_mandates_6_name','relatedCompanies_4_mandates_6_uri','relatedCompanies_4_mandates_7_name','relatedCompanies_4_mandates_7_uri','relatedCompanies_4_mandates_8_name','relatedCompanies_4_mandates_8_uri','relatedCompanies_8_name','relatedCompanies_8_uri','relatedCompanies_8_mandates','relatedCompanies_8_sectors','relatedCompanies_9_name','relatedCompanies_9_uri','relatedCompanies_9_mandates','relatedCompanies_9_sectors','relatedCompanies_10_name','relatedCompanies_10_uri','relatedCompanies_10_mandates','relatedCompanies_10_sectors',
'relatedCompanies_11_name','relatedCompanies_11_uri','relatedCompanies_11_mandates','relatedCompanies_11_sectors','domiciles_5_city','domiciles_5_zip','domiciles_5_validFrom','domiciles_5_invalidFrom','relatedCompanies_5_mandates_8_name','relatedCompanies_5_mandates_8_uri','relatedCompanies_5_mandates_9_name','relatedCompanies_5_mandates_9_uri','relatedCompanies_8_mandates_0_name','relatedCompanies_8_mandates_0_uri','relatedCompanies_8_mandates_1_name','relatedCompanies_8_mandates_1_uri','latestChangeDateCreditrating','relatedCompanies_7_mandates_4_name','relatedCompanies_7_mandates_4_uri','relatedCompanies_7_mandates_5_name','relatedCompanies_7_mandates_5_uri','relatedCompanies_8_mandates_2_name','relatedCompanies_8_mandates_2_uri','relatedCompanies_8_mandates_3_name','relatedCompanies_8_mandates_3_uri','relatedCompanies_8_mandates_4_name','relatedCompanies_8_mandates_4_uri','relatedCompanies_8_mandates_5_name','relatedCompanies_8_mandates_5_uri','relatedCompanies_9_mandates_0_name','relatedCompanies_9_mandates_0_uri','relatedCompanies_9_mandates_1_name','relatedCompanies_9_mandates_1_uri','relatedCompanies_9_mandates_2_name','relatedCompanies_9_mandates_2_uri','relatedCompanies_9_mandates_3_name','relatedCompanies_9_mandates_3_uri','relatedCompanies_9_mandates_4_name','relatedCompanies_9_mandates_4_uri','relatedCompanies_9_mandates_5_name','relatedCompanies_9_mandates_5_uri','relatedCompanies_10_mandates_0_name','relatedCompanies_10_mandates_0_uri','relatedCompanies_10_mandates_1_name','relatedCompanies_10_mandates_1_uri','relatedCompanies_10_mandates_2_name','relatedCompanies_10_mandates_2_uri','relatedCompanies_10_mandates_3_name','relatedCompanies_10_mandates_3_uri','relatedCompanies_10_mandates_4_name','relatedCompanies_10_mandates_4_uri','relatedCompanies_10_mandates_5_name','relatedCompanies_10_mandates_5_uri','relatedCompanies_11_mandates_0_name','relatedCompanies_11_mandates_0_uri','relatedCompanies_11_mandates_1_name','relatedCompanies_11_mandates_1_uri',
'relatedCompanies_11_mandates_2_name','relatedCompanies_11_mandates_2_uri','relatedCompanies_11_mandates_3_name','relatedCompanies_11_mandates_3_uri','relatedCompanies_11_mandates_4_name','relatedCompanies_11_mandates_4_uri','relatedCompanies_11_mandates_5_name','relatedCompanies_11_mandates_5_uri','relatedCompanies_12_name','relatedCompanies_12_uri','relatedCompanies_12_mandates_0_name','relatedCompanies_12_mandates_0_uri','relatedCompanies_12_mandates_1_name','relatedCompanies_12_mandates_1_uri','relatedCompanies_12_mandates_2_name','relatedCompanies_12_mandates_2_uri','relatedCompanies_12_mandates_3_name','relatedCompanies_12_mandates_3_uri','relatedCompanies_12_mandates_4_name','relatedCompanies_12_mandates_4_uri','relatedCompanies_12_mandates_5_name','relatedCompanies_12_mandates_5_uri','relatedCompanies_12_sectors','relatedCompanies_13_name','relatedCompanies_13_uri','relatedCompanies_13_mandates_0_name','relatedCompanies_13_mandates_0_uri','relatedCompanies_13_mandates_1_name','relatedCompanies_13_mandates_1_uri','relatedCompanies_13_mandates_2_name','relatedCompanies_13_mandates_2_uri','relatedCompanies_13_mandates_3_name','relatedCompanies_13_mandates_3_uri','relatedCompanies_13_mandates_4_name','relatedCompanies_13_mandates_4_uri','relatedCompanies_13_mandates_5_name','relatedCompanies_13_mandates_5_uri','relatedCompanies_13_sectors','relatedCompanies_14_name','relatedCompanies_14_uri','relatedCompanies_14_mandates_0_name','relatedCompanies_14_mandates_0_uri','relatedCompanies_14_mandates_1_name','relatedCompanies_14_mandates_1_uri','relatedCompanies_14_mandates_2_name','relatedCompanies_14_mandates_2_uri','relatedCompanies_14_mandates_3_name','relatedCompanies_14_mandates_3_uri','relatedCompanies_14_mandates_4_name','relatedCompanies_14_mandates_4_uri','relatedCompanies_14_mandates_5_name','relatedCompanies_14_mandates_5_uri','relatedCompanies_14_sectors','relatedCompanies_15_name','relatedCompanies_15_uri','relatedCompanies_15_mandates_0_name','relatedCompanies_15_mandates_0_uri',
'relatedCompanies_15_mandates_1_name','relatedCompanies_15_mandates_1_uri','relatedCompanies_15_mandates_2_name','relatedCompanies_15_mandates_2_uri','relatedCompanies_15_mandates_3_name','relatedCompanies_15_mandates_3_uri','relatedCompanies_15_mandates_4_name','relatedCompanies_15_mandates_4_uri','relatedCompanies_15_mandates_5_name','relatedCompanies_15_mandates_5_uri','relatedCompanies_15_sectors','relatedCompanies_16_name','relatedCompanies_16_uri','relatedCompanies_16_mandates_0_name','relatedCompanies_16_mandates_0_uri','relatedCompanies_16_mandates_1_name','relatedCompanies_16_mandates_1_uri','relatedCompanies_16_mandates_2_name','relatedCompanies_16_mandates_2_uri','relatedCompanies_16_mandates_3_name','relatedCompanies_16_mandates_3_uri','relatedCompanies_16_mandates_4_name','relatedCompanies_16_mandates_4_uri','relatedCompanies_16_mandates_5_name','relatedCompanies_16_mandates_5_uri','relatedCompanies_16_sectors','relatedCompanies_17_name','relatedCompanies_17_uri','relatedCompanies_17_mandates_0_name','relatedCompanies_17_mandates_0_uri','relatedCompanies_17_mandates_1_name','relatedCompanies_17_mandates_1_uri','relatedCompanies_17_mandates_2_name','relatedCompanies_17_mandates_2_uri','relatedCompanies_17_mandates_3_name','relatedCompanies_17_mandates_3_uri','relatedCompanies_17_mandates_4_name','relatedCompanies_17_mandates_4_uri','relatedCompanies_17_mandates_5_name','relatedCompanies_17_mandates_5_uri','relatedCompanies_17_sectors','relatedCompanies_18_name','relatedCompanies_18_uri','relatedCompanies_18_mandates_0_name','relatedCompanies_18_mandates_0_uri','relatedCompanies_18_mandates_1_name','relatedCompanies_18_mandates_1_uri','relatedCompanies_18_mandates_2_name','relatedCompanies_18_mandates_2_uri','relatedCompanies_18_mandates_3_name','relatedCompanies_18_mandates_3_uri','relatedCompanies_18_mandates_4_name','relatedCompanies_18_mandates_4_uri','relatedCompanies_18_mandates_5_name','relatedCompanies_18_mandates_5_uri',
'relatedCompanies_18_sectors','relatedCompanies_19_name','relatedCompanies_19_uri','relatedCompanies_19_mandates_0_name','relatedCompanies_19_mandates_0_uri','relatedCompanies_19_mandates_1_name','relatedCompanies_19_mandates_1_uri','relatedCompanies_19_mandates_2_name','relatedCompanies_19_mandates_2_uri','relatedCompanies_19_mandates_3_name','relatedCompanies_19_mandates_3_uri','relatedCompanies_19_mandates_4_name','relatedCompanies_19_mandates_4_uri','relatedCompanies_19_mandates_5_name','relatedCompanies_19_mandates_5_uri','relatedCompanies_19_sectors','relatedCompanies_12_mandates','relatedCompanies_13_mandates','relatedCompanies_7_mandates_6_name','relatedCompanies_7_mandates_6_uri','relatedCompanies_7_mandates_7_name','relatedCompanies_7_mandates_7_uri','relatedCompanies_14_mandates','relatedCompanies_15_mandates','relatedCompanies_16_mandates','relatedCompanies_17_mandates','relatedCompanies_18_mandates','relatedCompanies_19_mandates','relatedCompanies_20_name','relatedCompanies_20_uri','relatedCompanies_20_mandates','relatedCompanies_20_sectors','relatedCompanies_21_name','relatedCompanies_21_uri','relatedCompanies_21_mandates','relatedCompanies_21_sectors','relatedCompanies_22_name','relatedCompanies_22_uri','relatedCompanies_22_mandates','relatedCompanies_22_sectors','relatedCompanies_23_name','relatedCompanies_23_uri','relatedCompanies_23_mandates','relatedCompanies_23_sectors','relatedCompanies_24_name','relatedCompanies_24_uri','relatedCompanies_24_mandates','relatedCompanies_24_sectors','relatedCompanies_25_name','relatedCompanies_25_uri','relatedCompanies_25_mandates','relatedCompanies_25_sectors','relatedCompanies_26_name','relatedCompanies_26_uri','relatedCompanies_26_mandates','relatedCompanies_26_sectors','relatedCompanies_27_name','relatedCompanies_27_uri','relatedCompanies_27_mandates','relatedCompanies_27_sectors','relatedCompanies_28_name','relatedCompanies_28_uri','relatedCompanies_28_mandates','relatedCompanies_28_sectors',
'relatedCompanies_29_name','relatedCompanies_29_uri','relatedCompanies_29_mandates','relatedCompanies_29_sectors','relatedCompanies_30_name','relatedCompanies_30_uri','relatedCompanies_30_mandates','relatedCompanies_30_sectors','relatedCompanies_31_name','relatedCompanies_31_uri','relatedCompanies_31_mandates','relatedCompanies_31_sectors','relatedCompanies_32_name','relatedCompanies_32_uri','relatedCompanies_32_mandates','relatedCompanies_32_sectors','relatedCompanies_4_mandates_9_name','relatedCompanies_4_mandates_9_uri'],dtype=object)

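The resulting table would contain the following columns: id, suffix, uri, relatedCompanies_. The suffix column contains values such as 0_name, 0_uri, 0_mandates_0_name, 0_mandates_0_uri, 0_mandates_1_name, and so on.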
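To make the target layout concrete, here is a toy illustration of the same wide_to_long call on a two-row frame. The ids, URIs and company names are made up for the example and do not come from the real files.

```python
import pandas as pd

# Toy stand-in for one persons_*.csv file; all values are invented.
wide = pd.DataFrame({
    "id": ["p1", "p2"],
    "uri": ["/persons/p1", "/persons/p2"],
    "relatedCompanies_0_name": ["Acme AG", "Beta GmbH"],
    "relatedCompanies_0_uri": ["/companies/acme", "/companies/beta"],
    "relatedCompanies_0_mandates_0_name": ["Director", None],
})

long_df = pd.wide_to_long(wide, ["relatedCompanies_"], i="id", j="suffix",
                          suffix=r"\d+_\w+")
# Drop missing values, then turn the (id, suffix) index back into columns.
long_df = long_df[long_df["relatedCompanies_"].notna()].reset_index()

print(long_df.columns.tolist())
print(sorted(long_df["suffix"].unique()))
```

After reset_index the frame has one row per non-missing (id, suffix) pair, with the person's uri repeated on every row.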
