
Reading multiple files with Python and pandas and writing them to a single Excel worksheet

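No matter what I try, I cannot get the data from all of the xhtml files written to a single Excel worksheet. Python appears to iterate over every file in the folder, but the output only contains the data from the last file. Any help would be great!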

#!/usr/bin/python3

# Import libraries
import pandas as pd
import openpyxl
from openpyxl import load_workbook
import glob
import time

#Path to folder
path_dir: str = r"C:\Users\Moench\Desktop\r2d2\EPUB\content1\*.xhtml"

#Read files
for filename in glob.glob(path_dir):

#Assign the table data to a Pandas dataframe 
        dfs = open(filename,'r')
        dfs1 = pd.read_html(dfs)

#Read data                                                                                                         
        df2 = dfs1[0][['Unnamed: 0_level_0','Unnamed: 1_level_0','Unnamed: 2_level_0','Unnamed: 3_level_0','Unnamed: 4_level_0','Unnamed: 12_level_0','Unnamed: 13_level_0']]
 
#Print result (it does appear to go through all files in the folder)
#        print (df2)

# Write to existing Excel-Sheet
book = load_workbook('output.xlsx')
writer = pd.ExcelWriter('output.xlsx',engine='openpyxl') 
writer.book = book
ts = time.time()

df3 = df2.append(df2) 
writer.sheets = dict((ws.title,ws) for ws in book.worksheets)
df3.to_excel(writer,str(ts))
writer.save()

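Solution

You store the data from every iteration in the same dataframe, df2, and overwrite it on each pass, so after the loop only the data from the last file remains (and it is actually written twice, because of df2.append(df2)).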
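To make the overwrite visible without any xhtml files, here is a minimal sketch using three tiny made-up dataframes in place of the tables from your files. The names frames, doubled, collected and combined are purely illustrative, and pd.concat([df2, df2]) stands in for df2.append(df2), since DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-ins for the tables read from three xhtml files.
frames = [pd.DataFrame({"value": [i]}) for i in range(3)]

# Pattern from the question: df2 is rebound on every pass of the loop.
for frame in frames:
    df2 = frame

# After the loop only the last frame is left; joining it with itself
# just duplicates that one frame.
doubled = pd.concat([df2, df2], ignore_index=True)

# Collecting every frame in a list first keeps all of them.
collected = []
for frame in frames:
    collected.append(frame)
combined = pd.concat(collected, ignore_index=True)

print(doubled["value"].tolist())   # [2, 2]
print(combined["value"].tolist())  # [0, 1, 2]
```

The first list shows what ends up in your worksheet at the moment, the second is what you want.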

Here is a slightly modified version of your script that stores each dataframe in df_list and creates df3 by calling pd.concat on that list:

#!/usr/bin/python3

# Import libraries
import pandas as pd
import openpyxl
from openpyxl import load_workbook
import glob
import time

#Path to folder
path_dir: str = r"C:\Users\Moench\Desktop\r2d2\EPUB\content1\*.xhtml"

# Initiate list of dataframes
df_list = list()

#Read files
for filename in glob.glob(path_dir):

        #Assign the table data to a Pandas dataframe (the with block closes the file)
        with open(filename,'r') as dfs:
                dfs1 = pd.read_html(dfs)

        #Read data
        df2 = dfs1[0][['Unnamed: 0_level_0','Unnamed: 1_level_0','Unnamed: 2_level_0','Unnamed: 3_level_0','Unnamed: 4_level_0','Unnamed: 12_level_0','Unnamed: 13_level_0']]

        #Keep this file's dataframe instead of overwriting the previous one
        df_list.append(df2)

#Print result (it does appear to go through all files in the folder)
#        print (df2)

# Write to existing Excel-Sheet
book = load_workbook('output.xlsx')
writer = pd.ExcelWriter('output.xlsx',engine='openpyxl') 
writer.book = book
ts = time.time()

# Concatenate all dataframes into one
df3 = pd.concat(df_list,ignore_index=True) 

writer.sheets = dict((ws.title,ws) for ws in book.worksheets)
df3.to_excel(writer,str(ts))
writer.save()
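
One caveat about the writing step: as far as I know, pandas 2.0 and later no longer allow assigning writer.book and no longer provide writer.save(), so the last few lines above may fail on a current installation. A minimal sketch of the same step using ExcelWriter's append mode follows. The helper name write_new_sheet and its parameters are my own invention, not part of pandas, and it assumes openpyxl is installed.

```python
import time
from pathlib import Path

import pandas as pd


def write_new_sheet(frames, workbook_path, sheet_name=None):
    """Concatenate frames and write the result to a new sheet.

    Hypothetical helper: appends to workbook_path if it already exists,
    otherwise creates the workbook.
    """
    if not frames:
        # pd.concat raises on an empty list, so fail with a clearer message.
        raise ValueError("no dataframes to write (did the glob match any files?)")

    combined = pd.concat(frames, ignore_index=True)

    if sheet_name is None:
        # Same naming scheme as the script above: the current timestamp.
        sheet_name = str(time.time())

    # mode="a" keeps the sheets already in the workbook, but it requires
    # the file to exist, so fall back to "w" for a new workbook.
    mode = "a" if Path(workbook_path).exists() else "w"
    with pd.ExcelWriter(workbook_path, engine="openpyxl", mode=mode) as writer:
        combined.to_excel(writer, sheet_name=sheet_name)

    return combined
```

With df_list filled by the loop above, the call would be write_new_sheet(df_list, 'output.xlsx'). The with block saves and closes the workbook on exit, so no explicit save call is needed.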
