Run a Python script on all files in a directory

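This is my first time posting a question here, and I hope someone with experience can share some insight. I have been working on this for the past few days and nights, and I am now stuck on how to loop this script over every file in a directory.

Basically, the two scripts work fine on their own: they take a PDF file and convert it to an Excel workbook. Now I need to go through all the files in a selected directory and do the same job for each one.

I keep getting stuck at the stage where the file is opened. Is it saying that the data (the PDF page, data[0]) cannot be accessed? Or should I add more steps to bring the dataset in?

Do I have to create a list for the dataset so that I can access the data, since there will be more than one dataset to read? Is that related to whether Python can read data[0]?

Revised script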

# import 
import os
import glob
import pdftotext
import openpyxl
from pathlib import Path
from string import ascii_uppercase

# open a pdf file
def to_excel(pdf_file):
    with open(pdf_file,'rb') as f: 
        data = pdftotext.PDF(f)
        
# operate data to get titles,values 
datas = data[0].split('\r\n')

finalData = list()
for item in datas:
    if item != '':
        finalData.append(item)

finalDataRefined = list()
for item in finalData:
    if item != '                          BCA Scheduled Maintenance Questions' and item != ' Do you SUSPECT there is Asbestos at the property?' and item != '    Yes' and item != '    No' and item != '\x0c':
        finalDataRefined.append(item.strip())

titles = list()
values = list()

for num,item in enumerate(finalDataRefined):
    if num % 2 == 0:
        titles.append(item)
    else:
        values.append(item)

# get an output file name
       
OPRAST = values[1]
filename = work_dir / f"{OPRAST}.xlxs"

# create an excel workbook
excel_file = openpyxl.Workbook()
excel_sheet = excel_file.active

excel_sheet.append([])

alphaList = list(ascii_uppercase)
for alphabet in alphaList:
    excel_sheet.column_dimensions[alphabet].width = 20

excel_sheet.append(titles)
excel_sheet.append(values)

# save the excel workbook
excel_file.save(filename)
excel_file.close

# run a python script every file in a directory
alphaList = list(ascii_uppercase)

work_dir = Path(r"C:\Users\Sunny Kim\Downloads\Do Forms")
for pdf_file in work_dir.glob("*.pdf"):
    to_excel(pdf_file)

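Solution

I basically understand what you want to do, but the indentation of your code makes it hard to read, and indentation matters a great deal in Python.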
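A likely reason the revised script fails right after opening the file is scope. In the question's code only the `with` block is indented under `def to_excel`, so `data` exists only inside the function, while the line that reads `data[0]` sits at module level and runs before the function is ever called. Here is a minimal sketch of that effect, using a plain string in place of the PDF; the names `load_broken`, `load_fixed`, and `pages` are made up for illustration:

```python
def load_broken(text):
    # `pages` is a local name: it disappears when the function returns.
    pages = text.split("\f")


def load_fixed(text):
    # Everything that needs `pages` stays indented inside the function.
    pages = text.split("\f")
    return pages[0]


sample = "page one\fpage two"

load_broken(sample)
try:
    # Module-level code cannot see the local name from load_broken().
    pages[0]
except NameError as exc:
    error_message = str(exc)

first_page = load_fixed(sample)
```

So there is no need to build a list of datasets. It should be enough to indent the processing code so that it belongs to the function body.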

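Is your goal to create one Excel file for each PDF file in the directory, or to combine all the PDF files into a single Excel file?

The code below addresses the first goal.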

Code logic:

Get all the PDF files
Loop over the PDF files and, for each file:
1. Open the PDF file
2. Operate on the data
3. Export to an Excel file
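The steps above can be sketched as a small driver that is independent of the PDF and Excel details. This is only an illustration: `convert_all` and its `convert` callback are hypothetical names, and catching the exception per file is an added assumption so that one malformed PDF does not stop the whole run.

```python
from pathlib import Path
import tempfile


def convert_all(work_dir, convert):
    """Call convert(pdf_path) for every *.pdf directly inside work_dir."""
    done, failed = [], []
    for pdf_file in sorted(Path(work_dir).glob("*.pdf")):
        try:
            convert(pdf_file)
            done.append(pdf_file.name)
        except Exception as exc:
            # Record the failure and continue with the next file.
            failed.append((pdf_file.name, str(exc)))
    return done, failed


# Demonstration with empty placeholder files instead of real PDFs.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "b.pdf", "notes.txt"):
        (Path(tmp) / name).touch()
    seen = []
    done, failed = convert_all(tmp, lambda p: seen.append(p.suffix))
```

In the real script, `convert` would be the `to_excel` function and `work_dir` the folder that holds the forms.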

Your complete code might look like this (just a guess):

# ----------------import part-------------------
import pdftotext
import openpyxl
from string import ascii_uppercase
from pathlib import Path

def to_excel(pdf_file):
    with open(pdf_file,'rb') as f: # open the PDF in binary mode
        data = pdftotext.PDF(f)
    # ---------------operate on the data, get titles and values-----------
    datas = data[0].split('\r\n')

    finalData = list()
    for item in datas:
        if item != '':
            finalData.append(item)

    finalDataRefined = list()
    for item in finalData:
        if item != '                          BCA Scheduled Maintenance Questions' and item != ' Do you SUSPECT there is Asbestos at the property?' and item != '    Yes' and item != '    No' and item != '\x0c':
            finalDataRefined.append(item.strip())

    titles = list()
    values = list()
    for num,item in enumerate(finalDataRefined):
        if num % 2 == 0:
            titles.append(item)
        else:
            values.append(item)

    # ------------------get output file name---------------------
    OPRAST = values[1]
    filename = work_dir / f"{OPRAST}.xlsx"
    # ------------------create excel file sheet------------------
    excel_file = openpyxl.Workbook()
    excel_sheet = excel_file.active

    excel_sheet.append([])

    alphaList = list(ascii_uppercase)
    for alphabet in alphaList:
        excel_sheet.column_dimensions[alphabet].width = 20

    excel_sheet.append(titles)
    excel_sheet.append(values)
    # --------------------save----------------
    excel_file.save(filename)
    excel_file.close()
# -------------------main program---------------
# to_excel() reads work_dir as a global, so define it before the loop runs.
work_dir = Path(r"C:\Users\Sunny Kim\Downloads\Do Forms")

for pdf_file in work_dir.glob("*.pdf"):
    to_excel(pdf_file)
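The title/value step in the middle of `to_excel` can also be pulled out into a pure function, which makes it possible to check the parsing without a real PDF. The sketch below is an assumption-laden variant, not the code above: `split_titles_values` is a hypothetical helper, the sample text is invented, and it compares lines after stripping whitespace instead of matching the exact leading spaces.

```python
def split_titles_values(page_text, skip=()):
    """Return (titles, values) from the alternating non-empty lines of a page."""
    # splitlines() handles both '\n' and '\r\n'; blank lines are dropped.
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    # Drop headings and other lines that are not part of a title/value pair.
    kept = [line for line in lines if line not in skip]
    # Even positions are titles, odd positions are the matching values.
    return kept[0::2], kept[1::2]


# Invented sample text standing in for data[0].
sample = (
    "   Form Heading\r\n"
    "Name\r\n"
    "Alice\r\n"
    "\r\n"
    "Reference\r\n"
    "R-001\r\n"
    "\x0c"
)
titles, values = split_titles_values(sample, skip={"Form Heading"})
```

With this sample, `values[1]` is the entry that the code above would use as the output file name.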