
Load CSV data from a URL directly into a MySQL table after cleaning it

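I want to load the data at the URL "https://www.treasury.gov/ofac/downloads/sdn.csv" directly into a table named sdn. The only change I need to make is to replace every '-0-' with '' (an empty string) in all columns that contain that value.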

I tried doing this with pandas, but my approach does not look clean.

import requests
import pandas as pd


sdnURL = "https://www.treasury.gov/ofac/downloads/sdn.csv"
altURL = "https://www.treasury.gov/ofac/downloads/alt.csv"
addURL = "https://www.treasury.gov/ofac/downloads/add.csv"
sdnCommentsURL = "https://www.treasury.gov/ofac/downloads/sdn_comments.csv"

sdnHeader = ["sdn_id","sdn_name","sdn_type","program","title","call_sign","vessel_type","tonnage","gross_tonnage","vessel_flag","vessel_owner","remarks"]
altHeader = ["sdn_id","alt_id","alt_type","alt_name","remarks"]
addHeader = ["sdn_id","address_id","address","city_state_post","country","remarks"]
sdnCommentsHeader = ["sdn_id","remarks"]


sdn = pd.read_csv(sdnURL,names = sdnHeader,header = None)
alt = pd.read_csv(altURL,names = altHeader,header = None)
add = pd.read_csv(addURL,names = addHeader,header = None)
sdnComments = pd.read_csv(sdnCommentsURL,names = sdnCommentsHeader,header = None)

sdn.to_csv('sdn.csv',index = False)
alt.to_csv('alt.csv',index = False)
add.to_csv('add.csv',index = False)
sdnComments.to_csv('sdnComments.csv',index = False)

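I also intend to load the CSV files into MySQL tables. My approach has two problems:

  1. I do not want to write a separate command for each file.
  2. I want to replace '-0-' in all columns in one go.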

Final edit: thanks to @Jimmar's answer, I ended up writing code like this:

import requests
import pandas as pd

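# Maps each file name (without ".csv") to its column headers.
files = {
    "sdn": ["sdn_id","sdn_name","sdn_type","program","title","call_sign","vessel_type","tonnage","gross_tonnage","vessel_flag","vessel_owner","remarks"],
    "alt": ["sdn_id","alt_id","alt_type","alt_name","remarks"],
    "add": ["sdn_id","address_id","address","city_state_post","country","remarks"],
    "sdn_comments": ["sdn_id","remarks"]
}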

def fetch_csv(file,headers):
    df = pd.read_csv("https://www.treasury.gov/ofac/downloads/"+file+".csv",names=headers,header=None)
    df = df.replace('-0- ','')
    df.to_csv(file+'.csv',index=False)

for file,headers in files.items():
    fetch_csv(file,headers)

Solution

You can organize the code this way (I only did two of the files):

import requests
import pandas as pd

def fetch_csv(url,headers,file_name):
    df = pd.read_csv(url,names=headers,header=None)
    df = df.replace('-0- ','')
    df.to_csv(file_name,index=False)

sources = [
    {
       "url": "https://www.treasury.gov/ofac/downloads/sdn.csv","headers": ["sdn_id","sdn_name","sdn_type","program","title","call_sign","vessel_type","tonnage","gross_tonnage","vessel_flag","vessel_owner","remarks"],"file_name": "sdn.csv"
    },{
       "url": "https://www.treasury.gov/ofac/downloads/alt.csv","headers": ["sdn_id","alt_id","alt_type","alt_name","remarks"],"file_name": "alt.csv"
    } # add the rest in the same pattern
]

for source in sources:
    fetch_csv(source['url'],source['headers'],source['file_name'])
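
One detail about the replace call: with a plain string, df.replace only changes cells whose entire value equals that string, so '-0- ' (with the trailing space) will not match a cell holding '-0-' alone. The sketch below is one possible way to be tolerant of surrounding whitespace by using a regular expression. The sample frame and the blank_placeholders helper are made up for illustration and are not part of the original code.

```python
import pandas as pd

# Tiny stand-in for the downloaded data; the values are made up for illustration.
df = pd.DataFrame({
    "sdn_id": [1, 2],
    "title": ["-0- ", "Minister"],
    "remarks": ["-0-", "note with -0- inside"],
})

def blank_placeholders(frame):
    # Replace cells that consist only of "-0-" (plus optional surrounding
    # whitespace) with an empty string. Text that merely contains "-0-"
    # somewhere in the middle is left untouched.
    return frame.replace(r"^\s*-0-\s*$", "", regex=True)

cleaned = blank_placeholders(df)
print(cleaned)
```

Anchoring the pattern with ^ and $ keeps the whole-cell behaviour of the plain-string version while ignoring stray spaces around the placeholder.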

If you need to write the data to a database, replace the df.to_csv line with a call to df.to_sql.
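
As a rough sketch of that to_sql variant: the load_table helper, the sample rows, and the connection details below are assumptions for illustration only. An in-memory SQLite database stands in for MySQL so the example runs without a server. For MySQL you would likely pass a SQLAlchemy engine as con instead.

```python
import sqlite3
import pandas as pd

def load_table(df, table_name, con):
    # Blank out the "-0- " placeholder, then write the frame as a table.
    # if_exists="replace" drops and recreates the table on every run.
    cleaned = df.replace("-0- ", "")
    cleaned.to_sql(table_name, con, if_exists="replace", index=False)
    return cleaned

# Stand-in data and an in-memory SQLite database so the sketch runs anywhere.
# For MySQL, `con` would be a SQLAlchemy engine, for example
# sqlalchemy.create_engine("mysql+pymysql://user:password@localhost/ofac")
# (hypothetical credentials and database name; needs SQLAlchemy and PyMySQL).
con = sqlite3.connect(":memory:")
sample = pd.DataFrame({"sdn_id": [36, 173], "remarks": ["-0- ", "some remark"]})
load_table(sample, "sdn_comments", con)

rows = con.execute(
    "SELECT sdn_id, remarks FROM sdn_comments ORDER BY sdn_id"
).fetchall()
print(rows)
```

In the answer's fetch_csv function, the same idea would mean swapping df.to_csv(file_name,index=False) for a to_sql call that uses the table name and connection of your choice.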
