How to fix a failed export to a ".csv" file - pandas.DataFrame
I'd like to ask for help with my Google Colaboratory notebook. The error occurs in the fourth cell.
Context:
We are web-scraping BTC's historical data.
Here is my code:
First cell (runs successfully)
# importing libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd
Second cell (runs successfully)
#sample url
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
#request the page
page = requests.get(url)
#creating a soup object and the parser
soup = BeautifulSoup(page.text,'lxml')
#creating a table body to pass on the soup to find the table
table_body = soup.find('table')
#creating an empty list to store information
row_data = []
#creating a table
for row in table_body.find_all('tr'):
    col = row.find_all('td')
    col = [ele.text.strip() for ele in col]  # stripping the whitespace
    row_data.append(col)  # append the columns
# extracting all data on table entries
df = pd.DataFrame(row_data)
df
Third cell (runs successfully)
headers = []
for i in soup.find_all('th'):
    col_name = i.text.strip().lower().replace(" ", "_")
    headers.append(col_name)
headers
Fourth cell (fails)
df = pd.DataFrame(row_data,columns=headers)
df
#into a file
df.to_csv('/content/file.csv')
The error! :(
AssertionError                            Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
    563     try:
--> 564         columns = _validate_or_indexify_columns(content, columns)
    565         result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)

AssertionError: 13 columns passed, passed data had 7 columns

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
    565         result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
    566     except AssertionError as e:
--> 567         raise ValueError(e) from e
    568     return result, columns
    569 

ValueError: 13 columns passed, passed data had 7 columns
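The mismatch can be reproduced without any scraping: `soup.find_all('th')` collects header cells from the whole page (13 of them), while each data row of the target table has only 7 `<td>` cells, so the column list and the row width disagree. A minimal sketch of the same failure (the data here is made up for illustration):

```python
import pandas as pd

# 13 header names were collected, but every row has only 7 cells --
# the same shape mismatch the traceback reports.
headers = [f"col_{i}" for i in range(13)]  # hypothetical header names
row_data = [["a"] * 7, ["b"] * 7]          # rows with 7 cells each

try:
    pd.DataFrame(row_data, columns=headers)
except ValueError as e:
    print(e)  # 13 columns passed, passed data had 7 columns
```

So the fix is to make the header list and the rows come from the *same* table, as the answers below do.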
Solution
To load the table, you can use a simple pd.read_html(). For example:
import pandas as pd
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
df = pd.read_html(url)[0]
print(df)
df.to_csv("data.csv")
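Note that pd.read_html() parses every `<table>` in the document and returns a list of DataFrames, which is why the example indexes with `[0]`. Also, by default to_csv() writes pandas' row index as an unnamed first column; pass index=False if you only want the scraped columns. A small sketch with made-up data:

```python
import pandas as pd

# Tiny stand-in frame so the example runs without network access.
df = pd.DataFrame({"date": ["2021-07-24"], "close": [33593.3]})

print(df.to_csv())             # first line is ",date,close" (index column included)
print(df.to_csv(index=False))  # first line is "date,close"
```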
This creates data.csv
(screenshot from LibreOffice):
To correct your example:
# importing libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd
# sample url
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
# request the page
page = requests.get(url)
# creating a soup object and the parser
soup = BeautifulSoup(page.text,"lxml")
# creating a table body to pass on the soup to find the table
table_body = soup.find("table")
# creating an empty list to store information
row_data = []
# creating a table
for row in table_body.select("tr:has(td)"):
    col = row.find_all("td")
    col = [ele.text.strip() for ele in col]  # stripping the whitespace
    row_data.append(col)  # append the columns
# extracting all data on table entries
df = pd.DataFrame(row_data)
headers = []
for i in table_body.select("th"):
    col_name = i.text.strip().lower().replace(" ", "_")
    headers.append(col_name)
df = pd.DataFrame(row_data,columns=headers)
print(df)
df.to_csv("/content/file.csv")
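The two key changes are the CSS selector "tr:has(td)", which skips the header row (its cells are `<th>`, so the original loop appended an empty list for it), and scoping the header search to table_body instead of the whole soup, so only the 7 headers of this table are collected. The selector's effect can be seen on a tiny inline table (stand-in HTML, not the real page):

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for the scraped page: one header row, two data rows.
html = """
<table>
  <tr><th>date</th><th>close</th></tr>
  <tr><td>2021-07-23</td><td>32287.5</td></tr>
  <tr><td>2021-07-24</td><td>33593.3</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

print(len(soup.select("tr")))          # 3 -- includes the <th>-only header row
print(len(soup.select("tr:has(td)")))  # 2 -- data rows only
```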
Alternatively, you can read the site's JSON API directly with pd.read_json():
import pandas as pd
df = pd.read_json(
    'https://www.bitrates.com/api/node/v1/symbols/USDTUSD/bitrates/series?aggregate=3&period=lastMonth'
).T['series'].to_dict()['data']
print(pd.DataFrame(df))
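The chained .T['series'].to_dict()['data'] just digs the list of row records out of the nested JSON response; pd.DataFrame() turns a list of dicts directly into a table, one column per key. A sketch with the response shape inferred from the snippet above (the shape and values here are assumptions, not the real API payload):

```python
import pandas as pd

# Hypothetical response mimicking the nested series -> data structure.
response = {
    "series": {
        "data": [
            {"date": "2021-07-23T12:00:00.000Z", "open": 1.000325, "close": 1.000363},
            {"date": "2021-07-24T06:00:00.000Z", "open": 1.000363, "close": 1.000644},
        ]
    }
}

# A list of dicts becomes a DataFrame with one column per key.
df = pd.DataFrame(response["series"]["data"])
print(df)
df.to_csv("data.csv", index=False)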
Output:
date open close ... supply market_volume24 btc_ratio
0 2021-04-11T06:00:00.000Z 0.999212 0.999114 ... 4.584629e+10 3.146109e+08 0.000016
1 2021-04-12T00:00:00.000Z 0.999114 0.999317 ... 4.584629e+10 2.100706e+09 0.000016
2 2021-06-04T18:00:00.000Z 0.999317 1.000613 ... 6.447629e+10 7.298208e+08 0.000025
3 2021-06-05T12:00:00.000Z 1.000613 1.000328 ... 0.000000e+00 6.502947e+09 0.000025
4 2021-06-06T06:00:00.000Z 1.000328 1.000499 ... 6.447629e+10 6.649574e+08 0.000025
5 2021-06-07T00:00:00.000Z 1.000499 1.000408 ... 6.447629e+10 8.272473e+09 0.000025
6 2021-06-07T18:00:00.000Z 1.000408 1.000338 ... 6.447629e+10 1.090599e+09 0.000025
7 2021-06-08T12:00:00.000Z 1.000338 1.000840 ... 6.447177e+10 2.196249e+09 0.000028
8 2021-06-09T06:00:00.000Z 1.000840 1.001088 ... 0.000000e+00 1.080053e+10 0.000028
9 2021-06-10T00:00:00.000Z 1.001088 1.000618 ... 6.447177e+10 4.158914e+09 0.000026
10 2021-06-10T18:00:00.000Z 1.000618 1.000436 ... 6.447177e+10 6.713012e+08 0.000026
11 2021-06-11T12:00:00.000Z 1.000436 1.000234 ... 6.447177e+10 4.093096e+09 0.000025
12 2021-06-12T06:00:00.000Z 1.000234 1.000385 ... 6.447177e+10 5.042653e+09 0.000026
13 2021-06-13T00:00:00.000Z 1.000385 1.000302 ... 0.000000e+00 5.502808e+09 0.000026
14 2021-06-13T18:00:00.000Z 1.000302 1.000110 ... 6.447177e+10 1.008952e+10 0.000024
15 2021-06-14T12:00:00.000Z 1.000110 1.000309 ... 6.447177e+10 7.405940e+09 0.000024
16 2021-06-15T06:00:00.000Z 1.000309 1.000205 ... 6.447177e+10 4.256491e+09 0.000023
17 2021-06-16T00:00:00.000Z 1.000205 1.000104 ... 0.000000e+00 1.495518e+09 0.000023
18 2021-06-16T18:00:00.000Z 1.000104 0.999833 ... 0.000000e+00 3.033091e+09 0.000024
19 2021-06-17T12:00:00.000Z 0.999833 1.000016 ... 6.447177e+10 1.449031e+08 0.000024
20 2021-07-10T00:00:00.000Z 1.000016 1.000100 ... 6.446977e+10 7.586923e+08 0.000025
21 2021-07-10T18:00:00.000Z 1.000100 1.000199 ... 6.446977e+10 2.312489e+09 0.000025
22 2021-07-11T12:00:00.000Z 1.000199 1.000134 ... 6.446977e+10 2.236517e+09 0.000024
23 2021-07-12T06:00:00.000Z 1.000134 1.000192 ... 6.446977e+10 8.140557e+09 0.000024
24 2021-07-13T00:00:00.000Z 1.000192 1.000290 ... 6.446977e+10 3.846952e+09 0.000026
25 2021-07-13T18:00:00.000Z 1.000290 1.000411 ... 6.446977e+10 1.278604e+09 0.000026
26 2021-07-14T12:00:00.000Z 1.000411 1.000315 ... 6.446977e+10 3.279535e+09 0.000026
27 2021-07-15T06:00:00.000Z 1.000315 1.000142 ... 6.446977e+10 8.086642e+08 0.000026
28 2021-07-16T00:00:00.000Z 1.000142 1.000295 ... 6.446977e+10 1.187211e+09 0.000027
29 2021-07-16T18:00:00.000Z 1.000295 1.000610 ... 6.446977e+10 7.721854e+08 0.000027
30 2021-07-17T12:00:00.000Z 1.000610 1.000535 ... 6.446977e+10 4.535049e+09 0.000027
31 2021-07-18T06:00:00.000Z 1.000535 1.000610 ... 6.446977e+10 2.345491e+09 0.000026
32 2021-07-19T00:00:00.000Z 1.000610 1.000386 ... 6.446977e+10 4.725531e+09 0.000027
33 2021-07-19T18:00:00.000Z 1.000386 1.000215 ... 6.446977e+10 3.314499e+09 0.000028
34 2021-07-20T12:00:00.000Z 1.000215 1.000324 ... 6.446977e+10 5.315525e+09 0.000030
35 2021-07-21T06:00:00.000Z 1.000324 1.000277 ... 6.446977e+10 7.141479e+09 0.000028
36 2021-07-22T00:00:00.000Z 1.000277 1.000255 ... 6.446977e+10 2.533840e+09 0.000028
37 2021-07-22T18:00:00.000Z 1.000255 1.000325 ... 6.446977e+10 2.699050e+09 0.000027
38 2021-07-23T12:00:00.000Z 1.000325 1.000363 ... 6.446977e+10 2.681340e+09 0.000026
39 2021-07-24T06:00:00.000Z 1.000363 1.000644 ... 6.446974e+10 6.241232e+08 0.000026
[40 rows x 10 columns]