[心得] 複習CSV檔案的處理與匯入資料庫

前言:

這幾天要來寫個股票回測,發現太久沒寫程式,
都忘了怎麼處理CSV開檔寫檔與儲存至SQLite這部分。
翻翻以前的紀錄發現不太懂當時的寫法...今天就來複習一下!

範例檔案格式如下:

STOCK_CODE,OPEN,LOW,HIGH,CLOSE,VOLUME
2GO,10.56,10.0,10.56,10.1,68600.0
AB,10.66,10.66,10.7,10.7,500.0 
ABA,0.54,0.52,0.56,0.53,10807000.0   
...以下省略

以下的程式碼是用Jupyter執行,直接輸出md檔

import json
import os
import pandas as pd
import sqlite3
#單日交易檔案
filename='2020-05-12.csv'
df_example = pd.read_csv(filename) 
# STOCK_CODE(symbol) OPEN,LOW,HIGH,CLOSE,VOLUME
#df_example.columns = (['股票代號','開盤價','最低價','最高價','收盤價','交易量'])
#從檔名擷取出需要的字串
#檔名 2020-05-12.csv 取 2020-05-12
filedate=filename.split('.')[0]
print(filedate)
2020-05-12
#印出前五行的內容
df_example[:5]
STOCK_CODE OPEN LOW HIGH CLOSE VOLUME
0 2GO 10.56 10.00 10.56 10.10 68600.0
1 AB 10.66 10.66 10.70 10.70 500.0
2 ABA 0.54 0.52 0.56 0.53 10807000.0
3 ABG 8.00 8.00 8.47 8.02 16300.0
4 AC 700.00 690.00 705.00 704.00 483420.0
#新增欄位: DATE filedate就是上面取出的日期
df_example['DATE']=filedate
#把 STOCK_CODE 改成 SYMBOL
df_example.rename(columns={'STOCK_CODE':'SYMBOL'}, inplace=True)
df_example[:5]
SYMBOL OPEN LOW HIGH CLOSE VOLUME DATE
0 2GO 10.56 10.00 10.56 10.10 68600.0 2020-05-12
1 AB 10.66 10.66 10.70 10.70 500.0 2020-05-12
2 ABA 0.54 0.52 0.56 0.53 10807000.0 2020-05-12
3 ABG 8.00 8.00 8.47 8.02 16300.0 2020-05-12
4 AC 700.00 690.00 705.00 704.00 483420.0 2020-05-12
dp=df_example
#自訂欄位排序 解法 https://bit.ly/3dCkXeH
#把 DATE 欄位移到最前面
dp = dp.reindex(columns=['DATE','SYMBOL','OPEN','LOW','HIGH','CLOSE','VOLUME'])
#將 DATE 欄位用 pandas 轉成日期的資料型態
dp['DATE']=pd.to_datetime(dp['DATE'])
#查看欄位型態
dp.info()
#dp.dtypes

RangeIndex: 195 entries, 0 to 194
Data columns (total 7 columns):
DATE      195 non-null datetime64[ns]
SYMBOL    195 non-null object
OPEN      195 non-null float64
LOW       195 non-null float64
HIGH      195 non-null float64
CLOSE     195 non-null float64
VOLUME    195 non-null float64
dtypes: datetime64[ns](1), float64(5), object(1)
memory usage: 10.0+ KB
dbname='test.db'
db = sqlite3.connect(dbname)
dp.to_sql('summy',db,if_exists='replace')
with sqlite3.connect(dbname) as db:
    df=pd.read_sql_query('select * from "summy"', con=db)
df[:10]
index DATE SYMBOL OPEN LOW HIGH CLOSE VOLUME
0 0 2020-05-12 00:00:00 2GO 10.56 10.00 10.56 10.10 68600.0
1 1 2020-05-12 00:00:00 AB 10.66 10.66 10.70 10.70 500.0
2 2 2020-05-12 00:00:00 ABA 0.54 0.52 0.56 0.53 10807000.0
3 3 2020-05-12 00:00:00 ABG 8.00 8.00 8.47 8.02 16300.0
4 4 2020-05-12 00:00:00 AC 700.00 690.00 705.00 704.00 483420.0
5 5 2020-05-12 00:00:00 ACEPH 2.33 2.26 2.34 2.26 7453000.0
6 6 2020-05-12 00:00:00 ACEX 6.71 6.60 6.82 6.79 129200.0
7 7 2020-05-12 00:00:00 ACR 0.94 0.94 0.94 0.94 1000.0
8 8 2020-05-12 00:00:00 AEV 40.90 40.15 41.35 41.05 342500.0
9 9 2020-05-12 00:00:00 AGI 6.31 6.20 6.31 6.20 2038000.0

以上就是這次的複習內容 🙂

ps:最近開始用markdown寫文件
發現程式碼會跟 Crayon Syntax Highlighter 相衝
會有 &quot 的問題...本來是不想理會的...
後來還是把這支外掛停用...