[心得] 複習CSV檔案的處理與匯入資料庫

前言:

這幾天要來寫個股票回測，發現太久沒寫程式，
都忘了怎麼處理CSV開檔寫檔與儲存至SQLite這部分。
翻翻以前的紀錄發現不太懂當時的寫法...今天就來複習一下!

範例檔案格式如下:

STOCK_CODE,OPEN,LOW,HIGH,CLOSE,VOLUME
2GO,10.56,10.0,10.56,10.1,68600.0
AB,10.66,10.66,10.7,10.7,500.0 
ABA,0.54,0.52,0.56,0.53,10807000.0   
...以下省略

以下的程式碼是用Jupyter執行，直接輸出md檔

import json
import os
import pandas as pd
import sqlite3

#單日交易檔案
filename='2020-05-12.csv'
df_example = pd.read_csv(filename) 
# STOCK_CODE(symbol) OPEN,LOW,HIGH,CLOSE,VOLUME
#df_example.columns = (['股票代號','開盤價','最低價','最高價','收盤價','交易量'])

#從檔名擷取出需要的字串
#檔名 2020-05-12.csv 取 2020-05-12
filedate=filename.split('.')[0]
print(filedate)

2020-05-12

#印出前五行的內容
df_example[:5]

	STOCK_CODE	OPEN	LOW	HIGH	CLOSE	VOLUME
0	2GO	10.56	10.00	10.56	10.10	68600.0
1	AB	10.66	10.66	10.70	10.70	500.0
2	ABA	0.54	0.52	0.56	0.53	10807000.0
3	ABG	8.00	8.00	8.47	8.02	16300.0
4	AC	700.00	690.00	705.00	704.00	483420.0

#新增欄位: DATE filedate就是上面取出的日期
df_example['DATE']=filedate

#把 STOCK_CODE 改成 SYMBOL
df_example.rename(columns={'STOCK_CODE':'SYMBOL'}, inplace=True)

df_example[:5]

	SYMBOL	OPEN	LOW	HIGH	CLOSE	VOLUME	DATE
0	2GO	10.56	10.00	10.56	10.10	68600.0	2020-05-12
1	AB	10.66	10.66	10.70	10.70	500.0	2020-05-12
2	ABA	0.54	0.52	0.56	0.53	10807000.0	2020-05-12
3	ABG	8.00	8.00	8.47	8.02	16300.0	2020-05-12
4	AC	700.00	690.00	705.00	704.00	483420.0	2020-05-12

dp=df_example
#自訂欄位排序 解法 https://bit.ly/3dCkXeH
#把 DATE 欄位移到最前面
dp = dp.reindex(columns=['DATE','SYMBOL','OPEN','LOW','HIGH','CLOSE','VOLUME'])

#將 DATE 欄位用 pandas 轉成日期的資料型態
dp['DATE']=pd.to_datetime(dp['DATE'])

#查看欄位型態
dp.info()
#dp.dtypes


RangeIndex: 195 entries, 0 to 194
Data columns (total 7 columns):
DATE      195 non-null datetime64[ns]
SYMBOL    195 non-null object
OPEN      195 non-null float64
LOW       195 non-null float64
HIGH      195 non-null float64
CLOSE     195 non-null float64
VOLUME    195 non-null float64
dtypes: datetime64[ns](1), float64(5), object(1)
memory usage: 10.0+ KB

dbname='test.db'
db = sqlite3.connect(dbname)

dp.to_sql('summy',db,if_exists='replace')

with sqlite3.connect(dbname) as db:
    df=pd.read_sql_query('select * from "summy"', con=db)
df[:10]

	index	DATE	SYMBOL	OPEN	LOW	HIGH	CLOSE	VOLUME
0	0	2020-05-12 00:00:00	2GO	10.56	10.00	10.56	10.10	68600.0
1	1	2020-05-12 00:00:00	AB	10.66	10.66	10.70	10.70	500.0
2	2	2020-05-12 00:00:00	ABA	0.54	0.52	0.56	0.53	10807000.0
3	3	2020-05-12 00:00:00	ABG	8.00	8.00	8.47	8.02	16300.0
4	4	2020-05-12 00:00:00	AC	700.00	690.00	705.00	704.00	483420.0
5	5	2020-05-12 00:00:00	ACEPH	2.33	2.26	2.34	2.26	7453000.0
6	6	2020-05-12 00:00:00	ACEX	6.71	6.60	6.82	6.79	129200.0
7	7	2020-05-12 00:00:00	ACR	0.94	0.94	0.94	0.94	1000.0
8	8	2020-05-12 00:00:00	AEV	40.90	40.15	41.35	41.05	342500.0
9	9	2020-05-12 00:00:00	AGI	6.31	6.20	6.31	6.20	2038000.0

以上就是這次的複習內容 🙂

ps:最近開始用markdown寫文件
發現程式碼會跟 Crayon Syntax Highlighter 相衝
會有 &quot 的問題...本來是不想理會的...
後來還是把這支外掛停用...

一	二	三	四	五	六	日
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

前言:

以上就是這次的複習內容 🙂

Share To：

相關