前言:
這幾天要來寫個股票回測,發現太久沒寫程式,
都忘了怎麼處理CSV開檔寫檔與儲存至SQLite這部分。
翻翻以前的紀錄發現不太懂當時的寫法...今天就來複習一下!
範例檔案格式如下:
STOCK_CODE,OPEN,LOW,HIGH,CLOSE,VOLUME
2GO,10.56,10.0,10.56,10.1,68600.0
AB,10.66,10.66,10.7,10.7,500.0
ABA,0.54,0.52,0.56,0.53,10807000.0
...以下省略
以下的程式碼是用Jupyter執行,直接輸出md檔
import json
import os
import pandas as pd
import sqlite3
#單日交易檔案
filename='2020-05-12.csv'
df_example = pd.read_csv(filename)
# STOCK_CODE(symbol) OPEN,LOW,HIGH,CLOSE,VOLUME
#df_example.columns = (['股票代號','開盤價','最低價','最高價','收盤價','交易量'])
#從檔名擷取出需要的字串
#檔名 2020-05-12.csv 取 2020-05-12
filedate=filename.split('.')[0]
print(filedate)
2020-05-12
#印出前五行的內容
df_example[:5]
STOCK_CODE | OPEN | LOW | HIGH | CLOSE | VOLUME | |
---|---|---|---|---|---|---|
0 | 2GO | 10.56 | 10.00 | 10.56 | 10.10 | 68600.0 |
1 | AB | 10.66 | 10.66 | 10.70 | 10.70 | 500.0 |
2 | ABA | 0.54 | 0.52 | 0.56 | 0.53 | 10807000.0 |
3 | ABG | 8.00 | 8.00 | 8.47 | 8.02 | 16300.0 |
4 | AC | 700.00 | 690.00 | 705.00 | 704.00 | 483420.0 |
#新增欄位: DATE filedate就是上面取出的日期
df_example['DATE']=filedate
#把 STOCK_CODE 改成 SYMBOL
df_example.rename(columns={'STOCK_CODE':'SYMBOL'}, inplace=True)
df_example[:5]
SYMBOL | OPEN | LOW | HIGH | CLOSE | VOLUME | DATE | |
---|---|---|---|---|---|---|---|
0 | 2GO | 10.56 | 10.00 | 10.56 | 10.10 | 68600.0 | 2020-05-12 |
1 | AB | 10.66 | 10.66 | 10.70 | 10.70 | 500.0 | 2020-05-12 |
2 | ABA | 0.54 | 0.52 | 0.56 | 0.53 | 10807000.0 | 2020-05-12 |
3 | ABG | 8.00 | 8.00 | 8.47 | 8.02 | 16300.0 | 2020-05-12 |
4 | AC | 700.00 | 690.00 | 705.00 | 704.00 | 483420.0 | 2020-05-12 |
dp=df_example
#自訂欄位排序 解法 https://bit.ly/3dCkXeH
#把 DATE 欄位移到最前面
dp = dp.reindex(columns=['DATE','SYMBOL','OPEN','LOW','HIGH','CLOSE','VOLUME'])
#將 DATE 欄位用 pandas 轉成日期的資料型態
dp['DATE']=pd.to_datetime(dp['DATE'])
#查看欄位型態
dp.info()
#dp.dtypes
RangeIndex: 195 entries, 0 to 194
Data columns (total 7 columns):
DATE 195 non-null datetime64[ns]
SYMBOL 195 non-null object
OPEN 195 non-null float64
LOW 195 non-null float64
HIGH 195 non-null float64
CLOSE 195 non-null float64
VOLUME 195 non-null float64
dtypes: datetime64[ns](1), float64(5), object(1)
memory usage: 10.0+ KB
dbname='test.db'
db = sqlite3.connect(dbname)
dp.to_sql('summy',db,if_exists='replace')
with sqlite3.connect(dbname) as db:
df=pd.read_sql_query('select * from "summy"', con=db)
df[:10]
index | DATE | SYMBOL | OPEN | LOW | HIGH | CLOSE | VOLUME | |
---|---|---|---|---|---|---|---|---|
0 | 0 | 2020-05-12 00:00:00 | 2GO | 10.56 | 10.00 | 10.56 | 10.10 | 68600.0 |
1 | 1 | 2020-05-12 00:00:00 | AB | 10.66 | 10.66 | 10.70 | 10.70 | 500.0 |
2 | 2 | 2020-05-12 00:00:00 | ABA | 0.54 | 0.52 | 0.56 | 0.53 | 10807000.0 |
3 | 3 | 2020-05-12 00:00:00 | ABG | 8.00 | 8.00 | 8.47 | 8.02 | 16300.0 |
4 | 4 | 2020-05-12 00:00:00 | AC | 700.00 | 690.00 | 705.00 | 704.00 | 483420.0 |
5 | 5 | 2020-05-12 00:00:00 | ACEPH | 2.33 | 2.26 | 2.34 | 2.26 | 7453000.0 |
6 | 6 | 2020-05-12 00:00:00 | ACEX | 6.71 | 6.60 | 6.82 | 6.79 | 129200.0 |
7 | 7 | 2020-05-12 00:00:00 | ACR | 0.94 | 0.94 | 0.94 | 0.94 | 1000.0 |
8 | 8 | 2020-05-12 00:00:00 | AEV | 40.90 | 40.15 | 41.35 | 41.05 | 342500.0 |
9 | 9 | 2020-05-12 00:00:00 | AGI | 6.31 | 6.20 | 6.31 | 6.20 | 2038000.0 |
以上就是這次的複習內容 🙂
ps:最近開始用markdown寫文件
發現程式碼會跟 Crayon Syntax Highlighter 相衝
會有 " 的問題...本來是不想理會的...
後來還是把這支外掛停用...