
AI Cross-Disciplinary Data Science Course Notes d35 Smart Healthcare (1) Medical and Biomedical Big Data

傳說中的巴哈魔法師~ | 2022-04-30 12:23:49

Blockchain -> storing electronic medical records

Edge computing -> preprocessing

Cloud computing -> deep learning

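Smart healthcare datasets:

Heart disease dataset

-Predict whether a patient has heart disease from 13 indicators

Supervised learning -> binary classification

Stroke dataset

-Predict stroke risk from 10 indicators

Imbalanced dataset

Supervised learning -> binary classification

Unsupervised learning -> anomaly detection (outlier detection)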

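Health insurance fraud dataset

-4 training + 4 test tabular files

Imbalanced dataset

Supervised learning -> binary classification

Unsupervised learning -> anomaly detection

Medical cost prediction dataset

-Predict medical costs from 6 indicators

Supervised learning -> regression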

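Drug review dataset

-Analyze drug usage feedback using 7 fields

Natural language processing (NLP), data mining, word embedding

COVID-19 global trends dataset

-6 data files with confirmed, recovered and death figures

Time series, correlation and statistical analysis, machine learning / deep learning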

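Chest X-ray pneumonia dataset

-Build an image classification model that diagnoses patients from chest X-rays

-Three sets of files for training, validation and testing; images in JPEG format

Deep learning, image classification

COVID-19 CT scan dataset

-Build an image segmentation model that segments the lungs and the infected regions in CT scans

-CT scans of 20 patients, with three kinds of segmentation masks (lungs, infection, lungs and infection), in NII format

Reading NII medical image files, deep learning, image segmentation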

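Heart sound dataset

-Build an audio segmentation model that locates the first and second heart sounds, and an audio classification model that identifies the type of heart disease

-Two sets of recordings, one from an app and one from a digital stethoscope, in WAV format

Deep learning, audio segmentation, audio classification

Drug interaction dataset

-A network graph in which nodes are drugs and edges are interactions, covering 48,514 interactions among 1,512 drugs

Network analysis, node embedding, link prediction, anomaly analysis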

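Hands-on practice

Confused student EEG dataset

Using ethnicity, gender, age and the measured DELTA, THETA, ALPHA, BETA and GAMMA signals,

build a model that determines whether a student is confused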

pip install tensorflow pandas matplotlib

pip install scikit-learn pywavelets seaborn pandas_profiling pyqt5

For version issues, see AI Cross-Disciplinary Data Science Course Notes d28 Deep Learning (4) Pre-trained Models and Transfer Learning

tensorflow 2.8.0

numpy 1.19.5

Import the packages we need:

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline
# (Jupyter magic: show plots inside the notebook)

import seaborn as sns

from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix  # confusion matrix
from sklearn.metrics import classification_report  # text report of the main classification metrics

from tensorflow import keras
import tensorflow as tf  # TF2 API (tf.keras, tf.one_hot); the compat.v1 alias is not needed here
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.layers import Dense, Activation, Flatten, concatenate, Input, Dropout, LSTM, Bidirectional, BatchNormalization, PReLU, ReLU, Reshape
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

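Load the data:

rename: rename columns

merge: join two tables

cf. AI Cross-Disciplinary Data Science Course Notes d22 Data Visualization (1) Data Analysis

ndarray: concatenate/vstack/hstack join arrays, split/vsplit/hsplit split them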
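A small numpy sketch of the join/split functions named above, using two hypothetical 2×3 arrays; it only shows which axis each function works along.

```python
import numpy as np

# Two hypothetical 2x3 arrays
a = np.arange(6).reshape(2, 3)
b = np.arange(6, 12).reshape(2, 3)

# Joining: vstack stacks rows (axis 0), hstack stacks columns (axis 1)
v = np.vstack([a, b])                 # shape (4, 3)
h = np.hstack([a, b])                 # shape (2, 6)
c = np.concatenate([a, b], axis=0)    # same result as vstack here

# Splitting: the inverse operations
top, bottom = np.vsplit(v, 2)         # two (2, 3) arrays
left, right = np.hsplit(h, 2)         # two (2, 3) arrays
parts = np.split(c, 2, axis=0)        # list of two (2, 3) arrays

print(v.shape, h.shape, c.shape)      # (4, 3) (2, 6) (4, 3)
```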

data = pd.read_csv("./archive/EEG_data.csv")
demo_data = pd.read_csv('./archive/demographic_info.csv')

demo_data = demo_data.rename(columns = {'subject ID': 'SubjectID'})  # rename the key column so both tables share the same name
data = data.merge(demo_data, how = 'inner', on = 'SubjectID')  # inner join on SubjectID
# the merge adds 3 demographic columns (age, ethnicity, gender) to the EEG columns
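To see what the rename plus inner merge does, here is a minimal sketch on hypothetical miniature versions of the two tables (only the column names SubjectID and subject ID come from the dataset; the values are made up).

```python
import pandas as pd

# Hypothetical miniature versions of the two CSV files
eeg = pd.DataFrame({"SubjectID": [0, 0, 1, 2], "Delta": [1.0, 2.0, 3.0, 4.0]})
demo = pd.DataFrame({"subject ID": [0, 1], "age": [25, 24], "gender": ["M", "F"]})

# Align the key name, then join on it
demo = demo.rename(columns={"subject ID": "SubjectID"})
merged = eeg.merge(demo, how="inner", on="SubjectID")

# Inner join keeps only SubjectIDs present in both tables (subject 2 is dropped),
# and every EEG row of a subject receives that subject's demographic values.
print(merged)
```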

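One-hot encoding:

get_dummies performs the one-hot encoding

The gender column is split directly into two columns, one per value (the text column ethnicity is expanded the same way)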

data = pd.get_dummies(data)
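A minimal sketch of get_dummies on a few hypothetical rows; the column names mirror the demographic columns, but the values are invented for illustration.

```python
import pandas as pd

# Hypothetical rows with one numeric and two text columns
df = pd.DataFrame({
    "age": [25, 24, 31],
    "ethnicity": ["Han Chinese", "English", "Han Chinese"],
    "gender": ["M", "F", "M"],
})

# get_dummies one-hot encodes every object/category column and leaves numeric ones alone
encoded = pd.get_dummies(df)
print(encoded.columns.tolist())
# ['age', 'ethnicity_English', 'ethnicity_Han Chinese', 'gender_F', 'gender_M']
```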

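cf. AI Cross-Disciplinary Data Science Course Notes d27-2 Deep Learning (3-2) Recurrent Neural Networks and Long Short-Term Memory Networks: Code Implementation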

from tensorflow.keras.utils import to_categorical

to_categorical

cf. AI Cross-Disciplinary Data Science Course Notes d29-3 Deep Learning (5-3) Deep Learning Project: Animal Recognition

tf.one_hot

label = tf.one_hot(label, NUM_CLASSES)
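Both to_categorical and tf.one_hot turn integer class labels into one-hot rows. The numpy-only function below is a hypothetical helper (not how either library implements it) that produces the same kind of matrix. In this notebook the label user-definedlabeln stays a single 0/1 column, since the model ends in Dense(1, activation='sigmoid') with binary_crossentropy, so no one-hot step is needed for y.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: row i is all zeros except a 1 at column labels[i]."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype=np.float32)[labels]

print(one_hot([0, 2, 1], 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```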

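Detailed data report:

pip install pandas-profiling

ProfileReport automatically generates a detailed data report

Overview: summary; Alerts: flagged issues such as strongly correlated columns; Reproduction: analysis start and end time, duration and software version

-Overview-Data statistics:

-Number of variables: number of columns

-Number of observations: number of rows

-Missing cells: number of missing values

-Duplicate rows: number of duplicated rows

-Average record size in memory: average memory used per row

-Overview-Variable types: column types

-Numeric: numeric columns

-Categorical: categorical columns

Variables: statistics generated for every column of the table

-For each column:

-Distinct: number of distinct values

-Missing: number of missing values

-Infinite: number of infinite values

-Mean: average value

-Minimum: smallest value

-Maximum: largest value

-Zeros: number of zeros

-Negative: number of negative values

-Memory: memory used by the column

Interactions: interactive menu showing the relationship between two chosen columns (chart)

Correlations: correlation between all pairs of attributes (chart)

Missing values: missing data

Sample: view the first and last rows of the dataset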

import pandas_profiling as pp
pp.ProfileReport(data)
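Several Overview and Variables numbers can be reproduced roughly with plain pandas, which helps when reading the report. A sketch on a hypothetical table (ProfileReport's exact definitions may differ in detail):

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing cell and one duplicated row
df = pd.DataFrame({"a": [1, 2, 2, np.nan], "b": ["x", "y", "y", "z"]})

overview = {
    "Number of variables": df.shape[1],
    "Number of observations": df.shape[0],
    "Missing cells": int(df.isna().sum().sum()),
    "Duplicate rows": int(df.duplicated().sum()),
}
distinct_b = df["b"].nunique()   # "Distinct" for column b

print(overview)
print(df["a"].describe())        # mean, min, max and so on for one column
```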

Plot a correlation heatmap:

Similar to the Correlations section of the report above

pip install seaborn

import seaborn as sns

heatmap: heat map

annot=True writes the value in each cell of the plot

plt.figure(figsize = (15,15))
cor_matrix = data.corr()
sns.heatmap(cor_matrix,annot=True)
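data.corr() computes the Pearson correlation coefficient for every pair of columns by default, and annot=True prints those values in the heatmap cells. A short check on hypothetical columns, computing one coefficient by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical columns: y rises with x, z falls exactly as x rises
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0],
                   "y": [2.0, 4.1, 5.9, 8.2],
                   "z": [4.0, 3.0, 2.0, 1.0]})

cor_matrix = df.corr()   # Pearson correlation by default

# Same coefficient by hand: sum of centred products over the root of the
# product of summed squares (equivalent to covariance / (std_x * std_y))
x, y = df["x"].to_numpy(), df["y"].to_numpy()
r_xy = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

print(cor_matrix.round(3))
print(round(r_xy, 3))
```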

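Drop the columns that should not be used for training:

cf. AI Cross-Disciplinary Data Science Course Notes d22 Data Visualization (1) Data Analysis

drop removes the columns you don't need; with inplace=True the DataFrame itself is modified (and the call returns None)

Drop SubjectID, VideoID and predefinedlabel. Keep Attention, Mediation and Raw, the important EEG band signals (Delta, Theta, Alpha1, Alpha2, Beta1, Beta2, Gamma1, Gamma2), the student's self-reported confusion label (user-definedlabeln; the self-rating was on a 1~7 scale), and age, ethnicity and gender.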

data.drop(columns = ['SubjectID','VideoID','predefinedlabel'],inplace=True)
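A quick sketch of the inplace behaviour on a hypothetical mini-table with the same column names:

```python
import pandas as pd

# Hypothetical mini-table using the dataset's column names
df = pd.DataFrame({"SubjectID": [0, 1], "VideoID": [0, 0],
                   "predefinedlabel": [0, 1], "Delta": [1.0, 2.0]})

# inplace=True modifies df itself and returns None
result = df.drop(columns=["SubjectID", "VideoID", "predefinedlabel"], inplace=True)
print(result)                  # None
print(df.columns.tolist())     # ['Delta']

# Equivalent non-inplace style: keep the returned copy
df2 = pd.DataFrame({"a": [1], "b": [2]}).drop(columns=["a"])
```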

Create the inputs and the output:

pop removes a column from the DataFrame and returns it as a Series

y= data.pop('user-definedlabeln')
x= data
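A minimal sketch of pop on a hypothetical two-row table; note that x ends up being the very same object as data, not a copy:

```python
import pandas as pd

# Hypothetical two-row table
data = pd.DataFrame({"Delta": [1.0, 2.0], "age": [25, 24],
                     "user-definedlabeln": [0.0, 1.0]})

y = data.pop("user-definedlabeln")   # returns the column as a Series ...
x = data                             # ... and it is already gone from data

print(type(y).__name__, y.tolist())  # Series [0.0, 1.0]
print(x.columns.tolist())            # ['Delta', 'age']
```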

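Plot a line chart (a bar chart would probably suit this better, and showing only about 10 rows would probably be enough):

x.iloc[:1000,:11].plot(figsize = (15,10))  # rows 0-999, columns 0-10
# What are Attention, Mediation and Raw, and why are they included in the features?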

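StandardScaler standardization:

cf. AI Cross-Disciplinary Data Science Course Notes d28 Deep Learning (4) Pre-trained Models and Transfer Learning

from sklearn.preprocessing import StandardScaler

StandardScaler rescales each column to zero mean and unit variance

fit: (re)compute the mean and standard deviation from the data

transform: convert data using the previously fitted statistics

fit_transform: fit and then transform in one step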

x = StandardScaler().fit_transform(x)
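The fit/transform split decides which rows the statistics come from. The line above fits the scaler on all of x before train_test_split, so the mean and standard deviation also reflect the test rows; a common alternative is to fit on the training rows only and reuse those statistics on the test rows. The sketch below illustrates that with a hypothetical numpy stand-in (MiniStandardScaler is not part of sklearn); like StandardScaler, it uses the population standard deviation and only centres constant columns.

```python
import numpy as np

class MiniStandardScaler:
    """Hypothetical stand-in mirroring StandardScaler's fit/transform split."""

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        self.mean_ = x.mean(axis=0)
        self.scale_ = x.std(axis=0)          # population std (ddof=0)
        self.scale_[self.scale_ == 0] = 1.0  # constant columns: centre only
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.mean_) / self.scale_

    def fit_transform(self, x):
        return self.fit(x).transform(x)

# Hypothetical train and test rows
x_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
x_test = np.array([[4.0, 40.0]])

scaler = MiniStandardScaler()
z_train = scaler.fit_transform(x_train)  # statistics learned from training rows only
z_test = scaler.transform(x_test)        # reused on unseen rows
print(z_train.mean(axis=0), z_train.std(axis=0))
print(z_test)
```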

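Set the train/test split ratio:

cf. AI Cross-Disciplinary Data Science Course Notes d17~18 Data Mining (3) Data Classification and Prediction: Introduction and Implementation

from sklearn.model_selection import train_test_split

train_test_split takes the features x and the labels y and splits them into a training set (train) and a test set (test) at the given ratio (here the test set also doubles as the validation set)

Automatic split with validation_split vs. "manual split" with train_test_split + validation_data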

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.15)
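With a float test_size, train_test_split rounds the test count up and gives the rest to training, which is where the 1922 / 10889 row counts of this notebook come from (12811 rows in total). The sketch below reproduces that arithmetic and a shuffled split with numpy; split_sizes and manual_split are hypothetical helpers, not sklearn functions.

```python
import math
import numpy as np

def split_sizes(n_samples, test_size):
    """Hypothetical helper: float test_size -> test count rounded up, rest to training."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

def manual_split(x, y, test_size, seed=0):
    """Hypothetical shuffled split, similar in spirit to train_test_split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    _, n_test = split_sizes(len(x), test_size)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]

print(split_sizes(12811, 0.15))   # (10889, 1922)
```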

Convert to the format the deep learning model expects (add a trailing dimension):

The data has 17 feature columns

x_train = np.array(x_train).reshape(-1,17,1)
x_test = np.array(x_test).reshape(-1,17,1)

x_train.shape,x_test.shape,y_train.shape,y_test.shape

((10889, 17, 1), (1922, 17, 1), (10889,), (1922,))
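The LSTM layers expect 3-D input shaped (samples, timesteps, features), so reshape(-1, 17, 1) treats the 17 columns as 17 timesteps with one feature each. A minimal sketch on a hypothetical zero array, also showing np.expand_dims as an equivalent:

```python
import numpy as np

# Hypothetical 2-D feature table: 4 rows x 17 columns
x = np.zeros((4, 17))

# -1 lets numpy infer the number of rows; the trailing 1 is the feature axis
x3d = x.reshape(-1, 17, 1)
x3d_alt = np.expand_dims(x, axis=-1)

print(x3d.shape, x3d_alt.shape)   # (4, 17, 1) (4, 17, 1)
```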

Build the model:

Bidirectional: wraps a recurrent layer so the sequence is read in both directions (forward and backward)

inputs = tf.keras.Input(shape=(17,1))

Dense1 = Dense(64, activation = 'relu',kernel_regularizer=keras.regularizers.l2())(inputs)

#Dense2 = Dense(128, activation = 'relu',kernel_regularizer=keras.regularizers.l2())(Dense1)
#Dense3 = Dense(256, activation = 'relu',kernel_regularizer=keras.regularizers.l2())(Dense2)

lstm_1= Bidirectional(LSTM(256, return_sequences = True))(Dense1)
drop = Dropout(0.3)(lstm_1)
lstm_3= Bidirectional(LSTM(128, return_sequences = True))(drop)
drop2 = Dropout(0.3)(lstm_3)
# Bidirectional: each LSTM reads the sequence forward and backward

flat = Flatten()(drop2)

#Dense_1 = Dense(256, activation = 'relu')(flat)

Dense_2 = Dense(128, activation = 'relu')(flat)
outputs = Dense(1, activation='sigmoid')(Dense_2)

model = tf.keras.Model(inputs, outputs)

model.summary()

Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 17, 1)]           0
_________________________________________________________________
dense (Dense)                (None, 17, 64)            128
_________________________________________________________________
bidirectional (Bidirectional (None, 17, 512)           657408
_________________________________________________________________
dropout (Dropout)            (None, 17, 512)           0
_________________________________________________________________
bidirectional_1 (Bidirection (None, 17, 256)           656384
_________________________________________________________________
dropout_1 (Dropout)          (None, 17, 256)           0
_________________________________________________________________
flatten (Flatten)            (None, 4352)              0
_________________________________________________________________
dense_1 (Dense)              (None, 128)               557184
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 129
=================================================================
Total params: 1,871,233
Trainable params: 1,871,233
Non-trainable params: 0
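The Param # column can be checked by hand. The sketch below applies the standard counting rules for Dense (weights plus biases) and LSTM (four gates, each with input weights, recurrent weights and a bias; Bidirectional doubles it) and reproduces the summary's total. The helper names are hypothetical.

```python
def dense_params(n_in, units):
    # weight matrix plus one bias per unit
    return n_in * units + units

def lstm_params(n_in, units):
    # 4 gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (n_in + units) + units)

def bilstm_params(n_in, units):
    # Bidirectional wraps two independent LSTMs (forward and backward)
    return 2 * lstm_params(n_in, units)

layers = {
    "dense": dense_params(1, 64),                # Dense acts on the last axis (size 1)
    "bidirectional": bilstm_params(64, 256),
    "bidirectional_1": bilstm_params(512, 128),  # input: 2 x 256 concatenated outputs
    "dense_1": dense_params(17 * 256, 128),      # Flatten: 17 steps x 256 features
    "dense_2": dense_params(128, 1),
}
total = sum(layers.values())
print(layers)
print(f"Total params: {total:,}")   # Total params: 1,871,233
```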

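Train the model (100 epochs recommended):

callbacks module

EarlyStopping: automatically stop training early

ModelCheckpoint: automatically save the best model

LearningRateScheduler: automatically adjust the learning rate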
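The patience rule can be sketched in plain Python. early_stopping_epoch is a hypothetical helper that mimics EarlyStopping(monitor='val_loss', mode='min') with its default min_delta=0; it is not the Keras source.

```python
def early_stopping_epoch(val_losses, patience):
    """Hypothetical helper: epoch index at which training would stop, or None.

    Stops once `patience` consecutive epochs pass without a new minimum.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0   # improvement: reset the counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: best value at epoch 2, then no improvement
losses = [0.70, 0.65, 0.60, 0.61, 0.62, 0.63]
print(early_stopping_epoch(losses, patience=3))   # 5
```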

def train_model(model, x_train, y_train, x_test, y_test, save_to, epoch = 2):
    # Recommended: train for 100 epochs

    # Learning-rate hyperparameter
    opt_adam = keras.optimizers.Adam(learning_rate=0.001)
    # Functions from the training callbacks module
    es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)  # watch val_loss; mode='min': lower is better; patience: stop after this many consecutive epochs without improvement
    mc = ModelCheckpoint(save_to + '_best_model.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)  # watch val_accuracy; mode='max': higher accuracy is better; save_best_only=True keeps only the best model, False saves after every epoch. cf. model.save http://www.jamesdambrosio.com/artwork.php?sn=5435549
    lr_schedule = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 0.001 * np.exp(-epoch / 10.))  # high learning rate early, lower later
    # from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

    model.compile(optimizer=opt_adam,
                  loss='binary_crossentropy',
                  metrics=['accuracy'])

    history = model.fit(x_train, y_train,
                        batch_size=20,
                        epochs=epoch,
                        validation_data=(x_test, y_test),
                        callbacks=[es, mc, lr_schedule])
    # validation_split: automatically hold out a percentage of the training data as the validation set (not used here)
    # train_test_split + validation_data: "manual split" (used here)
    # validation_data: inputs and outputs of the validation set

    # Reload the checkpoint with the best val_accuracy and return it instead of the last-epoch model
    saved_model = load_model(save_to + '_best_model.h5')

    return saved_model, history

model, history = train_model(model, x_train, y_train, x_test, y_test, save_to='./', epoch=2)
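The LearningRateScheduler lambda decays the learning rate exponentially, dividing it by e every 10 epochs. A sketch evaluating the same formula for a few epochs (lr_at is a hypothetical helper). Because the scheduler sets the rate at the start of every epoch, it takes over from the learning_rate=0.001 passed to Adam, which is also the schedule's value at epoch 0.

```python
import math

def lr_at(epoch, base_lr=0.001):
    # Hypothetical helper: same formula as the lambda given to LearningRateScheduler
    return base_lr * math.exp(-epoch / 10.0)

for epoch in (0, 10, 20, 50):
    print(epoch, f"{lr_at(epoch):.6f}")
# 0 0.001000
# 10 0.000368
# 20 0.000135
# 50 0.000007
```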

Plot accuracy and loss against epochs:

cf. AI Cross-Disciplinary Data Science Course Notes d23 Data Visualization (2) Package Usage Practice

plt.legend() is used together with label

plt.plot(history.history['accuracy'])  # the labels can also be given in each plot() call via label=
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')  # place the legend in the top-left corner
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
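As noted above, plt.legend() can pick up names given with label= at plot time instead of receiving a list. A sketch using hypothetical accuracy values in place of history.history, drawn off-screen with the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")            # render off-screen; no window needed
import matplotlib.pyplot as plt

# Hypothetical stand-in for history.history
hist = {"accuracy": [0.55, 0.60, 0.63], "val_accuracy": [0.54, 0.58, 0.62]}

fig, ax = plt.subplots()
ax.plot(hist["accuracy"], label="train")          # name each line where it is drawn ...
ax.plot(hist["val_accuracy"], label="validation")
ax.set_xlabel("epoch")
ax.set_ylabel("accuracy")
ax.legend(loc="upper left")                       # ... so legend() needs no list of names

legend_texts = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_texts)   # ['train', 'validation']
```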

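Plot the confusion matrix:

Predict y_pred from x_test

Compare it with the true answers y_test

y_test is a 1-D Series of shape (1922,) containing 0 or 1 (whether the student's EEG indicates confusion)

y_pred is a 2-D ndarray of shape (1922, 1) containing floats between 0 and 1

Threshold it at 0.5 to turn it into 0 or 1 (still a 2-D array of shape (1922, 1))

In True/False Positive/Negative, Positive/Negative is the predicted class and True/False says whether that prediction matches the actual label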
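To make the TP/FP/TN/FN definitions concrete, here is a numpy sketch on hypothetical labels and probabilities. It thresholds at 0.5, flattens the (n, 1) predictions to (n,), and arranges the counts the way sklearn's confusion_matrix does for labels 0 and 1 (rows = actual, columns = predicted).

```python
import numpy as np

# Hypothetical true labels (1-D) and model outputs shaped (n, 1) like model.predict
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([[0.2], [0.7], [0.9], [0.4], [0.6], [0.1]])

# Threshold at 0.5, then flatten (n, 1) -> (n,)
y_pred = (y_prob >= 0.5).astype(int).ravel()

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # predicted positive, correct
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # predicted positive, wrong
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # predicted negative, correct
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # predicted negative, wrong

# rows = actual class (0, 1), columns = predicted class (0, 1)
cm = np.array([[tn, fp],
               [fn, tp]])
print(cm)
```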

from sklearn.metrics import confusion_matrix

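confusion_matrix: confusion matrix

Rows are the actual classes and columns the predicted classes (0 = N, 1 = P)

cf. AI Cross-Disciplinary Data Science Course Notes d27-2 Deep Learning (3-2) Recurrent Neural Networks and Long Short-Term Memory Networks: Code Implementation

pd.crosstab

0~9 × 0~9 (a 10 × 10 table for the ten classes in that lesson)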

array([[536, 419],
       [354, 613]], dtype=int64)

y_pred = model.predict(x_test)
y_pred = np.array(y_pred >= 0.5, dtype=int)  # threshold at 0.5: confused (1) or not (0); np.int was removed in NumPy 1.24
confusion_matrix(y_test, y_pred)
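pd.crosstab gives the same counts as a labelled DataFrame. A sketch on hypothetical 1-D labels; with this notebook's y_pred, which has shape (1922, 1), it would first need flattening, e.g. y_pred.ravel().

```python
import pandas as pd

# Hypothetical labels; crosstab counts every (actual, predicted) pair
y_true = pd.Series([0, 0, 1, 1, 1, 0], name="actual")
y_pred = pd.Series([0, 1, 1, 0, 1, 0], name="predicted")

table = pd.crosstab(y_true, y_pred)   # rows = actual, columns = predicted
print(table)
```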

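Classification Report:

from sklearn.metrics import classification_report

classification_report: a text report of the main classification metrics

-precision: of the samples predicted as a class, the fraction that truly belong to it; recall: of the samples that truly belong to a class, the fraction predicted as it; f1-score: F1 = 2 × (precision × recall) / (precision + recall)

-support: number of occurrences of each label

-avg/total (macro avg / weighted avg in newer versions): the average of each column (the support column shows the total); macro avg is the plain mean over classes, weighted avg weights each class by its support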

              precision    recall  f1-score   support

         0.0       0.64      0.62      0.63       969
         1.0       0.63      0.65      0.64       953

    accuracy                           0.63      1922
   macro avg       0.64      0.63      0.63      1922
weighted avg       0.64      0.63      0.63      1922
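The precision, recall and F1 definitions above, applied to hypothetical counts for one class (prf is a hypothetical helper):

```python
def prf(tp, fp, fn):
    precision = tp / (tp + fp)   # of everything predicted positive, how much was right
    recall = tp / (tp + fn)      # of everything actually positive, how much was found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# Hypothetical counts
p, r, f = prf(tp=6, fp=2, fn=4)
print(round(p, 3), round(r, 3), round(f, 3))   # 0.75 0.6 0.667
```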

#DeepLearning #LSTM #Bidirectional #pandas #numpy #tensorflow #keras #matplotlib #sklearn #EarlyStopping
