
AI Cross-Domain Data Science course notes d35: Smart Healthcare (1), Medical and Biomedical Big Data

傳說中的巴哈魔法師~ | 2022-04-30 12:23:49

Blockchain -> storing electronic medical records

Edge computing -> preprocessing

Cloud computing -> deep learning

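Smart healthcare datasets:

Heart disease dataset

-Predict whether a patient has heart disease from 13 indicators

Supervised learning -> binary classification

Stroke dataset

-Predict whether a patient is at risk of stroke from 10 indicators

Imbalanced dataset

Supervised learning -> binary classification

Unsupervised learning -> anomaly detection (outlier detection)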

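Medical insurance fraud dataset

-4 training tables + 4 test tables

Imbalanced dataset

Supervised learning -> binary classification

Unsupervised learning -> anomaly detection

Medical cost prediction dataset

-Predict medical costs from 6 indicators

Supervised learning -> regression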

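Drug review dataset

-Analyze drug usage based on 7 columns

Natural language processing (NLP), data mining, word embedding

COVID-19 global trends dataset

-6 data files with confirmed, recovered, and death counts

Time series, related statistical analysis, machine learning / deep learning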

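Chest X-ray pneumonia dataset

-Build an image classification model that diagnoses patients from chest X-rays

-Three sets of files (training, validation, test); image format: jpeg

Deep learning, image classification

COVID-19 CT scan dataset

-Build an image segmentation model that produces a lung segmentation and an infection segmentation from CT images

-CT scans of 20 patients with three kinds of segmentation masks (lungs, infection, lungs and infection); file format: nii

Reading medical-imaging NII files, deep learning, image segmentation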

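Heart sound dataset

-Build an audio segmentation model that locates the first and second heart sounds, and an audio classification model that identifies the type of heart disease

-Two sets of recordings, one from an app and one from a digital stethoscope; audio format: wav

Deep learning, audio segmentation, audio classification

Drug interaction dataset

-A network graph in which nodes are drugs and edges are interactions, covering 48514 interactions among 1512 drugs

Network analysis, node embedding, link prediction, anomaly analysis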

Hands-on

Confused student EEG dataset

Using ethnicity, gender, age, and the measured DELTA, THETA, ALPHA, BETA, GAMMA signals,

build a model that predicts whether a student is confused

pip install tensorflow pandas matplotlib

pip install scikit-learn pywavelets seaborn pandas_profiling pyqt5

For version issues, see AI Cross-Domain Data Science course notes d28: Deep Learning (4), Pre-trained Models and Transfer Learning

tensorflow 2.8.0

numpy 1.19.5

Import the packages we need:

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns

from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix  # confusion matrix
from sklearn.metrics import classification_report  # text report of the main classification metrics

from tensorflow import keras
import tensorflow.compat.v1 as tf
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.layers import Dense, Activation, Flatten, concatenate, Input, Dropout, LSTM, Bidirectional,BatchNormalization,PReLU,ReLU,Reshape
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

Load the data:

rename: rename columns

merge: join two tables

cf. AI Cross-Domain Data Science course notes d22: Data Visualization (1), Data Analysis

ndarray: concatenate/vstack/hstack join arrays; split/vsplit/hsplit split them

data = pd.read_csv("./archive/EEG_data.csv")
demo_data = pd.read_csv('./archive/demographic_info.csv')

demo_data = demo_data.rename(columns = {'subject ID': 'SubjectID'})  # rename a specific column
data = data.merge(demo_data,how = 'inner',on = 'SubjectID')  # inner join on SubjectID
# adds 3 columns on top of the original 14
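The rename-then-merge step can be tried on a small scale first. This is a minimal sketch on made-up rows (the values and the reduced column set are assumptions, not rows from the real CSV files); it only shows that an inner join on SubjectID copies each subject's demographic info onto every one of that subject's EEG rows.

```python
import pandas as pd

# Hypothetical toy tables, not rows from the real CSV files.
eeg = pd.DataFrame({
    "SubjectID": [0, 0, 1, 1],
    "Attention": [56.0, 40.0, 47.0, 61.0],
})
demo = pd.DataFrame({
    "subject ID": [0, 1],
    "age": [25, 24],
    "gender": ["M", "F"],
})

# Align the key name, then inner-join: each EEG row picks up its subject's info.
demo = demo.rename(columns={"subject ID": "SubjectID"})
merged = eeg.merge(demo, how="inner", on="SubjectID")

print(merged.shape)
print(merged.columns.tolist())
```

The row count stays at the number of EEG rows because every SubjectID has exactly one match in the demographic table.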

One-hot encoding:

get_dummies performs one-hot encoding

The gender column is split into two columns

data = pd.get_dummies(data)

cf. AI Cross-Domain Data Science course notes d27-2: Deep Learning (3-2), Recurrent Neural Networks and LSTM Code Walkthrough

from tensorflow.keras.utils import to_categorical

to_categorical

cf. AI Cross-Domain Data Science course notes d29-3: Deep Learning (5-3), Deep Learning Project: Animal Recognition

tf.one_hot

label = tf.one_hot(label, NUM_CLASSES)
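To see what get_dummies does to a text column, here is a minimal sketch on made-up rows (the category values are assumptions for illustration). It covers only the pandas route; to_categorical and tf.one_hot do the same job for integer class labels.

```python
import pandas as pd

# Hypothetical toy rows; the category values are made up for illustration.
df = pd.DataFrame({
    "age": [25, 24, 31],
    "gender": ["M", "F", "M"],
})

# Numeric columns pass through; each text column becomes one 0/1 column per category.
encoded = pd.get_dummies(df, dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Passing dtype=int is a choice made for this sketch so the new columns hold 0/1 rather than booleans.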

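Detailed data report:

pip install pandas-profiling

ProfileReport automatically generates a detailed data report

Overview: summary; Alerts: warnings about the data (for example, highly correlated columns); Reproduction: analysis start and end time, duration, software version

-Overview-Data statistics:

-Number of variables: number of columns

-Number of observations: number of rows

-Missing cells: number of missing values

-Duplicate rows: number of duplicated rows

-Average record size in memory: average memory used per record

-Overview-Variable types: column types

-Numeric

-Categorical

Variables: summary statistics generated for every column in the table

-For each column:

-Distinct: number of distinct values

-Missing: number of missing values

-Infinite: number of infinite values

-Mean: average value

-Minimum

-Maximum

-Zeros

-Negative

-Memory

Interactions: interactive menu for viewing the relationship between two chosen columns (chart)

Correlations: correlation between all columns (chart)

Missing values: missing data

Sample: the first and last rows of the dataset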

import pandas_profiling as pp
pp.ProfileReport(data)

Plot a correlation heatmap:

Similar to the Correlations tab above

pip install seaborn

import seaborn as sns

heatmap: heat map

annot=True prints the value inside each cell

plt.figure(figsize = (15,15))
cor_matrix = data.corr()
sns.heatmap(cor_matrix,annot=True)
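The heatmap is just a picture of the matrix returned by corr(). A minimal sketch on made-up columns (the data is an assumption, chosen so the answer is obvious) shows what the numbers in the cells mean:

```python
import pandas as pd

# Hypothetical toy columns: b rises with a, c falls as a rises.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 4, 3, 2, 1],
})

# Pearson correlation between every pair of columns (values from -1 to 1).
cor_matrix = toy.corr()
print(cor_matrix.round(2))
```

Values near 1 or -1 mean two columns move together or in opposite directions; values near 0 mean little linear relationship.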

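Drop the columns that are not used for training:

AI Cross-Domain Data Science course notes d22: Data Visualization (1), Data Analysis

drop removes the columns you do not want; inplace=True modifies the DataFrame itself instead of returning a new one

Drop (SubjectID, VideoID, predefinedlabel). Keep (Attention, Mediation, Raw), the main EEG signals (Delta, Theta, Alpha1, Alpha2, Beta1, Beta2, Gamma1, Gamma2), the student's self-rated confusion on a 1~7 scale (user-definedlabeln), and age, ethnicity, gender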

data.drop(columns = ['SubjectID','VideoID','predefinedlabel'],inplace=True)

Build the inputs and outputs:

pop removes a column from the DataFrame and returns it

y= data.pop('user-definedlabeln')
x= data
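The difference between drop with inplace=True and pop is easy to check on a tiny frame. This is a minimal sketch with made-up values; only the column names are borrowed from the notes.

```python
import pandas as pd

# Hypothetical toy frame reusing the column names from the text.
data = pd.DataFrame({
    "SubjectID": [0, 0, 1],
    "Attention": [56.0, 40.0, 47.0],
    "user-definedlabeln": [0.0, 1.0, 1.0],
})

# inplace=True changes `data` itself and returns None.
data.drop(columns=["SubjectID"], inplace=True)

# pop returns the column as a Series and removes it from `data`.
y = data.pop("user-definedlabeln")
x = data

print(x.columns.tolist())
print(y.tolist())
```

Note that x and data are the same object afterwards, so later changes to one show up in the other.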

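Plot a line chart (a bar chart would probably suit this better, and showing just 10 rows would be enough):

x.iloc[:1000,:11].plot(figsize = (15,10))  # rows 0~999, columns 0~10
# What are Attention, Mediation and Raw, and why do they go into the plot as well?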

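Standardization with StandardScaler:

AI Cross-Domain Data Science course notes d28: Deep Learning (4), Pre-trained Models and Transfer Learning

from sklearn.preprocessing import StandardScaler

StandardScaler standardizes each column to zero mean and unit variance

fit: (re)compute the scaling parameters from the data

transform: scale the data with the parameters already fitted

fit_transform: fit and transform in one step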

x = StandardScaler().fit_transform(x)
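The notes scale all of x in one call before splitting. A common variation, sketched here on made-up numbers and not what the course code does, is to fit on the training rows only and reuse those parameters on the test rows, which is where the fit / transform distinction matters.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features: 4 training rows and 2 test rows, 2 columns each.
x_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
x_test = np.array([[2.5, 25.0], [5.0, 50.0]])

scaler = StandardScaler()
x_train_std = scaler.fit_transform(x_train)  # learn mean/std from the training rows
x_test_std = scaler.transform(x_test)        # reuse the same mean/std

print(scaler.mean_)
print(x_train_std.mean(axis=0), x_train_std.std(axis=0))
print(x_test_std)
```

The test rows are not guaranteed to end up with zero mean, because they are scaled with the training statistics.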

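Set the train/test split ratio:

AI Cross-Domain Data Science course notes d17~18: Data Mining (3), Introduction to Classification and Prediction with Hands-on Practice

from sklearn.model_selection import train_test_split

train_test_split splits the data x and the answers y into a training set and a test set at the given ratio (here the test set also serves as the validation set = =)

Automatic splitting with validation_split vs. "manual" splitting with train_test_split + validation_data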

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.15)
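To check how test_size=0.15 turns into row counts, here is a minimal sketch on placeholder arrays with 12811 rows and 17 columns (the sizes implied by the shapes printed in these notes). The zero-filled arrays and random_state=0 are assumptions added for the sketch; random_state only makes the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same size as the data in the notes (values are dummies).
x = np.zeros((12811, 17))
y = np.zeros(12811)

# 15% of the rows go to the test set; the count is rounded up.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.15, random_state=0
)

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```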

Convert to the shape the deep learning model expects (add a trailing dimension):

This dataset has 17 columns

x_train = np.array(x_train).reshape(-1,17,1)
x_test = np.array(x_test).reshape(-1,17,1)

x_train.shape,x_test.shape,y_train.shape,y_test.shape

((10889, 17, 1), (1922, 17, 1), (10889,), (1922,))
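The reshape only adds a trailing axis; no values move. A minimal sketch on a made-up batch of 3 samples (the array contents are an assumption):

```python
import numpy as np

# Hypothetical toy batch: 3 samples with 17 features each.
x = np.arange(3 * 17, dtype=float).reshape(3, 17)

# -1 lets NumPy infer the sample count; each feature becomes a 1-value time step.
x_seq = x.reshape(-1, 17, 1)

print(x.shape, "->", x_seq.shape)
```

With this layout the LSTM layers treat the 17 columns as a sequence of 17 steps with one value per step.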

Build the model:

Bidirectional: wraps the LSTM so the sequence is read in both directions

inputs = tf.keras.Input(shape=(17,1))

Dense1 = Dense(64, activation = 'relu',kernel_regularizer=keras.regularizers.l2())(inputs)

#Dense2 = Dense(128, activation = 'relu',kernel_regularizer=keras.regularizers.l2())(Dense1)
#Dense3 = Dense(256, activation = 'relu',kernel_regularizer=keras.regularizers.l2())(Dense2)

lstm_1= Bidirectional(LSTM(256, return_sequences = True))(Dense1)
drop = Dropout(0.3)(lstm_1)
lstm_3= Bidirectional(LSTM(128, return_sequences = True))(drop)
drop2 = Dropout(0.3)(lstm_3)
# Bidirectional: the LSTM reads the sequence in both directions

flat = Flatten()(drop2)

#Dense_1 = Dense(256, activation = 'relu')(flat)

Dense_2 = Dense(128, activation = 'relu')(flat)
outputs = Dense(1, activation='sigmoid')(Dense_2)

model = tf.keras.Model(inputs, outputs)

model.summary()

Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 17, 1)]           0
_________________________________________________________________
dense (Dense)                (None, 17, 64)            128
_________________________________________________________________
bidirectional (Bidirectional (None, 17, 512)           657408
_________________________________________________________________
dropout (Dropout)            (None, 17, 512)           0
_________________________________________________________________
bidirectional_1 (Bidirection (None, 17, 256)           656384
_________________________________________________________________
dropout_1 (Dropout)          (None, 17, 256)           0
_________________________________________________________________
flatten (Flatten)            (None, 4352)              0
_________________________________________________________________
dense_1 (Dense)              (None, 128)               557184
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 129
=================================================================
Total params: 1,871,233
Trainable params: 1,871,233
Non-trainable params: 0
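The parameter counts in the summary can be reproduced by hand. This is a minimal sketch using the standard formulas for Dense and LSTM layers; the helper function names are made up for this example.

```python
def dense_params(n_in, units):
    # One weight per input-unit pair, plus one bias per unit.
    return n_in * units + units

def lstm_params(n_in, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (n_in + units) + units)

def bidirectional_lstm_params(n_in, units):
    # A forward and a backward LSTM of the same size.
    return 2 * lstm_params(n_in, units)

counts = {
    "dense": dense_params(1, 64),
    "bidirectional": bidirectional_lstm_params(64, 256),
    "bidirectional_1": bidirectional_lstm_params(512, 128),
    "dense_1": dense_params(17 * 256, 128),
    "dense_2": dense_params(128, 1),
}
for name, n in counts.items():
    print(name, n)
print("total", sum(counts.values()))
```

The second bidirectional layer takes 512 inputs because the first one concatenates its forward and backward outputs (2 x 256), and dense_1 sees 17 * 256 = 4352 values after Flatten.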

Train the model (100 epochs recommended):

callbacks module

EarlyStopping: stop training early automatically

ModelCheckpoint: save the best model automatically

LearningRateScheduler: adjust the learning rate automatically

def train_model(model, x_train, y_train, x_test, y_test, save_to, epoch = 2):
    # 100 training epochs are recommended

    # learning-rate hyperparameter
    opt_adam = keras.optimizers.Adam(learning_rate=0.001)

    # callbacks used during training
    # es: monitor val_loss; mode='min' means lower is better; patience is the number of consecutive epochs without improvement before stopping
    es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)
    # mc: monitor val_accuracy; mode='max' means higher is better; save_best_only=True keeps only the best model, False saves every epoch
    # cf. model.save http://www.jamesdambrosio.com/artwork.php?sn=5435549
    mc = ModelCheckpoint(save_to + '_best_model.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)
    # lr_schedule: the learning rate starts high and decays as training goes on
    lr_schedule = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 0.001 * np.exp(-epoch / 10.))
    #from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

    model.compile(optimizer=opt_adam,
                  loss=['binary_crossentropy'],
                  metrics=['accuracy'])

    # validation_split: fit cuts a percentage of the training data off as the validation set (automatic split, not used here)
    # train_test_split + validation_data: "manual" split (used here)
    # validation_data: inputs and outputs of the validation set
    history = model.fit(x_train, y_train,
                        batch_size=20,
                        epochs=epoch,
                        validation_data=(x_test, y_test),
                        callbacks=[es, mc, lr_schedule])

    # the best checkpoint is reloaded here, but the function returns the model as of the last epoch
    saved_model = load_model(save_to + '_best_model.h5')

    return model, history

model,history = train_model(model, x_train, y_train,x_test, y_test, save_to= './', epoch = 2)
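To get a feel for how quickly the schedule lowers the learning rate, the formula from the LearningRateScheduler lambda can be evaluated on its own. A minimal sketch; the function name and its parameter names are made up for this example, and the epochs printed are arbitrary picks.

```python
import numpy as np

def lr_for_epoch(epoch, base_lr=0.001, decay_epochs=10.0):
    # Same formula as the lambda in the notes: base_lr * exp(-epoch / 10).
    return base_lr * np.exp(-epoch / decay_epochs)

for epoch in (0, 10, 20, 50, 99):
    print(epoch, f"{lr_for_epoch(epoch):.2e}")
```

Every 10 epochs the rate shrinks by a factor of e, so a 100-epoch run spends most of its later epochs at a very small learning rate.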

Plot accuracy and loss against epoch:

AI Cross-Domain Data Science course notes d23: Data Visualization (2), Package Practice

plt.legend() works together with label

plt.plot(history.history['accuracy'])  # the labels can also be passed to legend directly
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')  # upper left corner
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

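Plot the confusion matrix:

x_test -> predict y_pred

Compare it with the true answers y_test

y_test is a 1-D Series of shape (1922,) containing 0 or 1 (whether the student's EEG indicates confusion)

y_pred is a 2-D ndarray of shape (1922,1) containing floats between 0 and 1

Threshold at 0.5 to turn it into 0 or 1 (still 2-D, shape (1922,1))

True / False: whether the prediction matches the actual label. Positive / Negative: the predicted class

from sklearn.metrics import confusion_matrix

confusion_matrix: confusion matrix

Rows are the actual labels and columns are the predicted labels, in the order 0 (N), 1 (P)

cf. AI Cross-Domain Data Science course notes d27-2: Deep Learning (3-2), Recurrent Neural Networks and LSTM Code Walkthrough

pd.crosstab

0~9 * 0~9

y_pred = model.predict(x_test)
y_pred = np.array(y_pred >= 0.5, dtype = int)  # threshold at 0.5: is the student confused? (np.int no longer exists in recent NumPy)
confusion_matrix(y_test, y_pred)

array([[536, 419],
       [354, 613]], dtype=int64)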
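The thresholding step and the layout of the matrix can be checked on a handful of made-up values. In this minimal sketch, y_true and y_prob are hypothetical stand-ins for y_test and the output of model.predict; flattening with ravel() is an extra step added here so both arrays are 1-D.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical values standing in for y_test and model.predict(x_test).
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([[0.2], [0.7], [0.9], [0.4], [0.5], [0.1]])  # shape (6, 1)

# Threshold at 0.5, then flatten (6, 1) to (6,) to match y_true.
y_pred = (y_prob >= 0.5).astype(int).ravel()

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)
print("TN, FP, FN, TP =", tn, fp, fn, tp)
```

Because rows are actual labels and columns are predicted labels, unpacking the flattened 2x2 matrix gives TN, FP, FN, TP in that order.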

Classification report:

from sklearn.metrics import classification_report

classification_report: text report of the main classification metrics

-precision: the share of predicted positives that are correct; recall: the share of actual positives that are found; f1-score: 2*precision*recall/(precision+recall)

-support: number of occurrences of each label

-avg / total (shown as macro avg and weighted avg in the output): averages of each column (for the support column, the total)

              precision    recall  f1-score   support

         0.0       0.64      0.62      0.63       969
         1.0       0.63      0.65      0.64       953

    accuracy                           0.63      1922
   macro avg       0.64      0.63      0.63      1922
weighted avg       0.64      0.63      0.63      1922
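The metrics in the report follow directly from the four confusion-matrix counts. This minimal sketch plugs in the counts from the confusion matrix printed earlier in these notes, read as rows = actual, columns = predicted. That matrix has row totals of 955 and 967 while the report lists supports of 969 and 953, so the two seem to come from different runs and the numbers below are not expected to match the report exactly.

```python
# Counts for class 1 taken from the confusion matrix printed in the notes:
# [[536, 419], [354, 613]] with rows = actual, columns = predicted.
tn, fp, fn, tp = 536, 419, 354, 613

precision = tp / (tp + fp)  # of everything predicted 1, how much was really 1
recall = tp / (tp + fn)     # of everything really 1, how much was predicted 1
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} accuracy={accuracy:.3f}")
```

F1 is the harmonic mean of precision and recall, so it is pulled toward whichever of the two is lower.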

#deep learning #LSTM #Bidirectional #pandas module #numpy module #tensorflow module #keras module #matplotlib module #sklearn module #EarlyStopping
