Technology for Humanity
Fallacy of Symptom-Based Inference (Diagnosis) and Prediction (Prognosis)

Prediction of the 10-year Cardiovascular Heart Disease Using the Framingham Heart Study Data


Chapter 5. Model Evaluation

In the previous chapter, we checked the accuracy of four different models. To recap, the accuracy of LogisticRegression was highest (68.35%), followed by SVC (65.82%), KNeighbors (56.12%), and DecisionTree (54.01%). Accuracy, however, is only one of several measures of a model's validity. In this chapter, we assess each model with additional metrics and select the best one for our prediction of 10-year CHD using the Framingham Heart Study data.

1. Logistic Regression

Logistic regression, a fundamental concept in both machine learning and statistics, plays a pivotal role in predicting binary outcomes. In the realm of machine learning, logistic regression is an algorithmic technique that falls under supervised learning. It harnesses the power of mathematical optimization to estimate the probability of an event occurring within a given set of independent variables. By modeling this relationship using a logistic function, it effectively classifies data points into distinct categories based on their features.

In traditional statistics, logistic regression serves as an inferential tool allowing researchers to analyze relationships between predictor variables and categorical response variables. By employing maximum likelihood estimation methods or alternative statistical techniques such as Wald tests or odds ratios, statisticians can interpret coefficients to assess significance levels and make reliable predictions regarding factors influencing an outcome variable's occurrence or absence.

While both disciplines utilize logistic regression for prediction purposes, they may differ in terms of methodologies employed and degree of emphasis placed on interpretability versus predictive accuracy.
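As a small illustration of the inferential view, the fitted coefficients of a logistic regression can be exponentiated into odds ratios. The sketch below assumes the X_train and y_train objects prepared in the previous chapter, with X_train as a pandas DataFrame of named risk factors; it is an illustration, not part of the original analysis.


import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Sketch: fit the model and express each coefficient as an odds ratio.
# Assumes X_train (pandas DataFrame of risk factors) and y_train (the binary 10-year CHD label).
log_reg = LogisticRegression().fit(X_train, y_train)

odds_ratios = pd.Series(np.exp(log_reg.coef_[0]), index=X_train.columns)
print(odds_ratios.sort_values(ascending=False))
# An odds ratio above 1 raises the estimated odds of 10-year CHD,
# one below 1 lowers them, holding the other predictors fixed.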


We are using the normalized data again. First, import the model classes, plotting libraries, and evaluation metrics:


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score,
                             roc_curve)


normalized_df_reg = LogisticRegression().fit(X_train, y_train)  # fit on the normalized training data
normalized_df_reg_pred = normalized_df_reg.predict(X_test)      # predict 10-year CHD on the test set

# Check accuracy. Accuracy: Overall, how often is the classifier correct? Accuracy = (True Pos + True Negative)/total


acc = accuracy_score(y_test, normalized_df_reg_pred)
print(f"The accuracy score for LogReg is: {round(acc,3)*100}%")

F-1 score:
The F-1 score is an essential metric in machine learning that provides a balanced measure of a model's accuracy. It is the harmonic mean of precision and recall: F1 = 2 × (precision × recall) / (precision + recall). It serves as a comprehensive evaluation criterion for classification algorithms, harmonizing precision, the proportion of correctly identified positive instances out of all predicted positives, with recall, the proportion of correctly classified positive instances out of all actual positives.

As such, it effectively captures how well a model can detect relevant data points while minimizing false positives or negatives. By striking a balance between these two vital aspects, the F-1 score allows for more informed decisions when comparing and selecting machine learning models across various domains and applications.
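As a quick worked example, using roughly the precision and recall values reported below for logistic regression (an illustration, not output from the original notebook):


precision, recall = 0.65, 0.691   # approximate LogReg values reported below
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(round(f1, 3))  # about 0.67, in line with the F-1 score reported below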


f1 = f1_score(y_test, normalized_df_reg_pred)
print(f"The f1 score for Logistic Regression is: {round(f1,3)*100}%")

Output:
The f1 score for Logistic Regression is: 67.0%


Precision score:
When it predicts yes, how often is it correct? Precision=True Positive/predicted yes


precision = precision_score(y_test, normalized_df_reg_pred)
print(f"The precision score for LogReg is: {round(precision,3)*100}%")

Output:
The precision score for Logistic Regression is: 65.0%


Recall score:
True Positive Rate (Sensitivity or Recall): When it’s actually yes, how often does it predict yes? True Positive Rate = True Positive/actual yes


recall = recall_score(y_test, normalized_df_reg_pred)
print(f"The recall score for LogReg is: {round(recall,3)*100}%")

Output:
The recall score for LogReg is: 69.1%


Plotting the confusion matrix for Logistic Regression:


cnf_matrix_log = confusion_matrix(y_test, normalized_df_reg_pred)

Confusion matrix graph


fig, ax = plt.subplots()
sns.heatmap(pd.DataFrame(cnf_matrix_log), annot=True,cmap="Reds" , fmt='g')
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.savefig('./static/image/confusion_logistic', bbox_inches='tight')
# plt.title('Confusion matrix Logistic Regression\n', y=1.1)
plt.show()

Figure 1. Confusion matrix - Logistic Regression
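The counts in this confusion matrix are enough to reproduce the accuracy, precision, and recall figures by hand. The short sketch below (assuming the usual 0/1 coding of the outcome) unpacks cnf_matrix_log and applies the formulas quoted above; it is a check on the scikit-learn metrics, not part of the original code.


# Sketch: recompute the metrics directly from the confusion matrix.
# For 0/1 labels, confusion_matrix returns [[TN, FP], [FN, TP]].
tn, fp, fn, tp = cnf_matrix_log.ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # (True Pos + True Neg) / total
precision = tp / (tp + fp)                   # True Pos / predicted yes
recall = tp / (tp + fn)                      # True Pos / actual yes
print(round(accuracy, 3), round(precision, 3), round(recall, 3))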

2. Support Vector Machine (SVM)


svm = SVC() #initialize model
svm.fit(X_train, y_train) #fit model
normalized_df_svm_pred = svm.predict(X_test)

Accuracy Score:
Accuracy: Overall, how often is the classifier correct? Accuracy = (True Pos + True Negative)/total


acc = accuracy_score(y_test, normalized_df_svm_pred)
print(f"The accuracy score for SVM is: {round(acc,3)*100}%")

Output:
The accuracy score for SVM is: 65.8%

f1 score:
The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0.


f1 = f1_score(y_test, normalized_df_svm_pred)
print(f"The f1 score for SVM is: {round(f1,3)*100}%")

Output:
The f1 score for SVM is: 63.7%


Plotting the confusion matrix for SVM:


cnf_matrix_svm = confusion_matrix(y_test, normalized_df_svm_pred)
fig, ax = plt.subplots()
sns.heatmap(pd.DataFrame(cnf_matrix_svm), annot=True, cmap="Reds", fmt='g')
ax.xaxis.set_label_position("top")
plt.tight_layout()
# plt.title('Confusion matrix SVM\n', y=1.1)
plt.show()

Figure 2. Confusion matrix - SVM

3. Decision Tree

# initialize model
dtc_up = DecisionTreeClassifier()
# fit model
dtc_up.fit(X_train, y_train)
normalized_df_dtc_pred = dtc_up.predict(X_test)

Accuracy:
Overall, how often is the classifier correct? Accuracy = (True Pos + True Negative)/total


acc = accuracy_score(y_test, normalized_df_dtc_pred)
print(f"The accuracy score for DTC is: {round(acc,3)*100}%")

Output:
The accuracy score for DTC is: 53.2%


f1 score:
The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0.


f1 = f1_score(y_test, normalized_df_dtc_pred)
print(f"The f1 score for DTC is: {round(f1,3)*100}%")

Output:
The f1 score for DTC is: 48.8%


Precision score:
When it predicts yes, how often is it correct? Precision=True Positive/predicted yes


precision = precision_score(y_test, normalized_df_dtc_pred)
print(f"The precision score for DTC is: {round(precision, 3)*100}%")

Output:
The precision score for DTC is: 49.5%


recall score:
True Positive Rate (Sensitivity or Recall): When it’s actually yes, how often does it predict yes? True Positive Rate = True Positive/actual yes


recall = recall_score(y_test, normalized_df_dtc_pred)
print(f"The recall score for DTC is: {round(recall, 3)*100}%")

Output:
The recall score for DTC is: 48.20%

Plotting the confusion matrix for Decision Tree:


cnf_matrix_dtc = confusion_matrix(y_test, normalized_df_dtc_pred)
fig, ax = plt.subplots()
sns.heatmap(pd.DataFrame(cnf_matrix_dtc), annot=True, cmap="Reds", fmt='g')
ax.xaxis.set_label_position("top")
plt.tight_layout()
plt.show()

Figure 3. Confusion matrix - Decision Tree

4. KNN Model

knn = KNeighborsClassifier(n_neighbors = 2) # initialize model
knn.fit(X_train, y_train) # fit model
normalized_df_knn_pred = knn.predict(X_test)

Accuracy:
Overall, how often is the classifier correct? Accuracy = (True Pos + True Negative)/total


acc = accuracy_score(y_test, normalized_df_knn_pred)
print(f"The accuracy score for KNN is: {round(acc, 3)*100}%")

Output:
The accuracy score for KNN is: 61.2%


f1 score:
The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0.


f1 = f1_score(y_test, normalized_df_knn_pred)
print(f"The f1 score for KNN is: {round(f1, 3) * 100}%")

Output:
The f1 score for KNN is: 46.5%


Precision score: When it predicts yes, how often is it correct? Precision=True Positive/predicted yes


precision = precision_score(y_test, normalized_df_knn_pred)
print(f"The precision score for KNN is: {round(precision, 3)*100}%")

Output:
The precision score for KNN is: 64.5%


recall score:
True Positive Rate (Sensitivity or Recall): When it’s actually yes, how often does it predict yes? True Positive Rate = True Positive/actual yes


recall = recall_score(y_test, normalized_df_knn_pred)
print(f"The recall score for KNN is: {round(recall, 3) * 100}%")

Output:
The recall score for KNN is: 36.4%


Result: Among the four models, logistic regression has the highest accuracy score. Before moving on, we take a closer look at the KNN model.


Check the KNN model for overfitting by comparing its accuracy on the test and training data:


acc_test = knn.score(X_test, y_test)
print("The accuracy score of the test data is: ", round(acc_test * 100, 1), "%")
acc_train = knn.score(X_train, y_train)
print("The accuracy score of the training data is: ", round(acc_train * 100, 1), "%")

Output:
The accuracy score of the test data is: 61.2%
The accuracy score of the training data is: 79.1%


The training accuracy (79.1%) is noticeably higher than the test accuracy (61.2%), which points to some overfitting in the KNN model; its performance on unseen data should be judged with care.
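One way to probe this gap, offered here as a sketch rather than as part of the original workflow, is to compare training and test accuracy over a range of n_neighbors values; larger k generally smooths the decision boundary and narrows the gap.


# Sketch: train/test accuracy of KNN for several neighborhood sizes.
for k in [2, 5, 10, 15, 20]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k,
          round(model.score(X_train, y_train), 3),   # training accuracy
          round(model.score(X_test, y_test), 3))     # test accuracy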


Perform cross-validation

Cross-validation is used to assess the predictive performance of the models and to judge how they perform out of sample, on a new data set.


cv_results = cross_val_score(knn, X, y, cv=5)
print ("Cross-validated scores:", cv_results)
print("The Accuracy of Model with Cross Validation is: {0:.2f}%".format(cv_results.mean() * 100))

Output:
Cross-validated scores: [0.62025316 0.56962025 0.5720339 0.61016949 0.61864407]
The Accuracy of Model with Cross Validation is: 59.81%
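The same cross-validation can be applied to all four classifiers to compare them on an equal footing. The loop below is a sketch under the same X, y, and 5-fold setup; the exact scores will vary with the data split.


# Sketch: 5-fold cross-validated accuracy for each of the four models.
models = {
    "LogisticRegression": LogisticRegression(),
    "SVC": SVC(),
    "DecisionTree": DecisionTreeClassifier(),
    "KNN (k=2)": KNeighborsClassifier(n_neighbors=2),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean() * 100:.2f}%")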


Plotting the confusion matrix for KNN:


cnf_matrix_knn = confusion_matrix(y_test, normalized_df_knn_pred)
fig, ax = plt.subplots()
sns.heatmap(pd.DataFrame(cnf_matrix_knn), annot=True, cmap="Reds", fmt='g')
ax.set_xlabel('Predicted')
ax.set_ylabel('True')
plt.show()

Figure 4. Confusion matrix - KNN


AUC ROC Curve (KNN): The ROC AUC is a measure of performance based on plotting the true positive rate against the false positive rate and calculating the area under that curve. The closer the score is to 1, the better the algorithm's ability to distinguish between the two outcome classes.


# ROC computed from the hard 0/1 predictions (a single threshold)
fpr, tpr, _ = roc_curve(y_test, normalized_df_knn_pred)
auc = roc_auc_score(y_test, normalized_df_knn_pred)
plt.plot(fpr, tpr, label="data 1, auc=" + str(auc))
plt.legend(loc=4)
plt.box(False)
plt.show()
Figure 5. AUC ROC Curve
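Note that the curve above is built from the hard 0/1 predictions, which gives only a single operating point between the corners. A common refinement, shown here as a sketch and not the code used for the figure, is to rank the test cases by the predicted probability from predict_proba, which traces a fuller curve:


# Sketch: ROC curve from predicted probabilities rather than class labels.
knn_probs = knn.predict_proba(X_test)[:, 1]   # estimated probability of 10-year CHD per test case
fpr, tpr, _ = roc_curve(y_test, knn_probs)
auc = roc_auc_score(y_test, knn_probs)

plt.plot(fpr, tpr, label=f"KNN, auc={round(auc, 3)}")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend(loc=4)
plt.show()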

Continued in Chapter 6