Risk Stratification Prior To PCI

Apurva Dubey
6 min read · Dec 6, 2021

Traditionally, risk assessment for patients with coronary artery disease undergoing percutaneous coronary intervention (PCI) is based on a limited set of clinical factors and medical imaging reports. Decision-making in the modern health care system is multifaceted: it rests on the available data, a structured understanding of that data, and careful interpretation in the context of the individual patient. Machine learning has already made major contributions to characterizing cardiovascular risk, predicting outcomes, and identifying biomarkers from data on huge populations. Yet even though a large number of cardiac patients require PCI, there is no clear-cut evidence on applying machine learning to this specific group of patients, so its potential for judging precise prognostic endpoints at large scale remains largely untapped.

In recent years, percutaneous coronary intervention (PCI) has become a leading-edge treatment for coronary artery disease. Risk stratification is crucial for the diagnosis and personalized management of patients undergoing PCI. An appraisal system for the long-term prognosis of PCI patients may need to combine many powerful, complementary factors, and traditional prognostic risk assessment has limited ability to provide such stratification. We designed a machine learning risk stratification tool that assesses and stratifies patient risk prior to PCI. Through a comprehensive study, the best-performing model turned out to be the random forest. It was used to predict and stratify patients across different medical scores, produce a clear description of the model's decisions, and give physicians an individualized risk prediction together with an interpretation based on the key features of the patient's medical records.

Project Model Design

From the research papers we referred to, we learned that decision tree and random forest algorithms tend to outperform other approaches in accuracy. A decision tree is one of the simplest yet most effective algorithms for classification and prediction. It is a flowchart-like tree structure, where each internal node represents a test on a feature, each branch corresponds to an outcome of that test, and each leaf node holds a decision.
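To make this concrete, here is a tiny hand-built sketch of a decision tree written as nested if/else tests. The feature names and thresholds (ejection fraction, creatinine, age) are hypothetical illustrations, not values taken from our dataset.

```python
# Conceptual sketch only: a decision tree is equivalent to nested if/else tests.
# Feature names and thresholds are hypothetical, not from our actual data.
def classify_patient(age, ejection_fraction, creatinine):
    """Return 1 (risky) or 0 (not risky) by walking a tiny hand-built tree."""
    if ejection_fraction < 40:          # internal node: test on a feature
        if creatinine > 1.5:            # branch: outcome of the test
            return 1                    # leaf node: decision
        return 1 if age > 70 else 0
    return 1 if age > 80 else 0

print(classify_patient(age=72, ejection_fraction=35, creatinine=1.8))  # -> 1
```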

Decision Tree Workflow

While building a decision tree, the algorithm must pick the best feature to split on at each step. To do that, we find which feature gives the most valuable information using the concept of information gain. Information gain measures the decrease in entropy after a split and tells us how well an attribute separates the target classes; the attribute with the maximum information gain is chosen as the best feature. To calculate information gain, we first need to know the entropy. Entropy is a measure of impurity (disorder) of the target variable in a dataset. In the case of binary classification,

if entropy is 0, then all values in the target variable are the same (either all positives or all negatives)

if entropy is 1, then the target variable has an equal number of positive and negative values.

Entropy is calculated as:

Entropy(S) = − Σᵢ pᵢ · log₂(pᵢ), where the sum runs over i = 1 … n

S -> the set of rows whose entropy is being measured (here, the whole dataset or a subset of it)

n -> total number of classes in the target column; in our case n = 2, i.e. 1 (risky) and 0 (not risky)

pᵢ -> probability of class i, i.e. the ratio of the number of rows with class i in the target column to the total number of rows in S.
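As a quick illustration, here is a minimal Python sketch of the entropy calculation for a list of binary labels; it reproduces the two cases above (entropy 0 for a pure set, entropy 1 for a balanced one).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels (1 = risky, 0 = not risky)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy([1, 1, 1, 1]))  # 0.0 -> all values the same
print(entropy([1, 0, 1, 0]))  # 1.0 -> equal number of positives and negatives
```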

Information Gain for a feature column A is calculated as:

IG(S, A) = Entropy(S) − Σᵥ ( |Sᵥ| / |S| ) · Entropy(Sᵥ), where the sum runs over every value v of feature A

Sᵥ -> set of rows in S for which the feature column A has value v

|Sᵥ| -> number of rows in Sᵥ

|S| -> number of rows in S
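Putting the two formulas together, here is a small sketch of information gain for one feature column. It reuses the entropy helper from the sketch above; the toy records and the "diabetic" feature are hypothetical, purely to show the computation.

```python
def information_gain(rows, feature, target):
    """IG(S, A): entropy of the whole set minus the weighted entropy of each
    subset Sv obtained by splitting on the values v of feature A."""
    total = len(rows)
    base = entropy([r[target] for r in rows])
    weighted = 0.0
    for v in {r[feature] for r in rows}:                         # each value v of A
        subset = [r[target] for r in rows if r[feature] == v]    # Sv
        weighted += (len(subset) / total) * entropy(subset)
    return base - weighted

# Hypothetical toy records: 'diabetic' separates the risk labels fairly well.
records = [
    {"diabetic": 1, "risk": 1},
    {"diabetic": 1, "risk": 1},
    {"diabetic": 0, "risk": 0},
    {"diabetic": 0, "risk": 1},
]
print(information_gain(records, "diabetic", "risk"))  # ≈ 0.31
```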

Another factor to consider while building a decision tree is the purity of each split, which can be measured with the Gini index. A feature that produces a split with a low Gini index is preferred over one with a high Gini index. The Gini index is particularly useful for creating binary splits.

The Gini index is calculated as:

Gini(S) = 1 − Σⱼ pⱼ²

where pⱼ is the probability of class j in S.
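And a matching sketch of the Gini index, again over a list of binary labels:

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini_index([1, 1, 1, 1]))  # 0.0 -> pure node
print(gini_index([1, 0, 1, 0]))  # 0.5 -> maximally impure binary node
```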

Our dataset is large and each feature takes many values, so the resulting decision tree is long and hard to read, with many layers. This multi-way splitting on every feature also makes the tree computationally complex and prone to overfitting.

Random forest is a flexible, easy-to-understand machine learning algorithm that gives good results most of the time. It is one of the most widely used algorithms because of its flexibility and versatility: it can be used for both classification and regression problems.

It is based on ensemble learning. Ensemble learning is a technique that combines multiple classifiers or models to solve complex problems and make predictions, rather than relying on a single model.

Random Forest Workflow

A random forest comprises many decision trees. The "forest" created by the algorithm is trained using bagging, or bootstrap aggregating. Bagging is a method that improves the performance of machine learning algorithms: a random sample of rows is drawn from the data with replacement (this row sampling with replacement is called a bootstrap), and a separate model is trained on each sample. Each model is trained independently and produces its own prediction. The end result is decided by majority voting over the outputs of all the models; this step of combining the results and computing the output by majority vote is called aggregation.
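The sketch below illustrates this bagging-and-voting loop using scikit-learn decision trees as the base models. It only shows bootstrap sampling and majority voting; a real random forest (and scikit-learn's RandomForestClassifier) additionally samples a random subset of features at each split. The array names are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, n_trees=100, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement),
    then aggregate the test predictions by majority vote."""
    rng = np.random.default_rng(seed)
    n_rows = len(X_train)
    votes = []
    for _ in range(n_trees):
        idx = rng.integers(0, n_rows, size=n_rows)        # bootstrap sample
        tree = DecisionTreeClassifier(random_state=seed)
        tree.fit(X_train[idx], y_train[idx])              # independent training
        votes.append(tree.predict(X_test))
    votes = np.stack(votes)                               # shape: (n_trees, n_test)
    return (votes.mean(axis=0) >= 0.5).astype(int)        # majority vote (0/1 labels)
```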

Random Forest Workflow In Detail

The random forest algorithm bases its result on the predictions of its decision trees: for classification it takes a majority vote over the trees' outputs (for regression it averages them). The precision of the results increases as the number of trees increases. A random forest overcomes the main disadvantages of a single decision tree: it reduces overfitting, improves precision, and usually makes good predictions without much configuration or tuning.

Implementation And Results

Decision Tree

Decision Tree Model Implementation
Confusion Matrix Of Decision Tree Classifier
Overfitting Of Decision Tree Model
Confusion Matrix Report Of Decision Tree Model
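Since the implementation above is shown as screenshots, here is a minimal sketch of what the decision-tree part looks like with scikit-learn. The file name pci_records.csv, the binary `risk` target column, and the 80/20 split are assumptions; comparing train and test accuracy is the overfitting check referred to above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, classification_report

# Assumed setup: pre-PCI patient records with a binary `risk` target (1 = risky).
df = pd.read_csv("pci_records.csv")                     # hypothetical file name
X, y = df.drop(columns=["risk"]), df["risk"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

dt = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# A large gap between train and test accuracy signals overfitting.
print("train accuracy:", dt.score(X_train, y_train))
print("test accuracy: ", dt.score(X_test, y_test))
print(confusion_matrix(y_test, dt.predict(X_test)))
print(classification_report(y_test, dt.predict(X_test)))
```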

Random Forest

Random Forest Implementation
Confusion Matrix Of Random Forest
Random Forest Model Overcomes Overfitting Of Decision Tree Model
Confusion Matrix Report Of Random Forest Model
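The corresponding random-forest sketch, reusing the same train/test split as the decision-tree sketch above; n_estimators=100 is an assumed setting, not necessarily what we used.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# The train/test gap usually shrinks compared with the single decision tree.
print("train accuracy:", rf.score(X_train, y_train))
print("test accuracy: ", rf.score(X_test, y_test))
print(confusion_matrix(y_test, rf.predict(X_test)))
print(classification_report(y_test, rf.predict(X_test)))
```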

Conclusion

The Random Forest Algorithm combines the output of multiple (randomly created) Decision Trees to generate the final output.

Random forest leverages the power of multiple decision trees. It does not rely on the feature importance given by a single decision tree. That is why it performed better.

I hope this blog will help someone who is looking for solutions for the same problem. Thank you so much for reading!!

Here I’m attaching the complete source code uploaded at GitHub!

Connect With Me at LinkedIn.
