Once we have the data and have finished cleaning, pre-processing and wrangling it, the first thing we do is feed it to a model and get predictions out, often as probabilities. But hold on! How do we measure how effective our model actually is? The more effective the model, the better its performance, and that is exactly what we want. This is where the confusion matrix comes into play. A confusion matrix is a performance measurement for machine learning classification: it shows where the model is right and where it goes wrong, which helps us improve it. It can be applied to binary as well as multiclass classification problems.
For binary classification, it is a table with 4 different combinations of predicted and actual values.
Confusion matrices hold counts of predicted versus actual values. “TN” stands for True Negative, the number of negative examples classified correctly. Similarly, “TP” stands for True Positive, the number of positive examples classified correctly. “FP” stands for False Positive, i.e., the number of actual negative examples classified as positive, and “FN” stands for False Negative, the number of actual positive examples classified as negative. One of the most commonly used metrics in classification is accuracy. The accuracy of a model (through a confusion matrix) is the share of all examples that were classified correctly: Accuracy = (TP + TN) / (TP + TN + FP + FN).
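To make the four counts concrete, here is a minimal sketch in plain Python that tallies them by hand and then computes accuracy as correct predictions divided by all predictions. The helper names `confusion_counts` and `accuracy` and the sample labels are made up for illustration; they are not part of any library.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary problem (hypothetical helper)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # actual positive, predicted positive
        elif a != positive and p != positive:
            tn += 1  # actual negative, predicted negative
        elif a != positive and p == positive:
            fp += 1  # actual negative, predicted positive
        else:
            fn += 1  # actual positive, predicted negative
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Share of all examples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Illustrative labels only: 1 = positive class, 0 = negative class.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("Accuracy:", accuracy(tp, tn, fp, fn))
```

With these sample labels there are 3 true positives, 3 true negatives, 1 false positive and 1 false negative, so 6 of the 8 predictions are correct.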
Accuracy can be misleading when used with imbalanced datasets, so there are other metrics based on the confusion matrix that are useful for evaluating performance. In Python, a confusion matrix can be obtained using the “confusion_matrix()” function, which is part of the scikit-learn (“sklearn”) library. The function can be imported using “from sklearn.metrics import confusion_matrix”. To obtain the confusion matrix, we pass the actual values and the predicted values to the function.
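As a rough sketch of how this looks in practice, the snippet below builds a confusion matrix with scikit-learn on a deliberately imbalanced toy dataset. The labels are invented for illustration: 95 negatives, 5 positives, and a model that finds only one of the positives. For binary labels 0 and 1, the matrix returned by `confusion_matrix` has actual classes as rows and predicted classes as columns, so flattening it yields TN, FP, FN, TP in that order.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Invented, imbalanced labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# Assumed model output: every negative is right, but only 1 of 5 positives is found.
y_pred = [0] * 95 + [0] * 4 + [1]

# Actual values go first, predicted values second.
cm = confusion_matrix(y_true, y_pred)

# For labels [0, 1] the flattened matrix is ordered TN, FP, FN, TP.
tn, fp, fn, tp = (int(v) for v in cm.ravel())

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)  # TP / (TP + FN)

print(cm)
print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Accuracy:", acc)
print("Recall:", rec)
```

Accuracy comes out high here because the negatives dominate, while recall shows that four of the five positives were missed. That gap is the reason accuracy alone is not enough on imbalanced data.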
As we know, FP (False Positive) and FN (False Negative) errors can be very costly and can lead to bigger problems. Consider the example of an Intrusion Detection System (IDS), a system that monitors network traffic for suspicious activity and issues alerts when such activity is discovered. Any malicious activity or violation is normally reported either to an administrator or collected centrally using a security information and event management (SIEM) system. A SIEM system integrates outputs from multiple sources and uses alarm filtering techniques to differentiate malicious activity from false alarms.
Although intrusion detection systems monitor networks for potentially malicious activity, they are also prone to false alarms. Hence, organizations need to fine-tune their IDS products when they first install them. This means setting up the intrusion detection system properly so that it can recognize what normal traffic on the network looks like compared to malicious activity.
Intrusion prevention systems also monitor the network packets coming into the system, check them for malicious activity and immediately send warning notifications. Now, suppose a hacker is trying to get into our system and at that moment the IDS produces a False Negative, which is also called a Type II error: it classifies the malicious traffic as normal. No alert is raised, so the administrator will never come to know about the malicious activity and the system will be hacked. A False Positive, or Type I error, is the opposite case: a false alarm raised on normal traffic.
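In confusion-matrix terms, a false alarm on normal traffic is a false positive (Type I error) and a missed attack is a false negative (Type II error). The sketch below turns the four counts into the two matching error rates for an IDS. The function name `ids_error_rates` and all the counts are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from a real system.

```python
def ids_error_rates(tp, tn, fp, fn):
    """Return (false positive rate, false negative rate) for an IDS.

    Hypothetical helper: "positive" means traffic flagged as malicious.
    """
    # Type I error rate: share of normal traffic that raised a false alarm.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # Type II error rate: share of real attacks that went unnoticed.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Made-up counts: 10,000 connections, 100 of which are attacks.
tp, fn = 90, 10      # attacks detected vs. attacks missed
fp, tn = 495, 9405   # false alarms vs. normal traffic passed correctly

fpr, fnr = ids_error_rates(tp, tn, fp, fn)
print(f"False positive rate (Type I): {fpr:.2%}")
print(f"False negative rate (Type II): {fnr:.2%}")
```

Tuning an IDS is largely a trade-off between these two numbers: loosening the rules usually cuts false alarms but lets more attacks through, and tightening them does the reverse.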
Errors like these have led to many cyber security problems.
Cyber-attacks have become one of the biggest problems in the world. They cause serious financial damage to countries and people every day. The increase in cyber-attacks also brings cyber-crime along with it. With the help of the internet and technology, cyber-criminals hack users' personal computers and smartphones and steal personal details from social media, business secrets, national secrets, important personal data, etc. Hackers are the criminals who perform these illegal, malicious activities on the internet. Though some agencies are trying to tackle this problem, it keeps growing and many people have become victims of identity theft, hacking, and malicious software.
Thus, to judge how well a model performs, a confusion matrix is used and its table of TP, TN, FP and FN values is analyzed. Various models have been built in this area, such as Computational System to Classify Cyber Crime Offenses using Machine Learning and Botnet Detection Based On Machine Learning Techniques Using DNS Query Data.
A botnet is a collection of Internet-connected user computers (bots) infected by malicious software (malware) that allows the computers to be controlled remotely by an operator (bot herder) through a Command-and-Control (C&C) server to perform automated tasks, such as stealing information or launching attacks on other computers. Botnet malware is designed to give its operators control of many user computers at once. This enables botnet operators to use computing and bandwidth resources across many different networks for malicious activities.
Thanks for Reading!!