How to Get Started with Phishing URL Detection Using ONNX

Dec 1, 2023 | Educational

Phishing attacks are a common threat in today’s digital world, targeting unsuspecting users through deceptive links. Fortunately, with advancements in machine learning, we can predict the likelihood of a URL being a phishing site. This blog provides a step-by-step guide on utilizing a phishing URL detection model with ONNX, ensuring secure and efficient implementations.

Model Overview

This model is designed to assess the probability of URLs being phishing sites. It employs a LinearSVM for binary classification, with the following evaluation metrics:

ROC AUC: 0.986844
Accuracy: 0.948568
F1 Score: 0.948623
Precision: 0.947619
Recall: 0.949629
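
As a quick sanity check on the figures above, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the other two reported values. The sketch below uses only the numbers listed here; the variable names are illustrative.

```python
# Reported metrics, copied from the list above.
precision = 0.947619
recall = 0.949629
reported_f1 = 0.948623

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"Recomputed F1: {f1:.6f}")
print(f"Matches reported value: {abs(f1 - reported_f1) < 1e-5}")
```

Precision and recall are nearly equal here, which suggests the model makes false positives and false negatives at similar rates.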

For more details about phishing, check out the Wikipedia page on Phishing.

Why Use ONNX Instead of Pickle?

Pickle is discouraged for sharing models because unpickling can execute arbitrary code: a malicious file can run commands the moment it is loaded. Pickle files are also Python-only and may not load across different Python or library versions. ONNX avoids both problems. An ONNX file describes the model as a graph of standard operators rather than executable Python, and ONNX Runtime can load it from any language with a supported binding, including Python, JavaScript, C++, C#, and Java. The runtime is also lightweight, since it does not need the training libraries installed.
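
To make the pickle risk concrete, here is a minimal and deliberately harmless sketch using only the standard library. The Payload class is hypothetical; it uses the real __reduce__ hook to tell the unpickler to call eval on a string. A real attack would substitute something destructive for the arithmetic.

```python
import pickle


class Payload:
    """Object whose pickled form instructs the loader to call eval()."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" instead of the object's state.
        # The expression is harmless here, but it could be any Python code.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes runs the expression; no Payload object comes back.
loaded = pickle.loads(blob)
print(type(loaded).__name__, loaded)
```

The code runs inside pickle.loads, before the caller has any chance to inspect what was loaded. That is why pickle files from untrusted sources should never be opened.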

Getting Started: Code Examples

Here are some examples showing how to implement the model using different languages. We will start with Python, followed by Node.js, and then use JavaScript for web applications.

Python Implementation

Let’s assume you are a chef in a kitchen, mixing ingredients to create the perfect dish. In this case, the ingredients are URLs, and your ONNX model serves as a recipe that tells you how likely it is for each URL to be a phishing site. Here’s how to set it up:

import numpy as np
import onnxruntime
from huggingface_hub import hf_hub_download

REPO_ID = "pirocheto/phishing-url-detection"
FILENAME = "model.onnx"
model_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)

# Initializing the ONNX Runtime session with the pre-trained model
sess = onnxruntime.InferenceSession(model_path)

urls = [
    "https://clubedemilhagem.com/home.php",
    "http://www.medicalnewstoday.com/articles/188939.php",
]

inputs = np.array(urls, dtype=str)

# The model returns two outputs: predicted labels, then class probabilities.
# Index 1 selects the probabilities, one [legitimate, phishing] pair per URL.
results = sess.run(None, {"inputs": inputs})[1]
for url, proba in zip(urls, results):
    print(f"URL: {url}")
    print(f"Likelihood of being a phishing site: {proba[1] * 100:.2f}%")
    print("----")
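
The probabilities printed above can be turned into a yes/no decision by picking a cutoff. The sketch below is one way to do that in plain Python; the flag_phishing helper, the 0.5 threshold, and the sample probability rows are all assumptions for illustration, not part of the model or its documentation.

```python
def flag_phishing(urls, probabilities, threshold=0.5):
    """Return (url, phishing_probability, is_flagged) for each URL.

    `probabilities` holds one [legitimate, phishing] pair per URL,
    the same layout as `results` in the example above.
    """
    flagged = []
    for url, proba in zip(urls, probabilities):
        phishing_proba = float(proba[1])
        flagged.append((url, phishing_proba, phishing_proba >= threshold))
    return flagged


# Made-up probabilities standing in for real model output.
sample_urls = ["https://example.com/login", "https://example.org/docs"]
sample_probas = [[0.08, 0.92], [0.97, 0.03]]

for url, proba, is_flagged in flag_phishing(sample_urls, sample_probas):
    verdict = "flag" if is_flagged else "allow"
    print(f"{verdict:5s} {proba:.2f} {url}")
```

Lowering the threshold catches more phishing URLs at the cost of more false alarms; the right value depends on how costly each kind of mistake is for your application.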

Node.js Implementation

Continuing with our analogy of being a chef, imagine now you have a delivery service (Node.js) to send your dishes (predictions) directly to customers. Here’s how you would implement it:

const ort = require("onnxruntime-node");

async function main() {
    try {
        const model_path = "./model.onnx";
        const session = await ort.InferenceSession.create(model_path);

        const urls = [
            "https://clubedemilhagem.com/home.php",
            "http://www.medicalnewstoday.com/articles/188939.php",
        ];

        const tensor = new ort.Tensor("string", urls, [urls.length]);
        const results = await session.run({ inputs: tensor });
        // Flattened [urls.length, 2] array: [legitimate, phishing] for each URL
        const probas = results["probabilities"].data;

        urls.forEach((url, index) => {
            const proba = probas[index * 2 + 1];
            const percent = (proba * 100).toFixed(2);
            console.log(`URL: ${url}`);
            console.log(`Likelihood of being a phishing site: ${percent}%`);
            console.log("----");
        });
    } catch (e) {
        console.error(`Failed to run inference on the ONNX model: ${e}`);
    }
}

main();
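
The index * 2 + 1 arithmetic above works because the runtime returns the probabilities as one flat, row-major array. The short Python sketch below shows the same indexing on made-up numbers; the values are invented for illustration.

```python
# A [2, 2] probability matrix flattened row by row:
# [url0_legitimate, url0_phishing, url1_legitimate, url1_phishing]
flat = [0.08, 0.92, 0.97, 0.03]
num_classes = 2

# Same indexing as the Node.js loop: row index * 2 + 1.
phishing = [flat[i * num_classes + 1] for i in range(len(flat) // num_classes)]

# Equivalent slice: every second element, starting at position 1.
assert phishing == flat[1::num_classes]
print(phishing)
```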

JavaScript Implementation

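Lastly, let’s say that you want to present your delicious dishes to your guests in a beautiful dining room (a web interface). In the browser, the onnxruntime-web package exposes the same InferenceSession and Tensor API as the Node.js example above: create a session with ort.InferenceSession.create, wrap the URLs in a string ort.Tensor, and read the probabilities output. The page needs to load the onnxruntime-web script and have access to a copy of model.onnx.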

Troubleshooting Tips

If you encounter issues while running the code, consider the following troubleshooting ideas:

Ensure that the ONNX model file path is correct and accessible.
Verify that all necessary libraries are installed: numpy, onnxruntime, and huggingface_hub for Python, or onnxruntime-node for Node.js.
Check for network connectivity if you are downloading the model from the Hugging Face Hub.
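
The first two checks can be automated with a small preflight script. This is a sketch using only the standard library; the function names and the ./model.onnx path are assumptions for illustration, and it does not test network connectivity.

```python
import importlib.util
import os


def module_available(name):
    """True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


def model_file_ok(path):
    """True if `path` points to a readable, non-empty file."""
    return (
        os.path.isfile(path)
        and os.access(path, os.R_OK)
        and os.path.getsize(path) > 0
    )


def preflight(model_path, modules=("numpy", "onnxruntime", "huggingface_hub")):
    """Collect readable problem descriptions; an empty list means all checks passed."""
    problems = []
    for name in modules:
        if not module_available(name):
            problems.append(f"missing library: {name} (try: pip install {name})")
    if not model_file_ok(model_path):
        problems.append(f"model file not found or unreadable: {model_path}")
    return problems


# Hypothetical local path; adjust to wherever your model file lives.
for problem in preflight("./model.onnx"):
    print(problem)
```

If the script prints nothing, the libraries and the model file are in place, and any remaining failure is more likely a network or input-format issue.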


Conclusion

By following this guide, you can utilize the phishing URL detection model to bolster your online safety measures. Remember, the ONNX format not only enhances security but also expands possibilities across various programming environments.

