Fine-Tuning BERT with LoRA and Hosting It on Cloudflare Workers AI
Let’s say you have a website and you would like to add content filtering to it.
You could use OpenAI's Moderation API, but what if you want your own solution?
I did a quick POC with Cloudflare ("CF" from now on) Workers + CF AI, to test whether I could create and serve such a model.
CF offers Workers AI, which exposes a closed set of base models *and fine-tunes* of those models.
So we can take a BERT model, fine-tune it using LoRA, and serve it via the CF AI API.
I assume the set of base models is closed because of serving optimizations: a shared base model plus small LoRA adapters lets CF serve many fine-tunes with only a fraction of the resources.
Setup
There are many content moderation datasets, but for this example let's use the well-known SMS Spam/No-Spam dataset.
First, we install some basic libraries and set up the environment.
Logging in to Cloudflare, we hit two issues:
Issue #1: the CLI asks for consent regarding telemetry, so we pipe `yes` into it;
Issue #2: as part of the OAuth flow, the browser redirects to localhost, which doesn't work from a notebook.
So we log in using a Cloudflare API token rather than the browser OAuth flow.
# install some libs; use node.js 18.x (LTS), and verify
!pip install transformers torch pandas peft datasets numpy scikit-learn
!apt-get remove -y nodejs npm
!curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
!apt-get install -y nodejs
!node --version
!npm --version
!npm install -g wrangler
!which wrangler

import os
CLOUDFLARE_API_TOKEN = "...."
os.environ['CLOUDFLARE_API_TOKEN'] = CLOUDFLARE_API_TOKEN

!wrangler --version
!yes | wrangler whoami
Basic Fine-Tuning
Here is the basic fine-tuning code: it takes bert-base-uncased and fine-tunes it on the Spam/No-Spam dataset.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
    DataCollatorWithPadding,
)
from peft import get_peft_model, LoraConfig, TaskType
from datasets import Dataset
import os
class SpamDetectorTrainer:
    def __init__(self, model_name="bert-base-uncased"):
        self.model_name = model_name
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name,
            num_labels=2
        )

    def load_data(self, url="https://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip"):
        """Load and prepare the SMS spam dataset"""
        os.system(f"wget {url} -O smsspamcollection.zip")
        os.system("unzip smsspamcollection.zip")
        df = pd.read_csv("SMSSpamCollection", sep='\t', header=None, names=['label', 'message'])
        df['label'] = df['label'].map({'ham': 0, 'spam': 1})
        train_df, eval_df = train_test_split(df, test_size=0.2, random_state=42)
        self.train_dataset = Dataset.from_pandas(train_df)
        self.eval_dataset = Dataset.from_pandas(eval_df)
        return self.train_dataset, self.eval_dataset
    def preprocess_data(self):
        """Tokenize and prepare datasets"""
        def tokenize_function(examples):
            return self.tokenizer(
                examples['message'],
                truncation=True,
                padding=True,
                max_length=128
            )

        self.tokenized_train = self.train_dataset.map(tokenize_function, batched=True)
        self.tokenized_eval = self.eval_dataset.map(tokenize_function, batched=True)
        self.tokenized_train.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
        self.tokenized_eval.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
        return self.tokenized_train, self.tokenized_eval
    def setup_lora(self, r=8, alpha=32, dropout=0.1):
        """Configure and apply LoRA"""
        lora_config = LoraConfig(
            task_type=TaskType.SEQ_CLS,
            inference_mode=False,
            r=r,
            lora_alpha=alpha,
            lora_dropout=dropout,
            target_modules=['query', 'value']
        )
        self.lora_model = get_peft_model(self.model, lora_config)
        return self.lora_model
    def train(self, output_dir="./results", epochs=3, batch_size=16, learning_rate=2e-5):
        """Train the model"""
        training_args = TrainingArguments(
            output_dir=output_dir,
            evaluation_strategy="steps",
            eval_steps=500,
            save_strategy="steps",
            save_steps=500,
            learning_rate=learning_rate,
            per_device_train_batch_size=batch_size,
            per_device_eval_batch_size=batch_size,
            num_train_epochs=epochs,
            weight_decay=0.01,
            logging_dir='./logs',
            logging_steps=100,
            load_best_model_at_end=True,
            metric_for_best_model="loss",
            save_total_limit=3,
        )
        data_collator = DataCollatorWithPadding(tokenizer=self.tokenizer)
        trainer = Trainer(
            model=self.lora_model,
            args=training_args,
            train_dataset=self.tokenized_train,
            eval_dataset=self.tokenized_eval,
            data_collator=data_collator,
        )
        # Train and evaluate
        train_result = trainer.train()
        eval_result = trainer.evaluate()
        return train_result, eval_result
    def save_model(self, path="lora_spam_adapter"):
        """Save the LoRA adapter"""
        self.lora_model.save_pretrained(path)
        self.tokenizer.save_pretrained(path)
trainer = SpamDetectorTrainer()
train_dataset, eval_dataset = trainer.load_data()
trainer.preprocess_data()
trainer.setup_lora()
train_result, eval_result = trainer.train()
trainer.save_model()
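Once training finishes, a quick smoke test helps. Here is a minimal sketch, assuming the `trainer` object from above (the example messages are made up):

# Quick sanity check of the trained adapter (assumes `trainer` from above).
trainer.lora_model.eval()
device = next(trainer.lora_model.parameters()).device

def classify(text):
    inputs = trainer.tokenizer(text, return_tensors="pt", truncation=True, max_length=128).to(device)
    with torch.no_grad():
        logits = trainer.lora_model(**inputs).logits
    return "spam" if logits.argmax(dim=-1).item() == 1 else "ham"

print(classify("WINNER!! You have been selected for a free prize, reply WIN"))  # expect: spam
print(classify("Hey, are we still on for lunch tomorrow?"))                     # expect: ham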
Hyper-Parameter Search
There are several hyper-parameters worth exploring; I'm not a LoRA fine-tuning expert, but I've read some blog posts about it.
We can run a grid search and let the machine churn a little to squeeze out some more performance.
For example the rank r, which is the rank of the low-rank adapter matrices: higher rank means more compute, but more information captured. The learning rate is worth sweeping too. A rough sketch of what the rank costs follows.
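To make the rank knob concrete, here is an illustrative parameter count using the standard bert-base sizes (a back-of-the-envelope sketch, not measured from this run):

# Illustrative LoRA size for bert-base-uncased.
# LoRA freezes each target weight W (768x768) and learns a low-rank update
# dW = B @ A, with A of shape (r, 768) and B of shape (768, r),
# i.e. 2 * 768 * r trainable params per adapted matrix.
hidden = 768   # bert-base hidden size
layers = 12    # bert-base encoder layers
targets = 2    # we adapt 'query' and 'value' in each layer
full = layers * targets * hidden * hidden
for r in (4, 8, 16):
    lora = layers * targets * 2 * hidden * r
    print(f"r={r:>2}: ~{lora:,} LoRA params ({100 * lora / full:.1f}% of the frozen Q/V weights)")
# (PEFT also keeps the classification head trainable for SEQ_CLS tasks.)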
results = []
learning_rates = [1e-5, 2e-5, 5e-5]
lora_ranks = [4, 8, 16]

for lr in learning_rates:
    for rank in lora_ranks:
        print(f"Training with learning_rate={lr}, lora_rank={rank}")

        # Reinitialize the base model and LoRA setup for a clean run
        trainer.model = AutoModelForSequenceClassification.from_pretrained(
            trainer.model_name,
            num_labels=2
        )
        trainer.setup_lora(r=rank)

        # Train the model
        train_result, eval_result = trainer.train(
            epochs=3,
            batch_size=16,
            learning_rate=lr
        )

        train_loss = train_result.training_loss
        eval_loss = eval_result['eval_loss']
        results.append({
            'learning_rate': lr,
            'lora_rank': rank,
            'train_loss': train_loss,
            'eval_loss': eval_loss
        })
        print(f"Results: train_loss={train_loss:.4f}, eval_loss={eval_loss:.4f}")
Hyper-Parameter Search Results

Push to Cloudflare AI
Here is where the plan falls apart: we cannot push our fine-tune to Cloudflare, because we picked the wrong base model. Workers AI supports LoRA fine-tunes only for a very specific list of base models, and BERT is not one of them: https://developers.cloudflare.com/workers-ai/fine-tunes/loras/
- @cf/meta-llama/llama-2-7b-chat-hf-lora
- @cf/mistral/mistral-7b-instruct-v0.2-lora
- @cf/google/gemma-2b-it-lora
- @cf/google/gemma-7b-it-lora
We could redo the training against one of these supported base models, this time using AutoTrain, and serve it with the code sample below, adding the fine-tune ID.
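For the upload itself, wrangler has an `ai finetune` subcommand that pushes a LoRA adapter folder (the `adapter_model.safetensors` + `adapter_config.json` we saved earlier). A sketch of the flow; the exact syntax may have changed and `my-spam-lora` is a hypothetical name, so check the docs linked above:

!wrangler ai finetune create @cf/mistral/mistral-7b-instruct-v0.2-lora my-spam-lora ./lora_spam_adapter
!wrangler ai finetune list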
export interface Env {
  // If you set another name in wrangler.toml as the value for 'binding',
  // replace "AI" with the variable name you defined.
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: "What is the origin of the phrase Hello, World",
      // For a LoRA fine-tune, use one of the *-lora base models listed above
      // and pass the adapter, e.g.: lora: "<your finetune id or name>"
    });
    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;
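For a quick test without deploying a worker at all, you can call the same model through the Workers AI REST API. A minimal sketch, reusing the API token from the setup step (ACCOUNT_ID is your own Cloudflare account id):

# Call Workers AI over the REST API (assumes CLOUDFLARE_API_TOKEN is set as above).
import requests

ACCOUNT_ID = "...."  # your Cloudflare account id
url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct"
headers = {"Authorization": f"Bearer {os.environ['CLOUDFLARE_API_TOKEN']}"}
payload = {"prompt": "What is the origin of the phrase Hello, World"}
print(requests.post(url, headers=headers, json=payload).json())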
Conclusion
We saw how to fine-tune a BERT model using LoRA.
We [almost] saw how to deploy the result to Cloudflare Workers AI; that part only works with their specific list of base models.
And we swept a few training hyper-parameters to squeeze out a better model.