Fine-Tuning BERT with LoRA and Hosting It on Cloudflare Workers AI
Let’s say you have a website and you would like to add content filtering to it.
You could use OpenAI's Moderation API, but what if you want your own solution?
I did a quick POC with Cloudflare ("CF" from now on) Workers + CF AI, to test whether I could create and serve such a model.
CF offers Workers AI, which exposes a closed set of base models *and fine-tunes* of those models.
So we can take a BERT model, fine-tune it using LoRA, and serve it via the CF AI API.
I assume the set of base models is closed because of serving optimizations: a shared base model plus small LoRA adapters lets CF serve many fine-tunes with only a fraction of the resources.
Setup
There are many content moderation datasets, but for this example let's use the well-known SMS Spam/No-Spam dataset.
First, we install some basic libraries and set up the environment.
Logging in to Cloudflare, we hit two issues:
Issue #1: the CLI asks for consent regarding telemetry, so we pipe `yes` into it;
Issue #2: as part of the OAuth flow, the browser redirects to localhost, which doesn't work from a notebook.
So we log in using a Cloudflare API token rather than the browser OAuth flow.
# install some libs; use node.js 18.x (LTS), and verify
!pip install transformers torch pandas peft datasets numpy scikit-learn
!apt-get remove -y nodejs npm
!curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
!apt-get install -y nodejs
!node --version
!npm --version
!npm install -g wrangler
!which wrangler

import os
CLOUDFLARE_API_TOKEN = "...."
os.environ['CLOUDFLARE_API_TOKEN'] = CLOUDFLARE_API_TOKEN

!wrangler --version
!yes | wrangler whoami
Basic Fine-Tuning
Here is the basic fine-tuning code: it takes bert-base-uncased and fine-tunes it on the Spam/No-Spam dataset.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
    DataCollatorWithPadding,
)
from peft import get_peft_model, LoraConfig, TaskType
from datasets import Dataset
import os
class SpamDetectorTrainer:
    def __init__(self, model_name="bert-base-uncased"):
        self.model_name = model_name
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(
            model_name,
            num_labels=2
        )

    def load_data(self, url="https://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip"):
        """Load and prepare the SMS spam dataset"""
        os.system(f"wget {url} -O smsspamcollection.zip")
        os.system("unzip smsspamcollection.zip")
        df = pd.read_csv("SMSSpamCollection", sep='\t', header=None, names=['label', 'message'])
        df['label'] = df['label'].map({'ham': 0, 'spam': 1})
        train_df, eval_df = train_test_split(df, test_size=0.2, random_state=42)
        self.train_dataset = Dataset.from_pandas(train_df)
        self.eval_dataset = Dataset.from_pandas(eval_df)
        return self.train_dataset, self.eval_dataset
    def preprocess_data(self):
        """Tokenize and prepare datasets"""
        def tokenize_function(examples):
            return self.tokenizer(
                examples['message'],
                truncation=True,
                padding=True,
                max_length=128
            )

        self.tokenized_train = self.train_dataset.map(tokenize_function, batched=True)
        self.tokenized_eval = self.eval_dataset.map(tokenize_function, batched=True)
        self.tokenized_train.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
        self.tokenized_eval.set_format('torch', columns=['input_ids', 'attention_mask', 'label'])
        return self.tokenized_train, self.tokenized_eval
    def setup_lora(self, r=8, alpha=32, dropout=0.1):
        """Configure and apply LoRA"""
        lora_config = LoraConfig(
            task_type=TaskType.SEQ_CLS,
            inference_mode=False,
            r=r,
            lora_alpha=alpha,
            lora_dropout=dropout,
            target_modules=['query', 'value']
        )
        self.lora_model = get_peft_model(self.model, lora_config)
        return self.lora_model
    def train(self, output_dir="./results", epochs=3, batch_size=16, learning_rate=2e-5):
        """Train the model"""
        training_args = TrainingArguments(
            output_dir=output_dir,
            evaluation_strategy="steps",
            eval_steps=500,
            save_strategy="steps",
            save_steps=500,
            learning_rate=learning_rate,
            per_device_train_batch_size=batch_size,
            per_device_eval_batch_size=batch_size,
            num_train_epochs=epochs,
            weight_decay=0.01,
            logging_dir='./logs',
            logging_steps=100,
            load_best_model_at_end=True,
            metric_for_best_model="loss",
            save_total_limit=3,
        )
        data_collator = DataCollatorWithPadding(tokenizer=self.tokenizer)
        trainer = Trainer(
            model=self.lora_model,
            args=training_args,
            train_dataset=self.tokenized_train,
            eval_dataset=self.tokenized_eval,
            data_collator=data_collator,
        )
        # Train and evaluate
        train_result = trainer.train()
        eval_result = trainer.evaluate()
        return train_result, eval_result
    def save_model(self, path="lora_spam_adapter"):
        """Save the LoRA adapter"""
        self.lora_model.save_pretrained(path)
        self.tokenizer.save_pretrained(path)
trainer = SpamDetectorTrainer()
train_dataset, eval_dataset = trainer.load_data()
trainer.preprocess_data()
trainer.setup_lora()
train_result, eval_result = trainer.train()
trainer.save_model()
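Once training finishes, a quick smoke test helps. Here is a minimal sketch, assuming the `trainer` object from above (the example messages are made up):

# Quick sanity check of the trained adapter (assumes `trainer` from above).
trainer.lora_model.eval()
device = next(trainer.lora_model.parameters()).device

def classify(text):
    inputs = trainer.tokenizer(text, return_tensors="pt", truncation=True, max_length=128).to(device)
    with torch.no_grad():
        logits = trainer.lora_model(**inputs).logits
    return "spam" if logits.argmax(dim=-1).item() == 1 else "ham"

print(classify("WINNER!! You have been selected for a free prize, reply WIN"))  # expect: spam
print(classify("Hey, are we still on for lunch tomorrow?"))                     # expect: ham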
Hyper-Parameter Search
There are several hyper-parameters worth exploring; I'm not a LoRA fine-tuning expert, but I've read some blog posts about it.
We can run a grid search and let the machine churn a little to squeeze out some more performance.
For example the rank r, which is the rank of the low-rank adapter matrices: higher rank means more compute, but more information captured. The learning rate is worth sweeping too. A rough sketch of what the rank costs follows.
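To make the rank knob concrete, here is an illustrative parameter count using the standard bert-base sizes (a back-of-the-envelope sketch, not measured from this run):

# Illustrative LoRA size for bert-base-uncased.
# LoRA freezes each target weight W (768x768) and learns a low-rank update
# dW = B @ A, with A of shape (r, 768) and B of shape (768, r),
# i.e. 2 * 768 * r trainable params per adapted matrix.
hidden = 768   # bert-base hidden size
layers = 12    # bert-base encoder layers
targets = 2    # we adapt 'query' and 'value' in each layer
full = layers * targets * hidden * hidden
for r in (4, 8, 16):
    lora = layers * targets * 2 * hidden * r
    print(f"r={r:>2}: ~{lora:,} LoRA params ({100 * lora / full:.1f}% of the frozen Q/V weights)")
# (PEFT also keeps the classification head trainable for SEQ_CLS tasks.)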
results = []
learning_rates = [1e-5, 2e-5, 5e-5]
lora_ranks = [4, 8, 16]

for lr in learning_rates:
    for rank in lora_ranks:
        print(f"Training with learning_rate={lr}, lora_rank={rank}")

        # Reinitialize the base model and LoRA setup for a clean run
        trainer.model = AutoModelForSequenceClassification.from_pretrained(
            trainer.model_name,
            num_labels=2
        )
        trainer.setup_lora(r=rank)

        # Train the model
        train_result, eval_result = trainer.train(
            epochs=3,
            batch_size=16,
            learning_rate=lr
        )

        train_loss = train_result.training_loss
        eval_loss = eval_result['eval_loss']
        results.append({
            'learning_rate': lr,
            'lora_rank': rank,
            'train_loss': train_loss,
            'eval_loss': eval_loss
        })
        print(f"Results: train_loss={train_loss:.4f}, eval_loss={eval_loss:.4f}")
Hyper-Parameter Search Results

Push to Cloudflare AI
Here is where the plan falls apart: we cannot push our fine-tune to Cloudflare, because we picked the wrong base model. Workers AI supports LoRA fine-tunes only for a very specific list of base models, and BERT is not one of them: https://developers.cloudflare.com/workers-ai/fine-tunes/loras/
- @cf/meta-llama/llama-2-7b-chat-hf-lora
- @cf/mistral/mistral-7b-instruct-v0.2-lora
- @cf/google/gemma-2b-it-lora
- @cf/google/gemma-7b-it-lora
We could redo the training against one of these supported base models, this time using AutoTrain, and serve it with the code sample below, adding the fine-tune ID.
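For the upload itself, wrangler has an `ai finetune` subcommand that pushes a LoRA adapter folder (the `adapter_model.safetensors` + `adapter_config.json` we saved earlier). A sketch of the flow; the exact syntax may have changed and `my-spam-lora` is a hypothetical name, so check the docs linked above:

!wrangler ai finetune create @cf/mistral/mistral-7b-instruct-v0.2-lora my-spam-lora ./lora_spam_adapter
!wrangler ai finetune list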
export interface Env {
  // If you set another name in wrangler.toml as the value for 'binding',
  // replace "AI" with the variable name you defined.
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: "What is the origin of the phrase Hello, World",
      // For a LoRA fine-tune, use one of the *-lora base models listed above
      // and pass the adapter, e.g.: lora: "<your finetune id or name>"
    });
    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;
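For a quick test without deploying a worker at all, you can call the same model through the Workers AI REST API. A minimal sketch, reusing the API token from the setup step (ACCOUNT_ID is your own Cloudflare account id):

# Call Workers AI over the REST API (assumes CLOUDFLARE_API_TOKEN is set as above).
import requests

ACCOUNT_ID = "...."  # your Cloudflare account id
url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct"
headers = {"Authorization": f"Bearer {os.environ['CLOUDFLARE_API_TOKEN']}"}
payload = {"prompt": "What is the origin of the phrase Hello, World"}
print(requests.post(url, headers=headers, json=payload).json())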
Conclusion
We saw how to fine-tune a BERT model using LoRA.
We [almost] saw how to deploy the result to Cloudflare Workers AI; that part only works with their specific list of base models.
And we swept a few training hyper-parameters to squeeze out a better model.