Running Custom LoRA Text-to-Image Models On-Device with iOS CoreML
On-device image generation has taken a large step forward. Recent Core ML releases let iOS devices run Stable Diffusion models locally. More importantly, Low-Rank Adaptation (LoRA) weights can be kept apart from the base weights and switched at runtime; since iOS 18, Core ML's multifunction models let several adapters share a single copy of the base weights.
This means you can ship a single base text-to-image model in your app and apply lightweight LoRA packages (often under 20MB) to change the generation style, for example rendering prompts as sketches, pixel art, or watercolors.
In this guide, we will implement a dynamic LoRA loading pipeline in Swift.
The Core Architecture
To run a stylized generation, we need:
- A Base Stable Diffusion CoreML Model: Stored in the app bundle or downloaded once.
- LoRA Weight Packages: Downloaded dynamically from your server and stored as compiled .mlmodelc assets.
- CoreML’s Compilation & Execution Engine: Merging the weights in memory before inference.
                   +-----------------------+
                   |   Base Model (1GB)    |
                   +-----------+-----------+
                               |
                      (Merged at Runtime)
                               |
+-------------------+          v           +-------------------+
| Sketch LoRA (20M) | ----> [Engine] <---- | Pixel LoRA (15M)  |
+-------------------+                      +-------------------+
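A LoRA package can be far smaller than the base model because it stores two thin matrices per adapted layer instead of a full weight matrix. The following is a minimal plain-Swift sketch of the merge step, W' = W + scale · (B · A). The `Matrix` type, the `mergeLoRA` function, and the `scale` parameter are illustrative names, not part of Core ML, and a real pipeline would apply this per layer to much larger tensors.

```swift
/// Row-major matrix of Floats (illustrative helper, not a Core ML type).
struct Matrix {
    let rows: Int
    let cols: Int
    var values: [Float]

    init(rows: Int, cols: Int, values: [Float]) {
        precondition(values.count == rows * cols, "value count must equal rows * cols")
        self.rows = rows
        self.cols = cols
        self.values = values
    }

    subscript(r: Int, c: Int) -> Float {
        get { values[r * cols + c] }
        set { values[r * cols + c] = newValue }
    }
}

/// Returns base + scale * (b * a), where b is (d x rank) and a is (rank x k).
func mergeLoRA(base: Matrix, b: Matrix, a: Matrix, scale: Float) -> Matrix {
    precondition(b.cols == a.rows, "rank dimensions must match")
    precondition(base.rows == b.rows && base.cols == a.cols, "shapes must match the base weight")
    var merged = base
    for i in 0..<base.rows {
        for j in 0..<base.cols {
            // Accumulate one entry of the low-rank product b * a
            var delta: Float = 0
            for r in 0..<b.cols {
                delta += b[i, r] * a[r, j]
            }
            merged[i, j] += scale * delta
        }
    }
    return merged
}

/// Number of values a rank-`rank` adapter stores for a (d x k) layer.
func loraParameterCount(d: Int, k: Int, rank: Int) -> Int {
    return rank * (d + k)
}
```

For a 768 x 768 layer, a rank-8 adapter stores 8 × (768 + 768) = 12,288 values against 589,824 in the full matrix, roughly 2% of the size.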
Compiling LoRA Weights Dynamically
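Before applying a LoRA, the downloaded model file (.mlmodel or .mlpackage) must be compiled into the .mlmodelc format that the Core ML runtime loads. Core ML then decides at load time whether to run on the CPU, GPU, or Apple Neural Engine (ANE). The compilation can happen on-device: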
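import CoreML

class ModelManager {
    /// Compiles a downloaded .mlmodel or .mlpackage and stores the result in the caches directory.
    func compileAndSaveLoRA(at fileURL: URL) async throws -> URL {
        // Compile the downloaded model; Core ML writes the .mlmodelc bundle to a temporary location
        let compiledURL = try await MLModel.compileModel(at: fileURL)

        // Resolve the app's caches directory without force-unwrapping
        let fileManager = FileManager.default
        let cacheDirectory = try fileManager.url(for: .cachesDirectory, in: .userDomainMask, appropriateFor: nil, create: true)

        // Name the bundle "<name>.mlmodelc" regardless of the source extension
        let baseName = fileURL.deletingPathExtension().lastPathComponent
        let destinationURL = cacheDirectory.appendingPathComponent(baseName + ".mlmodelc")

        // Replace any stale copy, then move the compiled bundle out of the temporary location
        if fileManager.fileExists(atPath: destinationURL.path) {
            try fileManager.removeItem(at: destinationURL)
        }
        try fileManager.moveItem(at: compiledURL, to: destinationURL)
        return destinationURL
    }
}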
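Compilation can take noticeable time for larger models, so it is usually worth checking the cache before compiling again. A minimal sketch, assuming the compiled bundle is named after the source file with an .mlmodelc extension; `compiledModelURL(for:in:)` and `cachedCompiledModel(for:in:)` are hypothetical helper names, not Core ML API:

```swift
import Foundation

/// Maps a downloaded model file to the location of its compiled bundle,
/// e.g. "sketch.mlpackage" -> "<cacheDirectory>/sketch.mlmodelc".
func compiledModelURL(for sourceURL: URL, in cacheDirectory: URL) -> URL {
    let baseName = sourceURL.deletingPathExtension().lastPathComponent
    return cacheDirectory.appendingPathComponent(baseName + ".mlmodelc")
}

/// Returns the cached compiled bundle if it already exists, otherwise nil.
func cachedCompiledModel(for sourceURL: URL, in cacheDirectory: URL) -> URL? {
    let candidate = compiledModelURL(for: sourceURL, in: cacheDirectory)
    return FileManager.default.fileExists(atPath: candidate.path) ? candidate : nil
}
```

A caller would try `cachedCompiledModel` first and only fall back to compiling when it returns nil.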
Running Inference with Compiled LoRA
Once compiled, the LoRA package is loaded next to the base model. MLModelConfiguration has no public property for attaching an external weight file to an already loaded model, so the code here makes an assumption: the compiled LoRA package is itself a loadable Core ML model, and the diffusion pipeline receives both models. On iOS 18 and later, adapters that were converted together with the base model into a multifunction model can instead be selected with MLModelConfiguration.functionName.
import CoreML
import CoreGraphics

enum GeneratorError: Error {
    case baseModelNotLoaded
    case pipelineNotImplemented
}

class ImageGenerator {
    private var baseModel: MLModel?

    func loadBaseModel(modelURL: URL) async throws {
        let configuration = MLModelConfiguration()
        // Let Core ML choose among CPU, GPU, and Apple Neural Engine
        configuration.computeUnits = .all
        baseModel = try await MLModel.load(contentsOf: modelURL, configuration: configuration)
    }

    func generateImage(prompt: String, loraURL: URL) async throws -> CGImage {
        guard let baseModel = baseModel else {
            throw GeneratorError.baseModelNotLoaded
        }
        let configuration = MLModelConfiguration()
        configuration.computeUnits = .all
        // Assumption: the compiled LoRA package at loraURL is a loadable Core ML model
        let loraModel = try await MLModel.load(contentsOf: loraURL, configuration: configuration)
        // Trigger inference using the Stable Diffusion pipeline
        // (the actual pipeline loops through U-Net steps and runs VAE decoding)
        return try await runDiffusionPipeline(prompt: prompt, baseModel: baseModel, loraModel: loraModel)
    }

    private func runDiffusionPipeline(prompt: String, baseModel: MLModel, loraModel: MLModel) async throws -> CGImage {
        // Placeholder for the diffusion execution steps
        // ...
        throw GeneratorError.pipelineNotImplemented
    }
}
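On iOS 18 and later, an alternative is a multifunction model, where each adapter is exposed as a named function inside one model package and selected at load time. A minimal sketch, assuming the package was converted with one function per style; the function name "sketch" used in the usage note and the helpers `adapterConfiguration(functionName:)` and `loadAdapter(named:from:)` are hypothetical:

```swift
#if canImport(CoreML)
import CoreML

/// Builds a configuration that selects one named function of a multifunction model.
@available(iOS 18.0, macOS 15.0, tvOS 18.0, watchOS 11.0, *)
func adapterConfiguration(functionName: String) -> MLModelConfiguration {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .all
    // Selects which function (adapter) of the multifunction model to load
    configuration.functionName = functionName
    return configuration
}

/// Loads the variant of the model that corresponds to the given adapter name.
@available(iOS 18.0, macOS 15.0, tvOS 18.0, watchOS 11.0, *)
func loadAdapter(named name: String, from modelURL: URL) async throws -> MLModel {
    return try await MLModel.load(
        contentsOf: modelURL,
        configuration: adapterConfiguration(functionName: name)
    )
}
#endif
```

Calling `loadAdapter(named: "sketch", from: modelURL)` would then return the model variant for that style. The trade-off is that the set of adapters is fixed when the package is converted, so this route suits styles that ship with the app rather than ones added later from a server.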
Optimizing for Memory
Stable Diffusion models require substantial RAM. To keep your app from crashing due to memory limits (jetsam events) on standard iOS devices:
- Quantize the Base Model: Use 8-bit or 4-bit weights instead of 16-bit float formats.
- Release Models Promptly: Drop references to loaded models and configurations as soon as generation finishes so their memory can be reclaimed.
- Restrict Input Resolutions: Limit output images to 512x512 pixels on devices with under 6GB of RAM.
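The quantization and resolution points come down to simple arithmetic. A minimal sketch, assuming a U-Net of roughly 860 million parameters (a commonly quoted figure for Stable Diffusion 1.x, not stated in this guide) and treating the 6GB threshold as the device's physical memory; `weightFootprintBytes`, `maxOutputSide`, and the 768-pixel upper tier are hypothetical:

```swift
import Foundation

/// Approximate size of the weights alone, ignoring activations and runtime overhead.
func weightFootprintBytes(parameterCount: Int, bitsPerWeight: Int) -> Int {
    return parameterCount * bitsPerWeight / 8
}

/// Picks the largest output side length for a device with the given physical memory.
func maxOutputSide(physicalMemoryBytes: UInt64) -> Int {
    let sixGB: UInt64 = 6 * 1024 * 1024 * 1024
    // Devices under 6GB stay at 512x512; the 768 tier is an illustrative assumption
    return physicalMemoryBytes < sixGB ? 512 : 768
}

let unetParameters = 860_000_000  // assumed parameter count
for bits in [16, 8, 4] {
    let megabytes = weightFootprintBytes(parameterCount: unetParameters, bitsPerWeight: bits) / 1_000_000
    print("\(bits)-bit weights: ~\(megabytes) MB")
}
print("Max side on this device:", maxOutputSide(physicalMemoryBytes: ProcessInfo.processInfo.physicalMemory))
```

Under these assumptions the weights drop from about 1,720 MB at 16 bits to 860 MB at 8 bits and 430 MB at 4 bits. The value reported by `ProcessInfo.processInfo.physicalMemory` may be slightly below the marketed capacity, so a production threshold would likely need some margin.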