TL;DR — Skip the Article, Just Run This
Don’t want to read 2,000 words? Open Claude Code (or any AI coding assistant) and paste this:
```
Clone https://github.com/runaicode/lora-training-pipeline and set up the LoRA training pipeline for me. Check my GPU first.
```
That’s it. Claude will clone the repo, check your hardware, estimate training time, and walk you through the entire process step by step. If your PC can’t handle it, it’ll set you up with a cloud GPU instead.
A complete guide to building an automated LoRA training pipeline — from thousands of raw photos to a working AI model, using hardware you probably already own.
What This Guide Covers
Most AI image generators create generic faces. But what if you want to generate photorealistic images of a specific real person — in any setting, any pose, any lighting? That requires training a custom model on their photos.
The problem is that most people do this badly. They dump 20 random photos into a training script, get blurry results that look nothing like the person, and conclude it doesn’t work.
This guide shows you how to do it right — with an automated pipeline that:
- Pulls thousands of source photos from cloud storage
- Uses computer vision to find and score every photo of the target person
- Detects professional color calibration charts for accurate skin tone reproduction
- Selects the optimal 30-50 training images with the right mix of angles and expressions
- Produces a training-ready dataset with unique per-image captions
- Trains a LoRA model that nails the person’s identity
The entire pipeline runs on a home PC. No cloud subscriptions, no expensive hardware.
The Two Approaches: LoRA vs Full Fine-Tuning
Before we start, you need to understand there are two paths to custom AI photo generation. Which one you choose depends on your hardware and budget.
Path 1: LoRA (Low-Rank Adaptation) — Home PC Friendly
| Aspect | Details |
|---|---|
| What it does | Adds a small “adapter” to an existing AI model that encodes the person’s identity |
| Training images | 30–50 carefully selected photos |
| Hardware needed | 1 GPU with 8GB+ VRAM (RTX 3060, 3070, 4060, etc.) |
| Training time | 2–3 hours |
| Output size | ~200MB adapter file |
| Cost | $0 if you own a gaming PC |
| Quality | Excellent for faces and upper body |
LoRA works by modifying only a tiny fraction of the base model’s parameters (~2-4 million out of billions). This means it trains fast, needs little data, and produces a small, portable file. The catch is that it has a ceiling — throw too many images at it (100+) and the identity gets diluted rather than improved.
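To see why the adapter file stays so small, run the arithmetic. A quick sketch — the layer dimensions below are illustrative, not FLUX’s actual shapes:

```python
# A LoRA adapter replaces the update to a d_out x d_in weight matrix W
# with two thin matrices B (d_out x r) and A (r x d_in): W' = W + B @ A.
# Trainable parameters therefore scale with the rank r, not with d_out * d_in.

def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer."""
    return rank * (d_in + d_out)

def full_params(d_out: int, d_in: int) -> int:
    """Parameters a full fine-tune would update in the same layer."""
    return d_out * d_in

# Example: a hypothetical 3072x3072 attention projection at rank 32
d = 3072
print(lora_params(d, d, 32))                       # 196608 trainable params
print(full_params(d, d))                           # 9437184 params
print(lora_params(d, d, 32) / full_params(d, d))   # ~2% of the layer
```

Repeated across every adapted layer, that ~2% per layer is what keeps the final adapter in the hundreds of megabytes instead of tens of gigabytes.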
Path 2: Full Fine-Tuning — Cloud GPU Required
| Aspect | Details |
|---|---|
| What it does | Modifies ALL parameters of the base model to deeply encode the person |
| Training images | 500–5,000+ photos |
| Hardware needed | 4-8 GPUs with 40-80GB VRAM each (A100, H100) |
| Training time | 1–3 days |
| Output size | 12–24GB full model checkpoint |
| Cost | $50-150 renting cloud GPUs (RunPod, Vast.ai, Lambda) |
| Quality | Superior for full body, varied poses, complex scenes |
Full fine-tuning rewrites the entire model to know this person. It produces better results for complex scenarios but requires serious compute. You don’t need to own the hardware — rent a multi-GPU server for 24-48 hours, upload your photos, train, download the result, and delete the server.
Cloud GPU Providers:
| Provider | GPU Options | Approx. Cost | Notes |
|---|---|---|---|
| RunPod | A100 80GB, H100 | $1-6/hr | Most popular for AI training |
| Vast.ai | Mixed marketplace | $0.50-3/hr | Budget option, variable quality |
| Lambda Labs | A100, H100 | $1-2/hr | Research-focused |
| Replicate | Managed | Pay per run | Easiest setup, higher per-run cost |
This guide focuses on Path 1 (LoRA) since it’s accessible to everyone with a gaming PC. The photo curation pipeline is the same for both paths — only the training step differs.
Phase 1: Photo Acquisition
What You Need
- Source photos: The more the better. We started with 3,000+ and selected 40. Having a large pool lets you be extremely picky about quality.
- A mix of shots: Face close-ups (70%), three-quarter angles (20%), and full body (10%)
- Multiple lighting conditions: Indoor, outdoor, warm light, cool light
- Multiple expressions: Neutral, smiling, laughing, serious
- A Calibrite ColorChecker Passport Photo (optional but highly recommended): Photograph it once per lighting setup for accurate color calibration
Calibrite ColorChecker Passport Photo 2
The industry standard for color accuracy. Professional photographers use this to ensure skin tones, clothing colors, and backgrounds look exactly right — regardless of lighting conditions. For AI training, it means your model learns the person’s actual appearance, not lighting artifacts.
Affiliate link — helps support RunAICode at no extra cost to you.
Source Photo Guidelines
| What Helps Training | What Hurts Training |
|---|---|
| Sharp, well-focused photos | Motion blur, out of focus |
| Varied angles (front, 3/4, profile) | All same angle |
| Varied expressions | All same expression |
| Clean backgrounds | Cluttered, busy backgrounds |
| Good lighting | Over/underexposed |
| RAW files (for color correction) | Low-quality compressed JPEGs |
| Color checker per lighting setup | Inconsistent color across shoots |
The Download Pipeline
If your photos are on Google Drive, the pipeline automates the download. You don’t need to do this manually — the repo’s CLAUDE.md tells Claude Code exactly how to run this for you. But here’s what’s happening under the hood:
```python
# The pipeline's download script (scripts/download_gdrive.py) handles everything:
# 1. Takes your Google Drive folder URL
# 2. Extracts the folder ID automatically
# 3. Recursively scans all subfolders for images
# 4. Downloads JPG, PNG, RAW (ARW/CR2/NEF/DNG) — skips videos
# 5. Resumes interrupted downloads automatically
# The simplified excerpt below lists and downloads a single folder.
import json
import os
import urllib.parse
import urllib.request

API_KEY = "your-google-drive-api-key"    # You provide this once
FOLDER_ID = "your-folder-id-from-url"    # Extracted from your Drive link

# List all files in the shared folder. Note: pageSize caps at 1,000, so
# larger folders need nextPageToken pagination (handled in the full script).
params = {
    "q": f"'{FOLDER_ID}' in parents and trashed=false",
    "fields": "files(id,name,mimeType,size)",
    "pageSize": "1000",
    "key": API_KEY,
}
url = "https://www.googleapis.com/drive/v3/files?" + urllib.parse.urlencode(params)
with urllib.request.urlopen(url) as response:
    files = json.loads(response.read())["files"]

# Filter for images only (supports RAW formats too)
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".arw", ".cr2", ".nef", ".dng", ".raw"}
images = [f for f in files if any(f["name"].lower().endswith(ext) for ext in IMAGE_EXTS)]

# Download each image to the local downloads/ folder
os.makedirs("downloads", exist_ok=True)
for img in images:
    download_url = f"https://www.googleapis.com/drive/v3/files/{img['id']}?alt=media&key={API_KEY}"
    urllib.request.urlretrieve(download_url, f"downloads/{img['name']}")
```
Storage tip: Use a NAS or external drive. RAW photos from a professional camera are ~60MB each. 3,000 of them = ~180GB. Don’t try to process these on your boot drive.
Phase 2: Intelligent Photo Curation
This is where the magic happens. Instead of manually scrolling through thousands of photos, we use computer vision to automate the entire selection process. Again — Claude Code handles all of this for you. The code below explains what’s happening behind the scenes.
Phase 2, Step 1: Install Dependencies
```bash
# Claude Code runs this automatically, but here's what gets installed:
# - OpenCV for face detection and image analysis
# - NumPy for numerical operations
# - rawpy for processing RAW camera files (Sony ARW, Canon CR2, etc.)
pip install opencv-python-headless numpy rawpy

# Download the AI models used for face detection and recognition
mkdir -p models && cd models

# DNN face detector — more accurate than the older Haar cascade method
curl -LO "https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt"
curl -LO "https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20170830/res10_300x300_ssd_iter_140000.caffemodel"

# SFace recognition model — identifies WHO is in each photo
curl -LO "https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx"
```
Phase 2, Step 2: Build a Reference Identity
You need 10-20 photos where you’re certain the target person is the main subject. These become the “identity anchor” that the pipeline matches against.
```python
import glob

import cv2
import numpy as np

# Load the face detection and recognition AI models
face_net = cv2.dnn.readNetFromCaffe(
    "models/deploy.prototxt",
    "models/res10_300x300_ssd_iter_140000.caffemodel"
)
face_recognizer = cv2.FaceRecognizerSF.create(
    "models/face_recognition_sface_2021dec.onnx", ""
)

# 10-20 photos you've verified show the target person as the main subject
# (adjust the path to wherever your verified photos live)
known_photos = sorted(glob.glob("reference/*.jpg"))

def detect_faces(img, confidence_threshold=0.6):
    """Find all faces in an image using the DNN detector.
    Returns a list of (x, y, width, height, confidence) for each face found."""
    h, w = img.shape[:2]
    blob = cv2.dnn.blobFromImage(img, 1.0, (300, 300), (104, 177, 123))
    face_net.setInput(blob)
    detections = face_net.forward()
    faces = []
    for i in range(detections.shape[2]):
        conf = detections[0, 0, i, 2]
        if conf < confidence_threshold:
            continue
        x1 = int(detections[0, 0, i, 3] * w)
        y1 = int(detections[0, 0, i, 4] * h)
        x2 = int(detections[0, 0, i, 5] * w)
        y2 = int(detections[0, 0, i, 6] * h)
        faces.append((x1, y1, x2 - x1, y2 - y1, float(conf)))
    return faces

def get_embedding(img, face_bbox):
    """Extract a 128-dimension face fingerprint.
    This is like a unique ID for the person's face."""
    roi = np.array(face_bbox[:4], dtype=np.int32)
    aligned = face_recognizer.alignCrop(img, roi)
    return face_recognizer.feature(aligned)

# Build reference embeddings from your known-good photos
reference_embeddings = []
for photo_path in known_photos:
    img = cv2.imread(photo_path)
    if img is None:
        continue  # skip unreadable files
    faces = detect_faces(img)
    if faces:
        # Use the largest face in the photo (most likely the main subject)
        largest = max(faces, key=lambda f: f[2] * f[3])
        emb = get_embedding(img, largest)
        reference_embeddings.append(emb)
```
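Once the reference embeddings exist, every candidate photo is matched against them. Here is a plain-NumPy sketch of what that matching step looks like — the function names are illustrative, and the 0.36 cutoff mirrors the cosine-similarity threshold OpenCV’s SFace sample suggests (~0.363); tune it on your own data:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embeddings: 1.0 = identical direction."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_target_person(candidate_emb, reference_embs, threshold=0.36):
    """True if the candidate's best similarity to any reference exceeds the cutoff.
    Returns (matched, best_similarity) so the score can feed later ranking."""
    best = max(cosine_similarity(candidate_emb, ref) for ref in reference_embs)
    return best >= threshold, best

# Toy demo with random vectors: an embedding always matches itself
rng = np.random.default_rng(0)
refs = [rng.normal(size=128) for _ in range(3)]
matched, score = is_target_person(refs[0], refs)
print(matched, round(score, 2))  # True 1.0
```

Comparing against the *best* reference match (rather than the average) keeps the matcher tolerant of reference photos taken at different angles.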
Phase 2, Step 3: Color Checker Detection
If you shoot with a Calibrite ColorChecker Passport Photo, the pipeline can detect it and compute a color correction matrix for each lighting setup. The ColorChecker has 24 standard color patches with known sRGB values. By comparing what the camera captured versus the known values, we compute a 3×3 matrix that corrects color across the entire shoot.
```python
# Known sRGB reference values for the 24 classic ColorChecker patches
COLORCHECKER_SRGB = np.array([
    [115, 82, 68],    # Dark Skin
    [194, 150, 130],  # Light Skin
    [98, 122, 157],   # Blue Sky
    [87, 108, 67],    # Foliage
    # ... (all 24 patches defined in the full script)
    [243, 243, 242],  # White
    [52, 52, 52],     # Black
], dtype=np.float64)

def compute_color_correction(measured_patches, reference_patches):
    """Compute a 3x3 Color Correction Matrix using least squares.
    This transforms camera colors → accurate sRGB colors."""
    ccm, _, _, _ = np.linalg.lstsq(measured_patches, reference_patches, rcond=None)
    return ccm

def apply_correction(image_rgb, ccm):
    """Apply the correction to every pixel in an image.
    Result: accurate skin tones regardless of lighting conditions."""
    pixels = image_rgb.reshape(-1, 3).astype(np.float64)
    corrected = np.clip(pixels @ ccm, 0, 255).astype(np.uint8)
    return corrected.reshape(image_rgb.shape)
```
Why this matters for AI training: Without color correction, the model learns the lighting artifacts (orange skin from tungsten, blue tint from shade) instead of the person’s actual appearance. With correction, skin tones are accurate and consistent across all training images.
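You can sanity-check the CCM math without a camera: simulate a color cast on known patch values, recover the matrix with the same least-squares call, and confirm the correction undoes the cast. A self-contained demo (the six patches are a subset of the ColorChecker values above):

```python
import numpy as np

# Known sRGB values for six ColorChecker patches
reference = np.array([
    [115, 82, 68], [194, 150, 130], [98, 122, 157],
    [87, 108, 67], [243, 243, 242], [52, 52, 52],
], dtype=np.float64)

# Simulate a warm tungsten cast: boost red, cut blue
cast = np.diag([1.15, 1.0, 0.8])
measured = reference @ cast

# Recover the 3x3 correction matrix — same lstsq call as the pipeline
ccm, *_ = np.linalg.lstsq(measured, reference, rcond=None)

corrected = measured @ ccm
print(np.allclose(corrected, reference))  # True — the cast is undone
```

A pure diagonal cast like this could be fixed by per-channel gains alone; the full 3×3 matrix also handles channel cross-talk, which real lighting shifts introduce.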
Phase 2, Step 4: Score and Rank Every Photo
For each photo that matches the target person, we compute a quality score based on sharpness, lighting, face size, and angle:
```python
def score_image(img, face_bbox, match_score):
    """Score a photo on 4 quality factors. Higher = better for training.
    match_score is the face-recognition distance to the reference identity
    (lower = closer match), so it enters the composite as (1 - match_score)."""
    h, w = img.shape[:2]
    x, y, fw, fh = face_bbox[:4]

    # 1. SHARPNESS — Laplacian variance (higher = sharper image)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()

    # 2. LIGHTING — histogram spread (wider = better exposed)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).flatten()
    hist = hist / hist.sum()
    mean = np.average(np.arange(256), weights=hist)
    std = np.sqrt(np.average((np.arange(256) - mean) ** 2, weights=hist))
    lighting = min(1.0, std / 80.0)

    # 3. FACE SIZE — larger face in frame = more detail for training
    face_ratio = (fw * fh) / (w * h)

    # 4. FACE ANGLE — estimated from the face bounding box aspect ratio
    aspect = fw / max(fh, 1)
    angle = "front" if aspect > 0.85 else "three_quarter" if aspect > 0.65 else "profile"

    # Weighted composite score
    composite = (
        min(sharpness / 500, 1.0) * 0.3    # 30% sharpness
        + lighting * 0.2                   # 20% lighting quality
        + min(face_ratio * 20, 1.0) * 0.3  # 30% face size
        + (1.0 - match_score) * 0.2        # 20% identity match
    )
    return composite, angle
```
Phase 2, Step 5: Select with Diversity
Don’t just take the top 40 by score — you need angle diversity. A model trained entirely on front-facing photos can only generate front-facing images.
```python
# Target distribution for a balanced training set
TARGET = {"front": 15, "three_quarter": 10, "profile": 5}  # = 30 face shots
BODY_SHOTS = 8  # Full body shots where the face is small in frame
# Total: 38 training images

selected = []
angle_counts = {"front": 0, "three_quarter": 0, "profile": 0}

# Pick face shots — highest scoring first, but respect the angle quotas
for candidate in sorted(all_candidates, key=lambda x: x["score"], reverse=True):
    angle = candidate["angle"]
    if angle_counts[angle] < TARGET[angle]:
        selected.append(candidate)
        angle_counts[angle] += 1
    if sum(angle_counts.values()) >= 30:
        break

# Add body shots (face is small relative to image — teaches body proportions)
body_candidates = [c for c in all_candidates
                   if c["face_ratio"] < 0.05 and c not in selected]
selected.extend(sorted(body_candidates, key=lambda x: x["score"], reverse=True)[:BODY_SHOTS])
```
Phase 2, Step 6: Crop, Convert, and Caption
Each selected image gets cropped, resized, color corrected, and captioned:
- Cropped to head+shoulders (face shots) or full-body framing (body shots)
- Resized to 1024×1024 pixels (FLUX native resolution)
- Color corrected using the shoot’s ColorChecker data (if available)
- Saved as lossless PNG
- Captioned with a unique text file describing the image
```
training-set/
├── subject_001_cc.png
├── subject_001_cc.txt   → "subject_name, front-facing portrait, well-lit"
├── subject_002.png
├── subject_002.txt      → "subject_name, three-quarter view, natural lighting"
├── subject_038.png
├── subject_038.txt      → "subject_name, full body, casual outfit, outdoor"
└── ...
```
The captions are critical. This is the #1 reason most LoRA training fails. If every image has the same caption (or no caption), the model can’t learn what makes each photo unique. Unique per-image captions tell the model: “this trigger word + front-facing = this look, this trigger word + profile = this look.”
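A minimal sketch of what the captioning step produces — the helper, trigger word, and caption wording here are illustrative, not the repo’s exact script:

```python
from pathlib import Path

TRIGGER = "subject_face"  # hypothetical trigger word — pick something unique

def write_caption(image_path: Path, angle: str, lighting: str, framing: str) -> Path:
    """Write a sibling .txt caption combining the trigger word with the
    attributes recorded during curation, so every caption is unique."""
    caption = f"{TRIGGER}, {framing}, {angle} view, {lighting}"
    txt_path = image_path.with_suffix(".txt")
    txt_path.write_text(caption)
    return txt_path

# Example usage
out = Path("training-set")
out.mkdir(exist_ok=True)
write_caption(out / "subject_001_cc.png", "front-facing", "studio lighting", "portrait")
print((out / "subject_001_cc.txt").read_text())
# subject_face, portrait, front-facing view, studio lighting
```

Because the angle and lighting labels already exist from the scoring step, generating distinct captions costs nothing extra.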
Phase 3: Training
LoRA Training on a Home GPU
Hardware requirements:
- NVIDIA GPU with 8GB+ VRAM (RTX 3060/3070/4060 or better)
- CUDA 11.8+ installed
- 16GB+ system RAM
- ~10GB free disk space
Software setup (Claude Code installs this for you):
```bash
# Create an isolated Python environment
python -m venv lora-env
source lora-env/bin/activate

# Install PyTorch with CUDA support (for NVIDIA GPUs)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install the AI training libraries
pip install diffusers transformers accelerate peft
pip install bitsandbytes  # Cuts memory usage in half
```
Key training parameters for faces:
| Parameter | Recommended Value | Notes |
|---|---|---|
| Base model | FLUX.1-dev | Best current open model for photorealism |
| LoRA rank | 32–48 | Higher = more capacity for facial detail. 16 is too low for faces. |
| Learning rate | 4e-5 | Lower is safer. 1e-4 causes model collapse. |
| Training steps | 800–1500 | With 40 images: ~20-35 epochs |
| Batch size | 1 | Limited by 8GB VRAM |
| Resolution | 1024×1024 | FLUX native |
| Train text encoder | Yes | Critical for caption-triggered generation |
| Optimizer | AdamW 8-bit | Memory efficient |
| Checkpoints | Every 200 steps | So you can pick the best stopping point |
| Trigger word | Unique name (e.g., “subject_face”) | Activates the identity in prompts |
Training time: ~2-3 hours on an RTX 3070 Ti (8GB)
Output: A single .safetensors file (~150-300MB) that plugs into any FLUX-compatible image generator.
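The table above maps naturally onto a plain config dict, and the steps-to-epochs arithmetic is worth checking yourself. A sketch — the keys are illustrative, not the exact schema of any particular trainer:

```python
# Recommended face-LoRA settings from the table, as a config dict
config = {
    "base_model": "FLUX.1-dev",
    "lora_rank": 32,             # 32-48 for faces; 16 is too low
    "learning_rate": 4e-5,       # 1e-4 risks model collapse
    "max_steps": 1200,           # middle of the 800-1500 range
    "batch_size": 1,             # limited by 8GB VRAM
    "resolution": 1024,          # FLUX native
    "train_text_encoder": True,  # needed for caption-triggered generation
    "optimizer": "adamw_8bit",
    "checkpoint_every": 200,
    "trigger_word": "subject_face",
}

num_images = 40
# With batch size 1, one epoch = one pass over every image
epochs = config["max_steps"] * config["batch_size"] // num_images
print(epochs)  # 30 — inside the ~20-35 epoch range the table quotes
```

Checkpointing every 200 steps gives you six candidate checkpoints from a 1200-step run, so you can pick the one that nails identity without overfitting.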
Full Fine-Tuning on Cloud GPUs
If you want to go bigger with 500+ images:
```bash
# On your rented cloud server (RunPod, Vast.ai, etc.):

# Upload your photo dataset
rsync -avz training-set/ user@cloud-server:/workspace/training-set/

# Training takes 12-48 hours depending on dataset size and GPU count
# Output: 12-24GB model checkpoint

# Download the finished model
rsync -avz user@cloud-server:/workspace/output/ ./model-output/
```
Typical cloud training session:
- Rent a 4x A100 server ($6/hr)
- Upload 100GB of photos (~1 hour)
- Train for 24 hours (~$144)
- Download 20GB model (~30 min)
- Delete server
- Total: ~$150-200 for a one-time training
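The cost arithmetic is simple enough to sanity-check — the rates are the estimates above, so verify current provider pricing before committing:

```python
def session_cost(gpu_rate_per_hr: float, train_hours: float, overhead_hours: float) -> float:
    """Total rental cost: the server bills for training time AND the
    upload/download overhead while it sits allocated."""
    return gpu_rate_per_hr * (train_hours + overhead_hours)

# 4x A100 at $6/hr total, 24h of training, ~1.5h for upload + download
print(session_cost(6.0, 24, 1.5))  # 153.0 — squarely in the $150-200 range
```

Note that transfer time bills at the same hourly rate as training, which is why compressing your dataset before upload pays for itself.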
The Training Image Cheat Sheet
| Parameter | Recommendation | Why |
|---|---|---|
| Total images | 30–50 for LoRA, 500+ for full fine-tune | LoRA sweet spot; more = dilution |
| Face close-ups | 25–35 (LoRA) | Primary identity anchor |
| Full body | 5–8 (LoRA) | Body proportions and style |
| Front-facing | 40% of face shots | Most important angle |
| Three-quarter | 30% of face shots | Adds dimensional understanding |
| Profile | 15% of face shots | Jawline, nose, ears |
| Expressions | Mix across set | Prevents expression lock-in |
| Lighting | Mix across set | Prevents lighting bias |
| Resolution | 1024×1024 (FLUX) | Model’s native resolution |
| Format | PNG (lossless) | No compression artifacts |
| Captions | Unique per image | #1 quality factor |
| Color checker | 1 per lighting setup | Accurate skin tones |
| Source pool | As large as possible | Bigger pool = pickier selection |
What Kills Training Quality
- Identical captions — Model can’t distinguish photos, collapses to average
- All same angle — Model only generates that one pose
- Blurry photos — Garbage in, garbage out
- Too many images for LoRA — 100+ causes identity averaging
- Low LoRA rank — Use 32-48 for faces, never 16
- Skipping text encoder — Captions won’t trigger properly
- Learning rate too high — Model collapses, generates noise
What You Get at the End
A single model file (200MB for LoRA, 12-24GB for full fine-tune) that plugs into any compatible image generation workflow. Open a prompt box and type:
- “subject_face, professional headshot, studio lighting, white background”
- “subject_face, casual outdoor portrait, golden hour, city street”
- “subject_face, full body, elegant outfit, event photography”
Each generation takes 10-30 seconds on a consumer GPU. Unlimited images. Runs entirely locally — no cloud, no subscriptions, no per-image fees.
The total cost: $0 if you own a gaming PC (LoRA path), or $150-200 for a one-time cloud GPU rental (full fine-tune path). Either way, it’s a fraction of what a single professional photo shoot costs, and you can generate images forever.
Tools used: Python, OpenCV, SFace Recognition, Google Drive API, rawpy, diffusers
Compatible base models: FLUX.1-dev, Stable Diffusion XL, SD 1.5
Recommended hardware: Any NVIDIA GPU with 8GB+ VRAM
