Disclosure: RunAICode.ai may earn a commission when you purchase through links on this page. This doesn’t affect our reviews or rankings. We only recommend tools we’ve tested and believe in. Learn more.

TL;DR — Skip the Article, Just Run This

Don’t want to read 2,000 words? Open Claude Code (or any AI coding assistant) and paste this:

Clone https://github.com/runaicode/lora-training-pipeline and set up the LoRA training pipeline for me. Check my GPU first.

That’s it. Claude will clone the repo, check your hardware, estimate training time, and walk you through the entire process step by step. If your PC can’t handle it, it’ll set you up with a cloud GPU instead.

A complete guide to building an automated LoRA training pipeline — from thousands of raw photos to a working AI model, using hardware you probably already own.

What This Guide Covers

Most AI image generators create generic faces. But what if you want to generate photorealistic images of a specific real person — in any setting, any pose, any lighting? That requires training a custom model on their photos.

The problem is that most people do this badly. They dump 20 random photos into a training script, get blurry results that look nothing like the person, and conclude it doesn’t work.

This guide shows you how to do it right — with an automated pipeline that:

- Downloads your source photos in bulk (including RAW camera files)
- Identifies the target person in every shot and scores each photo for quality
- Selects a balanced, angle-diverse training set
- Crops, color corrects, and captions each image automatically
- Trains a LoRA on a consumer GPU

The entire pipeline runs on a home PC. No cloud subscriptions, no expensive hardware.


The Two Approaches: LoRA vs Full Fine-Tuning

Before we start, you need to understand there are two paths to custom AI photo generation. Which one you choose depends on your hardware and budget.

Path 1: LoRA (Low-Rank Adaptation) — Home PC Friendly

| Aspect | Details |
|---|---|
| What it does | Adds a small “adapter” to an existing AI model that encodes the person’s identity |
| Training images | 30–50 carefully selected photos |
| Hardware needed | 1 GPU with 8GB+ VRAM (RTX 3060, 3070, 4060, etc.) |
| Training time | 2–3 hours |
| Output size | ~200MB adapter file |
| Cost | $0 if you own a gaming PC |
| Quality | Excellent for faces and upper body |

LoRA works by only modifying a tiny fraction of the base model’s parameters (~2-4 million out of billions). This means it trains fast, needs little data, and produces a small portable file. The catch is that it has a ceiling — throw too many images at it (100+) and the identity gets diluted rather than improved.
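The “tiny fraction of parameters” claim is easy to check with back-of-envelope math. A sketch (the 4096×4096 layer size below is illustrative, not FLUX’s actual architecture):

```python
# LoRA freezes the base weight W (d x k) and trains a low-rank update
# ΔW = B @ A, with B (d x r) and A (r x k). Trainable parameters per
# layer drop from d*k to r*(d + k).

def lora_params(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable parameter counts for one layer."""
    return d * k, r * (d + k)

# Illustrative attention projection: 4096x4096 at rank 32
full, lora = lora_params(4096, 4096, 32)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
# full: 16,777,216  lora: 262,144  ratio: 1.56%
```

At rank 32, adapting a few dozen layers of this size lands in the low millions of trainable parameters, which is where the ~2-4 million figure comes from.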

Path 2: Full Fine-Tuning — Cloud GPU Required

| Aspect | Details |
|---|---|
| What it does | Modifies ALL parameters of the base model to deeply encode the person |
| Training images | 500–5,000+ photos |
| Hardware needed | 4-8 GPUs with 40-80GB VRAM each (A100, H100) |
| Training time | 1–3 days |
| Output size | 12–24GB full model checkpoint |
| Cost | $50-150 renting cloud GPUs (RunPod, Vast.ai, Lambda) |
| Quality | Superior for full body, varied poses, complex scenes |

Full fine-tuning rewrites the entire model to know this person. It produces better results for complex scenarios but requires serious compute. You don’t need to own the hardware — rent a multi-GPU server for 24-48 hours, upload your photos, train, download the result, and delete the server.

Cloud GPU Providers:

| Provider | GPU Options | Approx. Cost | Notes |
|---|---|---|---|
| RunPod | A100 80GB, H100 | $1-6/hr | Most popular for AI training |
| Vast.ai | Mixed marketplace | $0.50-3/hr | Budget option, variable quality |
| Lambda Labs | A100, H100 | $1-2/hr | Research-focused |
| Replicate | Managed | Pay per run | Easiest setup, higher per-run cost |

This guide focuses on Path 1 (LoRA) since it’s accessible to everyone with a gaming PC. The photo curation pipeline is the same for both paths — only the training step differs.


Phase 1: Photo Acquisition

What You Need


Calibrite ColorChecker Passport Photo 2


The industry standard for color accuracy. Professional photographers use this to ensure skin tones, clothing colors, and backgrounds look exactly right — regardless of lighting conditions. For AI training, it means your model learns the person’s actual appearance, not lighting artifacts.

Check Price on Amazon →

Affiliate link — helps support RunAICode at no extra cost to you.

Source Photo Guidelines

| What Helps Training | What Hurts Training |
|---|---|
| Sharp, well-focused photos | Motion blur, out of focus |
| Varied angles (front, 3/4, profile) | All same angle |
| Varied expressions | All same expression |
| Clean backgrounds | Cluttered, busy backgrounds |
| Good lighting | Over/underexposed |
| RAW files (for color correction) | Low-quality compressed JPEGs |
| Color checker per lighting setup | Inconsistent color across shoots |

The Download Pipeline

If your photos are on Google Drive, the pipeline automates the download. You don’t need to do this manually — the repo’s CLAUDE.md tells Claude Code exactly how to run this for you. But here’s what’s happening under the hood:

# The pipeline's download script (scripts/download_gdrive.py) handles everything:
# 1. Takes your Google Drive folder URL
# 2. Extracts the folder ID automatically
# 3. Recursively scans all subfolders for images
# 4. Downloads JPG, PNG, RAW (ARW/CR2/NEF/DNG) — skips videos
# 5. Resumes interrupted downloads automatically

import os, urllib.request, urllib.parse, json

API_KEY = "your-google-drive-api-key"    # You provide this once
FOLDER_ID = "your-folder-id-from-url"    # Extracted from your Drive link

# List all files in the shared folder
# (Drive returns at most 1,000 files per page; larger folders need
# nextPageToken paging)
params = {
    "q": f"'{FOLDER_ID}' in parents and trashed=false",
    "fields": "files(id,name,mimeType,size)",
    "pageSize": "1000",
    "key": API_KEY,
}
url = "https://www.googleapis.com/drive/v3/files?" + urllib.parse.urlencode(params)
response = urllib.request.urlopen(url)
files = json.loads(response.read())["files"]

# Filter for images only (supports RAW formats too)
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".arw", ".cr2", ".nef", ".dng", ".raw"}
images = [f for f in files if any(f["name"].lower().endswith(ext) for ext in IMAGE_EXTS)]

# Download each image to the local downloads/ folder
os.makedirs("downloads", exist_ok=True)
for img in images:
    download_url = f"https://www.googleapis.com/drive/v3/files/{img['id']}?alt=media&key={API_KEY}"
    urllib.request.urlretrieve(download_url, f"downloads/{img['name']}")

Storage tip: Use a NAS or external drive. RAW photos from a professional camera are ~60MB each. 3,000 of them = ~180GB. Don’t try to process these on your boot drive.
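That estimate is just multiplication, but it's worth sanity-checking before you kick off a download:

```python
# Back-of-envelope storage estimate from the tip above
raw_mb_per_photo = 60    # typical RAW file from a professional camera
photo_count = 3000
total_gb = raw_mb_per_photo * photo_count / 1000
print(f"~{total_gb:.0f} GB")  # ~180 GB
```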


Phase 2: Intelligent Photo Curation

This is where the magic happens. Instead of manually scrolling through thousands of photos, we use computer vision to automate the entire selection process. Again — Claude Code handles all of this for you. The code below explains what’s happening behind the scenes.

Phase 2, Step 1: Install Dependencies

# Claude Code runs this automatically, but here's what gets installed:
# - OpenCV for face detection and image analysis
# - NumPy for numerical operations
# - rawpy for processing RAW camera files (Sony ARW, Canon CR2, etc.)

pip install opencv-python-headless numpy rawpy

# Download the AI models used for face detection and recognition
mkdir -p models && cd models

# DNN face detector — more accurate than older Haar cascade method
curl -LO "https://raw.githubusercontent.com/opencv/opencv/master/samples/dnn/face_detector/deploy.prototxt"
curl -LO "https://raw.githubusercontent.com/opencv/opencv_3rdparty/dnn_samples_face_detector_20170830/res10_300x300_ssd_iter_140000.caffemodel"

# SFace recognition model — identifies WHO is in each photo
curl -LO "https://github.com/opencv/opencv_zoo/raw/main/models/face_recognition_sface/face_recognition_sface_2021dec.onnx"

Phase 2, Step 2: Build a Reference Identity

You need 10-20 photos where you’re certain the target person is the main subject. These become the “identity anchor” that the pipeline matches against.

import cv2
import numpy as np

# Load the face detection and recognition AI models
face_net = cv2.dnn.readNetFromCaffe(
    "models/deploy.prototxt",
    "models/res10_300x300_ssd_iter_140000.caffemodel"
)
face_recognizer = cv2.FaceRecognizerSF.create(
    "models/face_recognition_sface_2021dec.onnx", ""
)

def detect_faces(img, confidence_threshold=0.6):
    """Find all faces in an image using DNN detector.
    Returns list of (x, y, width, height, confidence) for each face found."""
    h, w = img.shape[:2]
    blob = cv2.dnn.blobFromImage(img, 1.0, (300, 300), (104, 177, 123))
    face_net.setInput(blob)
    detections = face_net.forward()

    faces = []
    for i in range(detections.shape[2]):
        conf = detections[0, 0, i, 2]
        if conf < confidence_threshold:
            continue
        x1 = int(detections[0, 0, i, 3] * w)
        y1 = int(detections[0, 0, i, 4] * h)
        x2 = int(detections[0, 0, i, 5] * w)
        y2 = int(detections[0, 0, i, 6] * h)
        faces.append((x1, y1, x2 - x1, y2 - y1, float(conf)))
    return faces

def get_embedding(img, face_bbox):
    """Extract a 128-dimension face fingerprint (SFace feature vector).
    This is like a unique ID for the person's face.
    Note: FaceRecognizerSF.alignCrop expects FaceDetectorYN output (box
    plus 5 landmarks), so with a plain bounding box we crop and resize
    manually to SFace's 112x112 input instead."""
    x, y, fw, fh = [int(v) for v in face_bbox[:4]]
    crop = img[max(y, 0):y + fh, max(x, 0):x + fw]
    aligned = cv2.resize(crop, (112, 112))
    return face_recognizer.feature(aligned)

# Build reference embeddings from your known-good photos
known_photos = ["reference/anchor_01.jpg", "reference/anchor_02.jpg"]  # your 10-20 known-good paths
reference_embeddings = []
for photo_path in known_photos:
    img = cv2.imread(photo_path)
    faces = detect_faces(img)
    if faces:
        # Use the largest face in the photo (most likely the main subject)
        largest = max(faces, key=lambda f: f[2] * f[3])
        emb = get_embedding(img, largest)
        reference_embeddings.append(emb)
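The matching step itself isn't shown above. Here's a minimal sketch using cosine similarity on the SFace embeddings (OpenCV's FaceRecognizerSF also exposes a built-in match() method; the 0.36 cutoff is the commonly cited SFace cosine threshold and should be treated as a starting point, not gospel):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def is_target_person(candidate_emb, reference_embeddings, threshold=0.36):
    """A candidate matches if its best similarity to ANY reference clears the cutoff."""
    best = max(cosine_similarity(candidate_emb, ref) for ref in reference_embeddings)
    return best >= threshold, best

# Identical embeddings score ~1.0 and match
v = np.array([0.1, 0.5, -0.2])
print(is_target_person(v, [v])[0])  # True
```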

Phase 2, Step 3: Color Checker Detection

If you shoot with a Calibrite ColorChecker Passport Photo, the pipeline can detect it and compute a color correction matrix for each lighting setup. The ColorChecker has 24 standard color patches with known sRGB values. By comparing what the camera captured versus the known values, we compute a 3×3 matrix that corrects color across the entire shoot.

# Known sRGB reference values for the 24 classic ColorChecker patches
COLORCHECKER_SRGB = np.array([
    [115, 82, 68],     # Dark Skin
    [194, 150, 130],   # Light Skin
    [98, 122, 157],    # Blue Sky
    [87, 108, 67],     # Foliage
    # ... (all 24 patches defined in the full script)
    [243, 243, 242],   # White
    [52, 52, 52],      # Black
], dtype=np.float64)

def compute_color_correction(measured_patches, reference_patches):
    """Compute a 3x3 Color Correction Matrix using least squares.
    This transforms camera colors → accurate sRGB colors."""
    ccm, _, _, _ = np.linalg.lstsq(measured_patches, reference_patches, rcond=None)
    return ccm

def apply_correction(image_rgb, ccm):
    """Apply the correction to every pixel in an image.
    Result: accurate skin tones regardless of lighting conditions."""
    pixels = image_rgb.reshape(-1, 3).astype(np.float64)
    corrected = np.clip(pixels @ ccm, 0, 255).astype(np.uint8)
    return corrected.reshape(image_rgb.shape)

Why this matters for AI training: Without color correction, the model learns the lighting artifacts (orange skin from tungsten, blue tint from shade) instead of the person’s actual appearance. With correction, skin tones are accurate and consistent across all training images.
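A quick synthetic sanity check of the least-squares recovery (the `cast` matrix below is a made-up lighting distortion, and only 6 of the 24 patches are used):

```python
import numpy as np

# True sRGB values for a handful of ColorChecker patches
true_patches = np.array([
    [115, 82, 68], [194, 150, 130], [98, 122, 157],
    [87, 108, 67], [243, 243, 242], [52, 52, 52],
], dtype=np.float64)

# Pretend the camera applied a known 3x3 color cast (invented for this demo)
cast = np.array([
    [1.10, 0.05, 0.00],
    [0.00, 0.90, 0.10],
    [0.02, 0.00, 1.05],
])
measured = true_patches @ cast

# Least squares solves measured @ ccm ≈ true, recovering the inverse cast
ccm, *_ = np.linalg.lstsq(measured, true_patches, rcond=None)
recovered = measured @ ccm
print(np.allclose(recovered, true_patches))  # True
```

With real photos the fit is only approximate (sensor noise, non-linear response), but the recovered matrix still pulls skin tones back toward their true values.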

Phase 2, Step 4: Score and Rank Every Photo

For each photo that matches the target person, we compute a quality score based on sharpness, lighting, face size, and angle:

def score_image(img, face_bbox, match_score):
    """Score a photo on 4 quality factors. Higher = better for training."""
    h, w = img.shape[:2]
    x, y, fw, fh = face_bbox[:4]

    # 1. SHARPNESS — Laplacian variance (higher = sharper image)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()

    # 2. LIGHTING — histogram spread (wider = better exposed)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).flatten()
    hist = hist / hist.sum()
    mean = np.average(range(256), weights=hist)
    std = np.sqrt(np.average((np.arange(256) - mean)**2, weights=hist))
    lighting = min(1.0, std / 80.0)

    # 3. FACE SIZE — larger face in frame = more detail for training
    face_ratio = (fw * fh) / (w * h)

    # 4. FACE ANGLE — estimated from the face bounding box aspect ratio
    aspect = fw / max(fh, 1)
    angle = "front" if aspect > 0.85 else "three_quarter" if aspect > 0.65 else "profile"

    # Weighted composite score
    composite = (
        min(sharpness / 500, 1.0) * 0.3 +   # 30% sharpness
        max(0, lighting) * 0.2 +              # 20% lighting quality
        min(face_ratio * 20, 1.0) * 0.3 +    # 30% face size
        (1.0 - match_score) * 0.2             # 20% identity match
    )

    return composite, angle

Phase 2, Step 5: Select with Diversity

Don’t just take the top 40 by score — you need angle diversity. A model trained entirely on front-facing photos can only generate front-facing images.

# Target distribution for a balanced training set
TARGET = {"front": 15, "three_quarter": 10, "profile": 5}  # = 30 face shots
BODY_SHOTS = 8  # Full body shots where face is small in frame
# Total: 38 training images

selected = []
angle_counts = {"front": 0, "three_quarter": 0, "profile": 0}

# Pick face shots — highest scoring first, but respect the angle quotas
for candidate in sorted(all_candidates, key=lambda x: x["score"], reverse=True):
    angle = candidate["angle"]
    if angle_counts[angle] < TARGET[angle]:
        selected.append(candidate)
        angle_counts[angle] += 1
    if sum(angle_counts.values()) >= 30:
        break

# Add body shots (face is small relative to image — teaches body proportions)
body_candidates = [c for c in all_candidates if c["face_ratio"] < 0.05]
selected.extend(sorted(body_candidates, key=lambda x: x["score"], reverse=True)[:BODY_SHOTS])
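The quota logic above is easy to sanity check on synthetic candidates (the scores and quotas below are made up, scaled down from the real targets):

```python
# Made-up candidates: more of each angle than the quotas allow
TARGET_DEMO = {"front": 2, "three_quarter": 1, "profile": 1}
candidates = [
    {"name": f"{angle}_{i}", "angle": angle, "score": s}
    for angle, scores in {
        "front": [0.9, 0.8, 0.7],
        "three_quarter": [0.85, 0.6],
        "profile": [0.75, 0.5],
    }.items()
    for i, s in enumerate(scores)
]

# Same selection rule: best score first, but never exceed an angle's quota
selected, counts = [], {a: 0 for a in TARGET_DEMO}
for c in sorted(candidates, key=lambda x: x["score"], reverse=True):
    if counts[c["angle"]] < TARGET_DEMO[c["angle"]]:
        selected.append(c["name"])
        counts[c["angle"]] += 1

print(selected)  # ['front_0', 'three_quarter_0', 'front_1', 'profile_0']
```

Note how `front_2` (score 0.7) loses to `profile_0` (0.75) because the front quota is already full — that's the diversity guarantee in action.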

Phase 2, Step 6: Crop, Convert, and Caption

Each selected image gets cropped, resized, color corrected, and captioned:

  1. Cropped to head+shoulders (face shots) or full-body framing (body shots)
  2. Resized to 1024×1024 pixels (FLUX native resolution)
  3. Color corrected using the shoot’s ColorChecker data (if available)
  4. Saved as lossless PNG
  5. Captioned with a unique text file describing the image

The result is a training set laid out like this:

training-set/
├── subject_001_cc.png
├── subject_001_cc.txt    → "subject_name, front-facing portrait, well-lit"
├── subject_002.png
├── subject_002.txt       → "subject_name, three-quarter view, natural lighting"
├── subject_038.png
├── subject_038.txt       → "subject_name, full body, casual outfit, outdoor"
└── ...

The captions are critical: bad or missing captions are the #1 reason LoRA training fails. If every image has the same caption (or no caption), the model can’t learn what makes each photo unique. Unique per-image captions tell the model: “this trigger word + front-facing = this look; this trigger word + profile = that look.”
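A minimal sketch of the captioning step, assuming the angle label from the scoring phase and a hypothetical `subject_name` trigger word (the pipeline's real caption logic may differ):

```python
from pathlib import Path

TRIGGER = "subject_name"  # hypothetical trigger word; pick a unique token

def write_caption(image_path: str, angle: str, extras: str = "") -> str:
    """Write a per-image .txt caption next to the image; return the text."""
    angle_phrase = {
        "front": "front-facing portrait",
        "three_quarter": "three-quarter view",
        "profile": "profile view",
        "body": "full body",
    }[angle]
    caption = ", ".join(part for part in (TRIGGER, angle_phrase, extras) if part)
    txt_path = Path(image_path).with_suffix(".txt")
    txt_path.parent.mkdir(parents=True, exist_ok=True)
    txt_path.write_text(caption)
    return caption

print(write_caption("training-set/subject_001_cc.png", "front", "well-lit"))
# subject_name, front-facing portrait, well-lit
```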


Phase 3: Training

LoRA Training on a Home GPU

Hardware requirements:

- NVIDIA GPU with 8GB+ VRAM (RTX 3060, 3070, 4060, etc.)
- Enough free disk space for the base model, dataset, and checkpoints

Software setup (Claude Code installs this for you):

# Create an isolated Python environment
python -m venv lora-env
source lora-env/bin/activate

# Install PyTorch with CUDA support (for NVIDIA GPUs)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Install the AI training libraries
pip install diffusers transformers accelerate peft
pip install bitsandbytes  # Cuts memory usage in half

Key training parameters for faces:

| Parameter | Recommended Value | Notes |
|---|---|---|
| Base model | FLUX.1-dev | Best current open model for photorealism |
| LoRA rank | 32–48 | Higher = more capacity for facial detail. 16 is too low for faces. |
| Learning rate | 4e-5 | Lower is safer. 1e-4 causes model collapse. |
| Training steps | 800–1500 | With 40 images: ~20-35 epochs |
| Batch size | 1 | Limited by 8GB VRAM |
| Resolution | 1024×1024 | FLUX native |
| Train text encoder | Yes | Critical for caption-triggered generation |
| Optimizer | AdamW 8-bit | Memory efficient |
| Checkpoints | Every 200 steps | So you can pick the best stopping point |
| Trigger word | Unique name (e.g., “subject_face”) | Activates the identity in prompts |

Training time: ~2-3 hours on an RTX 3070 Ti (8GB)

Output: A single .safetensors file (~150-300MB) that plugs into any FLUX-compatible image generator.
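Collected as a config sketch (field names are illustrative; kohya_ss and diffusers training scripts use similar but not identical keys):

```python
# Sketch of the parameter table as a training config. Field names are
# illustrative, not a specific trainer's schema -- map them to your tool.
LORA_TRAINING_CONFIG = {
    "base_model": "FLUX.1-dev",
    "network_rank": 32,              # 32-48 for faces; 16 is too low
    "learning_rate": 4e-5,           # 1e-4 risks model collapse
    "max_train_steps": 1200,         # ~20-35 epochs with 40 images
    "train_batch_size": 1,           # fits in 8GB VRAM
    "resolution": 1024,              # FLUX native
    "train_text_encoder": True,      # needed for caption triggering
    "optimizer": "adamw_8bit",       # via bitsandbytes
    "save_every_n_steps": 200,       # checkpoints to pick the best stop
    "trigger_word": "subject_face",  # example trigger from the table
}
```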

Full Fine-Tuning on Cloud GPUs

If you want to go bigger with 500+ images:

# On your rented cloud server (RunPod, Vast.ai, etc.):

# Upload your photo dataset
rsync -avz training-set/ user@cloud-server:/workspace/training-set/

# Training takes 12-48 hours depending on dataset size and GPU count
# Output: 12-24GB model checkpoint

# Download the finished model
rsync -avz user@cloud-server:/workspace/output/ ./model-output/

Typical cloud training session:

  1. Rent a 4x A100 server ($6/hr)
  2. Upload 100GB of photos (~1 hour)
  3. Train for 24 hours (~$144)
  4. Download 20GB model (~30 min)
  5. Delete server
  6. Total: ~$150-200 for a one-time training

The Training Image Cheat Sheet

| Parameter | Recommendation | Why |
|---|---|---|
| Total images | 30–50 for LoRA, 500+ for full fine-tune | LoRA sweet spot; more = dilution |
| Face close-ups | 25–35 (LoRA) | Primary identity anchor |
| Full body | 5–8 (LoRA) | Body proportions and style |
| Front-facing | 40% of face shots | Most important angle |
| Three-quarter | 30% of face shots | Adds dimensional understanding |
| Profile | 15% of face shots | Jawline, nose, ears |
| Expressions | Mix across set | Prevents expression lock-in |
| Lighting | Mix across set | Prevents lighting bias |
| Resolution | 1024×1024 (FLUX) | Model’s native resolution |
| Format | PNG (lossless) | No compression artifacts |
| Captions | Unique per image | #1 quality factor |
| Color checker | 1 per lighting setup | Accurate skin tones |
| Source pool | As large as possible | Bigger pool = pickier selection |

What Kills Training Quality

  1. Identical captions — Model can’t distinguish photos, collapses to average
  2. All same angle — Model only generates that one pose
  3. Blurry photos — Garbage in, garbage out
  4. Too many images for LoRA — 100+ causes identity averaging
  5. Low LoRA rank — Use 32-48 for faces, never 16
  6. Skipping text encoder — Captions won’t trigger properly
  7. Learning rate too high — Model collapses, generates noise

What You Get at the End

A single model file (200MB for LoRA, 12-24GB for full fine-tune) that plugs into any compatible image generation workflow. Open a prompt box and type something like (using your trigger word):

subject_face, candid outdoor portrait, golden hour lighting, shallow depth of field

Each generation takes 10-30 seconds on a consumer GPU. Unlimited images. Runs entirely locally — no cloud, no subscriptions, no per-image fees.

The total cost: $0 if you own a gaming PC (LoRA path), or $150-200 for a one-time cloud GPU rental (full fine-tune path). Either way, it’s a fraction of what a single professional photo shoot costs, and you can generate images forever.


Tools used: Python, OpenCV, SFace Recognition, Google Drive API, rawpy, diffusers
Compatible base models: FLUX.1-dev, Stable Diffusion XL, SD 1.5
Recommended hardware: Any NVIDIA GPU with 8GB+ VRAM