Building Mneme, Part 9: Image Creator

Image Creator started as pure infrastructure: I needed images for e-books, diagrams for tutorials, and covers for quizzes. It was just another pipeline in Mneme's content factory. But it evolved into something unexpected: one of the most fun modules I've built. Building on lessons from Comic Creator (ComfyUI workflows, vision validation), I added photo uploads, image analysis via a local vision LLM, and a meme generator with multiple humor styles. The result? A big hit with my girls, and a reminder that the best features often emerge from play, not planning.

TL;DR - Image Creator began as an asset generator for other modules but became a standalone playground. Key tech: reused the Image LLM role from Comic Creator (llava:34b running locally), added photo upload, built a meme generator with 7 humor styles (Classic, Wholesome, Sarcastic, Gen Z, etc.), and discovered that building something your kids actually want to use is the best kind of validation.

Origin: Asset Generation on Demand

Mneme's content modules all needed images:

E-books: Cover art, chapter headers, concept diagrams
Tutorials: Step-by-step illustrations, before/after comparisons
Quizzes: Visual questions, explanatory graphics
Blog Posts: Featured images, technical diagrams

Initially, I generated these images manually through Fooocus or ComfyUI, saved them, and referenced them in content. But that didn't scale: every e-book needed custom images, and batch generation meant managing prompts, reviewing quality, and handling retries. I needed a project-based image creator that could generate, organize, and serve assets on demand.

Image Creator V1 Goals
• Generate images from prompts within project containers
• Store metadata (prompt, seed, model, LoRA settings)
• Support batch generation with progress tracking
• Integrate with ComfyUI for production-quality outputs
• Provide thumbnail grid for quick visual selection
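The metadata goal maps naturally onto one record per generated image. Here is a minimal sketch, assuming a flat document shape that could be stored in a MongoDB collection. The class and field names (`ImageRecord`, `LoraSetting`, `project_id`, `thumbnail_path`) are hypothetical, not Mneme's actual schema:

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class LoraSetting:
    name: str        # LoRA identifier (hypothetical naming)
    strength: float  # blend weight applied during sampling


@dataclass
class ImageRecord:
    project_id: str                  # the project container this image belongs to
    prompt: str
    seed: int                        # stored so the exact image can be regenerated
    model: str                       # checkpoint name used for generation
    loras: list[LoraSetting] = field(default_factory=list)
    thumbnail_path: Optional[str] = None

    def to_document(self) -> dict:
        """Convert to a plain dict (nested dataclasses included) for storage."""
        return asdict(self)


record = ImageRecord(
    project_id="ebook-intro",
    prompt="watercolor cover, mountain lake at dawn",
    seed=42,
    model="sdxl_base",
    loras=[LoraSetting("watercolor_style", 0.7)],
)
print(record.to_document()["loras"])  # [{'name': 'watercolor_style', 'strength': 0.7}]
```

Keeping the seed alongside the prompt and LoRA settings is what makes "generate more like this one" possible later.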

The first version worked: projects became image containers, I could batch-generate variations, and WebSocket progress updates gave real-time feedback. But it was still just infrastructure. The magic came later.
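Batch generation with progress reporting can be sketched as a loop that awaits each job and then fires a progress callback. This is a minimal sketch, assuming the callback is where a WebSocket push would happen. `batch_generate`, `generate_one`, and `on_progress` are hypothetical names, and the fake generator stands in for a real ComfyUI call:

```python
import asyncio
from typing import Awaitable, Callable

ProgressCallback = Callable[[int, int], Awaitable[None]]


async def batch_generate(
    prompt: str,
    seeds: list[int],
    generate_one: Callable[[str, int], Awaitable[str]],
    on_progress: ProgressCallback,
) -> list[str]:
    """Generate one image per seed, reporting (done, total) after each finishes."""
    results: list[str] = []
    total = len(seeds)
    # Sequential on purpose: a single local GPU runs one job at a time anyway (assumption)
    for done, seed in enumerate(seeds, start=1):
        results.append(await generate_one(prompt, seed))
        await on_progress(done, total)  # in a web app, this would send a WebSocket message
    return results


# Usage with stand-ins for the real generator and socket
progress_log: list[tuple[int, int]] = []


async def fake_generate(prompt: str, seed: int) -> str:
    return f"{prompt}-seed{seed}.png"


async def log_progress(done: int, total: int) -> None:
    progress_log.append((done, total))


paths = asyncio.run(batch_generate("quiz cover", [1, 2, 3], fake_generate, log_progress))
```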

Comic Creator: The Vision LLM Breakthrough

Building Comic Creator taught me that generating images is only half the problem; validating them is the other half. Comic panels need consistency: same character, correct environment, proper framing. I couldn't manually review hundreds of panels, so I introduced the Image LLM role.

The Image LLM role uses llava:34b, a vision-capable model running locally via Ollama. It can:

Describe what's in an image (objects, people, setting, mood)
Validate against requirements (e.g., "Is this character wearing a red hat?")
Score quality dimensions (composition, lighting, detail level)
Compare images for consistency (same character across panels)
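Requirement validation is the easiest of these to automate, because the answer can be forced into yes/no. Here is a minimal sketch, assuming a hypothetical prompt wording and reply format (not Mneme's actual prompts). An unparseable reply returns `None` so the caller can retry instead of silently passing a bad panel:

```python
import re
from typing import Optional


def build_validation_prompt(requirement: str) -> str:
    """Turn a requirement into a yes/no question for the vision model."""
    return (
        f"Look at the image and answer strictly YES or NO: {requirement}\n"
        "Then give a one-sentence reason."
    )


def parse_verdict(answer: str) -> Optional[bool]:
    """Return True/False from the first standalone YES/NO word, or None if absent."""
    match = re.search(r"\b(yes|no)\b", answer, re.IGNORECASE)
    if match is None:
        return None  # unparseable: treat as a retry, not a pass
    return match.group(1).lower() == "yes"


prompt = build_validation_prompt("Is this character wearing a red hat?")
```

The word-boundary regex keeps words like "cannot" or "Not" from being misread as a "no".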

For Comic Creator, this was a game-changer: automated validation meant I could generate panels in bulk, filter out bad results, and surface only the keepers. But the real insight was that this vision capability is useful everywhere.

Reusing the Image LLM: Photo Analysis and Description

Once I had a vision model integrated as a first-class LLM role in Mneme, I could use it anywhere. The Image Creator was the perfect testbed.

Photo Upload + Analysis: I added a simple file upload button. Drop in any photo, and the Image LLM describes it:

Example: Family Photo Analysis
Upload: family_beach.jpg
Image LLM: "A family of four on a sandy beach at sunset. Two adults and two children, all smiling. Ocean waves in background, golden hour lighting. Casual summer clothing. Happy, relaxed mood."

This description isn't just metadata; it becomes the starting prompt for generating variations, creating memes, or understanding what's in the image for other downstream tasks. Suddenly, Image Creator wasn't just generating images; it was understanding them.

Technical Architecture: Image LLM Role

The Image LLM role is just another LLM in Mneme's router, but with vision capabilities:

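```yaml
# LLM Router configuration
roles:
  - key: image_llm
    provider: ollama_llava  # llava:34b served by Ollama
    model: llava:34b
    temperature: 0.3        # low temperature for factual descriptions
    max_tokens: 500
```

```python
import base64

# llm_router and LLMRole come from Mneme's router module (not shown here)
async def describe_photo(path: str):
    # Vision requests carry the image as a base64-encoded string
    with open(path, "rb") as f:
        image_base64 = base64.b64encode(f.read()).decode("ascii")
    return await llm_router.generate(
        prompt="Describe this image in detail",
        role=LLMRole.IMAGE_LLM,
        images=[image_base64],
    )
```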
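Under the hood, a router call like this eventually has to reach Ollama. Below is a minimal stdlib sketch of the equivalent raw request to Ollama's `/api/generate` endpoint, assuming Ollama is listening on its default port 11434. The helper names are hypothetical, and this post doesn't show how Mneme's router actually issues the call. Note that Ollama calls the max-token limit `num_predict`:

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_vision_request(image_bytes: bytes, prompt: str) -> dict:
    """Build the JSON body for a single, non-streaming llava call."""
    return {
        "model": "llava:34b",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": 0.3, "num_predict": 500},  # mirrors the role config
    }


def describe_image_via_ollama(path: str, prompt: str = "Describe this image in detail") -> str:
    """Send a local image to Ollama and return the model's text reply."""
    with open(path, "rb") as f:
        body = build_vision_request(f.read(), prompt)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because the image is just another field in the request body, wrapping this behind the same router interface as text models needs no special pipeline.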

The beauty of this design: vision is just another LLM role. No special-casing, no separate pipeline. The same queue manager, retry logic, and structured output parsing work for both text and vision tasks.

Meme Generator: Accidental Delight

One afternoon, I thought: "What if I could add text to these images?" Impact font, top/bottom text, classic meme style. I built a quick prototype using PIL (Python Imaging Library) to overlay text with white fill and black outline. Then I connected it to the Image LLM so that, given an image description, the model generates funny meme text.

The first version was simple: one prompt style, generic humor. But my girls tried it and immediately asked: "Can it do wholesome memes? What about Gen Z humor?" Challenge accepted.

Multiple Meme Styles: Prompt Engineering for Humor

I built 7 distinct meme styles, each with a custom prompt template in Mneme's versioned prompt system:

Classic (Impact Font): Traditional 2000s memes, all caps, setup → punchline structure
Wholesome (Feel-Good): Positive, uplifting, "You're doing great!" energy
Sarcastic (Dry Humor): Cynical, deadpan, "Ah yes, because that always works"
Absurd (Surreal): Random, nonsensical, "POV: You are bread, THE TOASTER APPROACHES"
Relatable (Everyday Life): Universal struggles, "Me: I should sleep. Also me at 3 AM: *scrolling*"
Motivational (Inspirational): LinkedIn hustle vibes, sincere or ironic
Gen Z (Modern Internet): Lowercase aesthetic, modern slang, "nobody: absolutely nobody: me at 2am"

Each style has its own prompt in the prompts collection, with full version control. When generating a meme, the user picks a style, and within a second or two the local LLM generates text that matches that humor personality.
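Selecting a style and filling in the image description can be sketched with the standard library's `string.Template`. This is a minimal sketch, assuming three illustrative templates held in a dict. In Mneme the templates live as versioned prompts in MongoDB, and the wording here is hypothetical. All templates share one reply format so a single parser can handle every style:

```python
from string import Template

_REPLY_FORMAT = "Reply exactly as:\nTop: <text>\nBottom: <text>"

# Hypothetical style templates; Mneme stores its real ones as versioned prompts.
MEME_STYLES: dict[str, Template] = {
    "classic": Template(
        "Write classic meme text for this image: $description\n"
        "Use ALL CAPS, a setup on top and a punchline on the bottom.\n" + _REPLY_FORMAT
    ),
    "wholesome": Template(
        "Write wholesome, uplifting meme text for this image: $description\n" + _REPLY_FORMAT
    ),
    "gen_z": Template(
        "Write Gen Z meme text with modern slang and a lowercase aesthetic "
        "for this image: $description\n" + _REPLY_FORMAT
    ),
}


def render_meme_prompt(style: str, description: str) -> str:
    """Fill the chosen style's template with the Image LLM's description."""
    try:
        template = MEME_STYLES[style]
    except KeyError:
        raise ValueError(f"Unknown meme style: {style!r}") from None
    return template.substitute(description=description)


prompt = render_meme_prompt("gen_z", "a confused-looking cat with wide eyes")
```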

Example: Gen Z Style
Image: Confused-looking cat
Prompt: "Generate Gen Z meme text with modern slang and lowercase aesthetic"
Output:
Top: "nobody:"
Bottom: "absolutely nobody: my cat at 3am staring at the wall"
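Local models don't always reply with only the two captions; they sometimes add a friendly preamble or wrap the text in quotes. Here is a minimal sketch of pulling out the top and bottom lines, assuming the `Top:`/`Bottom:` reply format shown in the example. `parse_meme_text` and `MemeText` are hypothetical names:

```python
import re
from typing import NamedTuple


class MemeText(NamedTuple):
    top: str
    bottom: str


_CAPTION_LINE = re.compile(r"^\s*(top|bottom)\s*:\s*(.*)$", re.IGNORECASE)


def parse_meme_text(raw: str) -> MemeText:
    """Extract Top/Bottom captions from an LLM reply, ignoring any other lines."""
    parts = {"top": "", "bottom": ""}
    for line in raw.splitlines():
        match = _CAPTION_LINE.match(line)
        if match:
            # Drop surrounding whitespace and any quotes the model added
            parts[match.group(1).lower()] = match.group(2).strip().strip('"')
    return MemeText(parts["top"], parts["bottom"])


reply = (
    "Sure! Here you go:\n"
    'Top: "nobody:"\n'
    'Bottom: "absolutely nobody: my cat at 3am staring at the wall"'
)
meme = parse_meme_text(reply)
```

An empty caption in the result is a cheap signal to retry the generation.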

The results are often hilarious. My daughters spent an hour generating memes from family photos, testing different styles, laughing at the AI's attempts at Gen Z slang. That's when I knew I'd built something special: not because it was technically impressive, but because it was fun.

Implementation: Prompt Versioning Meets Vision

The meme generator showcases Mneme's prompt template system in action:

Prompt Templates: Each meme style is a versioned prompt in MongoDB
Style Selection: User picks a style from a dropdown
Image Analysis: Image LLM describes the photo ("A confused-looking cat with wide eyes")
Text Generation: Selected prompt template renders with image description, LLM generates meme text
Text Overlay: PIL adds Impact font text (top/bottom) with outline
Result: New meme saved to project, displayed in thumbnail grid
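The steps above chain into one async function. This is a minimal sketch, assuming each step is injected as a callable so the flow can be read (and tested) without a GPU, Ollama, or MongoDB. `MemeSteps`, `make_meme`, and the stand-in lambdas are all hypothetical:

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class MemeSteps:
    """Injected step functions; all are hypothetical stand-ins for Mneme services."""
    describe: Callable[[bytes], Awaitable[str]]                  # Image LLM description
    render_prompt: Callable[[str, str], str]                     # (style, description) -> prompt
    generate_text: Callable[[str], Awaitable[tuple[str, str]]]   # prompt -> (top, bottom)
    overlay: Callable[[bytes, str, str], bytes]                  # draw captions onto the image
    save: Callable[[str, bytes], Awaitable[str]]                 # store in project, return an id


async def make_meme(steps: MemeSteps, project_id: str, image: bytes, style: str) -> str:
    description = await steps.describe(image)          # image analysis
    prompt = steps.render_prompt(style, description)   # style-specific template
    top, bottom = await steps.generate_text(prompt)    # meme text generation
    meme = steps.overlay(image, top, bottom)           # text overlay
    return await steps.save(project_id, meme)          # save to project


# Usage with fakes standing in for the real services
async def _describe(image: bytes) -> str:
    return "a confused-looking cat"


async def _generate(prompt: str) -> tuple[str, str]:
    return ("nobody:", "my cat at 3am")


async def _save(project_id: str, data: bytes) -> str:
    return f"{project_id}/{data.decode()}"


steps = MemeSteps(
    describe=_describe,
    render_prompt=lambda style, desc: f"[{style}] {desc}",
    generate_text=_generate,
    overlay=lambda img, top, bottom: img + f"|{top}|{bottom}".encode(),
    save=_save,
)
meme_id = asyncio.run(make_meme(steps, "family", b"IMG", "gen_z"))
```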

Each meme style can be updated independently: I can A/B test Gen Z prompts, improve Wholesome tone, or add new styles without touching code. Version control means I can roll back if a prompt update makes things worse.
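The rollback behavior can be sketched as an append-only history per prompt key plus a pointer to the active version. This is a minimal in-memory sketch, assuming that shape. `PromptVersions` and its methods are hypothetical, and a MongoDB-backed store would persist the same two pieces of state:

```python
from dataclasses import dataclass, field


@dataclass
class PromptVersions:
    """In-memory stand-in for a versioned prompt collection (hypothetical schema)."""
    versions: dict[str, list[str]] = field(default_factory=dict)  # key -> bodies, v1 first
    active: dict[str, int] = field(default_factory=dict)          # key -> active version (1-based)

    def publish(self, key: str, body: str) -> int:
        """Append a new version and make it active; returns its version number."""
        history = self.versions.setdefault(key, [])
        history.append(body)
        self.active[key] = len(history)
        return self.active[key]

    def rollback(self, key: str, version: int) -> None:
        """Point back at an older version; no body is ever overwritten."""
        if not 1 <= version <= len(self.versions.get(key, [])):
            raise ValueError(f"{key} has no version {version}")
        self.active[key] = version

    def get(self, key: str) -> str:
        return self.versions[key][self.active[key] - 1]


store = PromptVersions()
store.publish("meme.gen_z", "Write Gen Z meme text ...")
store.publish("meme.gen_z", "Write Gen Z meme text, heavier on slang ...")
store.rollback("meme.gen_z", 1)  # the v2 tweak made things worse
```

Because history is append-only, an A/B test is just two version numbers served side by side.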

Lessons: Build for Joy, Not Just Function

Image Creator taught me something I didn't expect: utility features become magical when they're playful. I built it for e-books and tutorials (utility), but it became beloved because of memes (play).

Some takeaways:

Reuse infrastructure liberally: The Image LLM role was built for Comic Creator, but it powers photo analysis, meme generation, and future features
Make capabilities discoverable: Photo upload wasn't part of the original plan, but once users could upload images, they wanted to do things with them
Prompt engineering scales: 7 meme styles with distinct personalities, all from prompt templates, with no code changes and no model retraining
User delight is data: When my daughters spent an hour making memes, that validated the design more than any metric

What's Next

Image Creator is still evolving. Current roadmap:

Image editing: Inpainting/outpainting via ComfyUI for touch-ups
Style transfer: Apply artistic styles to photos (oil painting, watercolor, etc.)
Batch meme generation: Generate memes for entire image sets with one click
Export options: Download as Instagram-ready formats, social media sizes
More meme styles: Boomer humor, Dark humor, Corporate cringe (the possibilities are endless)
Image-to-image: use an existing photo as the starting point for edits and new generations
ROBOTS that roast you!

Final Thoughts

Image Creator started as a means to an end: generating assets for other modules. But by reusing the vision LLM from Comic Creator, adding photo uploads, and building a meme generator with personality, it became one of the most delightful parts of Mneme. The technical foundation (ComfyUI workflows, vision LLMs, prompt versioning) made the fun stuff possible, but the joy came from seeing my kids actually want to use it.

Sometimes the best features aren't the ones you plan. They're the ones that make people laugh.