Image Creator started as pure infrastructure: I needed images for e-books, diagrams for tutorials, covers for quizzes. It was just another pipeline in Mneme's content factory. But it evolved into something unexpected: one of the most fun modules I've built. Taking lessons from Comic Creator (ComfyUI workflows, vision validation), I added photo uploads, image analysis via a local vision LLM, and a meme generator with multiple humor styles. The result? A big hit with my girls, and a reminder that the best features often emerge from play, not planning.
Origin: Asset Generation on Demand
Mneme's content modules all needed images:
- E-books: Cover art, chapter headers, concept diagrams
- Tutorials: Step-by-step illustrations, before/after comparisons
- Quizzes: Visual questions, explanatory graphics
- Blog Posts: Featured images, technical diagrams
Initially, I'd generate these images manually through Fooocus or ComfyUI, save them, and reference them in content. But that didn't scale: every e-book needed custom images, and batch generation meant managing prompts, reviewing quality, and handling retries. I needed a project-based image creator that could generate, organize, and serve assets on demand. It had to:
- Generate images from prompts within project containers
- Store metadata (prompt, seed, model, LoRA settings)
- Support batch generation with progress tracking
- Integrate with ComfyUI for production-quality outputs
- Provide a thumbnail grid for quick visual selection
The first version worked: projects became image containers, I could batch-generate variations, and the WebSocket progress updates gave real-time feedback. But it was still just infrastructure. The magic came later.
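To make the project-container idea concrete, here is a minimal sketch of what a metadata record and a batch loop with progress reporting could look like. This is not Mneme's actual code: the class names, field names, and the `render` and `on_progress` callbacks are all assumptions for illustration, and `render` stands in for the real ComfyUI call.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ImageRecord:
    """Metadata kept alongside one generated image (field names are illustrative)."""
    prompt: str
    seed: int
    model: str
    lora: Optional[str] = None
    lora_weight: float = 0.0


@dataclass
class Project:
    """A project is a named container for generated images."""
    name: str
    images: list[ImageRecord] = field(default_factory=list)


def generate_batch(
    project: Project,
    prompt: str,
    seeds: list[int],
    model: str,
    render: Callable[[ImageRecord], None],
    on_progress: Callable[[int, int], None],
) -> list[ImageRecord]:
    """Create one variation per seed and report progress after each image."""
    created = []
    for done, seed in enumerate(seeds, start=1):
        record = ImageRecord(prompt=prompt, seed=seed, model=model)
        render(record)  # hypothetical stand-in for the ComfyUI workflow call
        project.images.append(record)
        created.append(record)
        on_progress(done, len(seeds))  # e.g. forwarded to the browser over a WebSocket
    return created
```

Passing the progress callback in, rather than hard-coding a WebSocket, keeps the loop easy to exercise without a server running.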
Comic Creator: The Vision LLM Breakthrough
Building Comic Creator taught me that generating images is only half the problem; validating them is the other half. Comic panels need consistency: same character, correct environment, proper framing. I couldn't manually review hundreds of panels, so I introduced the Image LLM role.
The Image LLM role uses llava:34b, a vision-capable model running locally via Ollama. It can:
- Describe what's in an image (objects, people, setting, mood)
- Validate against requirements (e.g., "Is this character wearing a red hat?")
- Score quality dimensions (composition, lighting, detail level)
- Compare images for consistency (same character across panels)
For Comic Creator, this was a game-changer: automated validation meant I could generate panels in bulk, filter out bad results, and only surface the keepers. But the real insight was that this vision capability is useful everywhere.
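The validation idea can be sketched as a yes/no check per requirement. The following is an illustration under assumptions, not the Comic Creator implementation: the prompt wording, the `CheckResult` type, and the `ask_vision_model` callable (any function that takes a prompt and a base64 image and returns the model's text) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    requirement: str
    passed: bool
    raw_answer: str


def build_check_prompt(requirement: str) -> str:
    """Ask for a constrained answer so the reply is easy to parse."""
    return (
        "Look at the image and answer the question.\n"
        f"Question: {requirement}\n"
        "Reply with YES or NO on the first line, then one sentence of explanation."
    )


def parse_yes_no(answer: str) -> bool:
    """Treat the answer as a pass only if its first line starts with YES."""
    lines = answer.strip().splitlines()
    first = lines[0] if lines else ""
    return first.strip().upper().startswith("YES")


def validate_panel(
    image_b64: str,
    requirements: list[str],
    ask_vision_model: Callable[[str, str], str],
) -> list[CheckResult]:
    """Run every requirement past the vision model and collect the verdicts."""
    results = []
    for requirement in requirements:
        answer = ask_vision_model(build_check_prompt(requirement), image_b64)
        results.append(CheckResult(requirement, parse_yes_no(answer), answer))
    return results


def keep_panel(results: list[CheckResult]) -> bool:
    """A panel is a keeper only when every requirement passed."""
    return all(result.passed for result in results)
```

Anything that is not a clear YES counts as a failure here, which errs on the side of discarding a panel rather than surfacing a bad one.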
Reusing the Image LLM: Photo Analysis and Description
Once I had a vision model integrated as a first-class LLM role in Mneme, I could use it anywhere. The Image Creator was the perfect testbed.
Photo Upload + Analysis: I added a simple file upload button. Drop in any photo, and the Image LLM describes it:
Upload: family_beach.jpg
Image LLM: "A family of four on a sandy beach at sunset. Two adults and two children, all smiling. Ocean waves in background, golden hour lighting. Casual summer clothing. Happy, relaxed mood."
This description isn't just metadata; it becomes the starting prompt for generating variations, creating memes, or understanding what's in the image for other downstream tasks. Suddenly, Image Creator wasn't just generating images; it was understanding them.
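For readers who want to try the describe step without Mneme, here is a minimal sketch that talks to Ollama's `/api/generate` endpoint directly. It assumes Ollama is running on its default local port with `llava:34b` pulled; the function names are mine, and in Mneme the call goes through the LLM router rather than raw HTTP.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_describe_request(image_bytes: bytes, model: str = "llava:34b") -> dict:
    """Build the JSON body for a single, non-streaming image description."""
    return {
        "model": model,
        "prompt": "Describe this image in detail",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
        "options": {"temperature": 0.3, "num_predict": 500},
    }


def describe_image(image_bytes: bytes) -> str:
    """Send the request to a locally running Ollama and return the description."""
    payload = json.dumps(build_describe_request(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Usage would look like `describe_image(open("family_beach.jpg", "rb").read())`, with the returned text then reused as a starting prompt.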
Technical Architecture: Image LLM Role
The Image LLM role is just another LLM in Mneme's router, but with vision capabilities:
# LLM Router Configuration
roles:
  - key: image_llm
    provider: ollama_llava  # llava:34b via Ollama
    model: llava:34b
    temperature: 0.3  # Low temp for factual descriptions
    max_tokens: 500

# Usage: Describe an image
response = await llm_router.generate(
    prompt="Describe this image in detail",
    role=LLMRole.IMAGE_LLM,
    images=[image_base64]
)
The beauty of this design: vision is just another LLM role. No special-casing, no separate pipeline. The same queue manager, retry logic, and structured output parsing work for both text and vision tasks.
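The "vision is just another role" idea can be sketched with a toy router. This is an assumption-laden illustration, not Mneme's router: the `TEXT_LLM` role, the `register` method, the backend signature, and the retry policy are all hypothetical. The point it shows is that text and vision calls share one code path, with `images` as an optional argument.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable, Optional, Sequence


class LLMRole(Enum):
    TEXT_LLM = "text_llm"    # hypothetical role name for comparison
    IMAGE_LLM = "image_llm"


# A backend takes a prompt plus zero or more base64 images and returns text.
Backend = Callable[[str, Sequence[str]], Awaitable[str]]


class LLMRouter:
    def __init__(self, max_retries: int = 2) -> None:
        self._backends: dict[LLMRole, Backend] = {}
        self._max_retries = max_retries

    def register(self, role: LLMRole, backend: Backend) -> None:
        self._backends[role] = backend

    async def generate(
        self,
        prompt: str,
        role: LLMRole,
        images: Optional[Sequence[str]] = None,
    ) -> str:
        """Dispatch by role; the retry loop is identical for text and vision."""
        backend = self._backends[role]
        last_error: Optional[Exception] = None
        for _ in range(self._max_retries + 1):
            try:
                return await backend(prompt, images or [])
            except ConnectionError as exc:
                last_error = exc
                await asyncio.sleep(0)  # a real router would back off here
        raise RuntimeError(f"{role.value} failed after retries") from last_error
```

Because a text backend simply receives an empty image list, nothing in the router needs to know which roles are vision-capable.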
Meme Generator: Accidental Delight
One afternoon, I thought: "What if I could add text to these images?" Impact font, top/bottom text, classic meme style. I built a quick prototype using PIL (Python Imaging Library) to overlay text with white fill and black outline. Then I connected it to the Image LLM-given an image description, generate funny meme text.
The first version was simple: one prompt style, generic humor. But my girls tried it and immediately asked: "Can it do wholesome memes? What about Gen Z humor?" Challenge accepted.
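The overlay step can be sketched roughly as follows. This is my own minimal version, not the prototype's code: it assumes Pillow 9.2 or newer, assumes an Impact font file is reachable as `impact.ttf` (falling back to Pillow's default font if not), and uses a simple character-count wrap rather than measuring pixel widths. The helper names and sizing constants are illustrative.

```python
import textwrap


def wrap_caption(text: str, max_chars: int = 24, upper: bool = True) -> list[str]:
    """Wrap a caption into short lines; classic memes are upper-cased."""
    if upper:
        text = text.upper()
    return textwrap.wrap(text, width=max_chars)


def draw_meme_text(image, top: str, bottom: str,
                   font_path: str = "impact.ttf", upper: bool = True):
    """Draw top/bottom captions on a PIL image: white fill, black outline."""
    # Imported lazily so wrap_caption stays usable without Pillow installed.
    from PIL import ImageDraw, ImageFont

    size = max(16, image.height // 10)
    try:
        font = ImageFont.truetype(font_path, size)
    except OSError:
        font = ImageFont.load_default()  # fallback; ignores the computed size

    draw = ImageDraw.Draw(image)
    stroke = max(1, size // 15)
    line_height = int(size * 1.15)
    margin = size // 2

    def draw_lines(lines, y):
        for line in lines:
            # Measure each line so it can be centered horizontally.
            left, _, right, _ = draw.textbbox(
                (0, 0), line, font=font, stroke_width=stroke
            )
            x = (image.width - (right - left)) // 2
            draw.text((x, y), line, font=font, fill="white",
                      stroke_width=stroke, stroke_fill="black")
            y += line_height

    top_lines = wrap_caption(top, upper=upper)
    bottom_lines = wrap_caption(bottom, upper=upper)
    draw_lines(top_lines, margin)
    draw_lines(bottom_lines, image.height - margin - line_height * len(bottom_lines))
    return image
```

The `stroke_width` and `stroke_fill` arguments are what produce the outline, so no manual offset-drawing is needed. Passing `upper=False` keeps lowercase captions intact.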
Multiple Meme Styles: Prompt Engineering for Humor
I built 7 distinct meme styles, each with a custom prompt template in Mneme's versioned prompt system:
- Classic (Impact Font): Traditional 2000s memes, all caps, setup → punchline structure
- Wholesome (Feel-Good): Positive, uplifting, "You're doing great!" energy
- Sarcastic (Dry Humor): Cynical, deadpan, "Ah yes, because that always works"
- Absurd (Surreal): Random, nonsensical, "POV: You are bread, THE TOASTER APPROACHES"
- Relatable (Everyday Life): Universal struggles, "Me: I should sleep. Also me at 3 AM: *scrolling*"
- Motivational (Inspirational): LinkedIn hustle vibes, sincere or ironic
- Gen Z (Modern Internet): Lowercase aesthetic, modern slang, "nobody: absolutely nobody: me at 2am"
Each style has its own prompt in the prompts collection, with full version control. When generating a meme, the user picks a style, and the local LLM takes just a second or two to generate text that matches that humor personality.
Image: Confused-looking cat
Prompt: "Generate Gen Z meme text with modern slang and lowercase aesthetic"
Output:
Top: "nobody:"
Bottom: "absolutely nobody: my cat at 3am staring at the wall"
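A rough sketch of how a style could turn into a prompt, and how the Top/Bottom reply could be parsed back out. The template wording, the style keys, and the two-line output contract are assumptions for illustration; the real templates live in Mneme's prompts collection and are not shown in this post.

```python
# Hypothetical per-style instructions; only three of the seven styles shown.
MEME_STYLE_PROMPTS = {
    "classic": "Write classic meme text in ALL CAPS with a setup and a punchline.",
    "wholesome": "Write positive, uplifting meme text.",
    "gen_z": "Generate Gen Z meme text with modern slang and lowercase aesthetic.",
}


def render_prompt(style: str, description: str) -> str:
    """Combine the style instruction with the Image LLM's description."""
    instruction = MEME_STYLE_PROMPTS[style]
    return (
        f"{instruction}\n"
        f"Image description: {description}\n"
        "Answer in exactly two lines:\n"
        "Top: <text>\n"
        "Bottom: <text>"
    )


def parse_meme_text(output: str) -> tuple[str, str]:
    """Extract the Top and Bottom captions from the model's reply."""
    top = bottom = ""
    for line in output.splitlines():
        label, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue
        key = label.strip().lower()
        value = value.strip().strip('"')
        if key == "top":
            top = value
        elif key == "bottom":
            bottom = value
    return top, bottom
```

Splitting on the first colon only matters for captions like "nobody:", which contain colons of their own.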
The results are often hilarious. My daughters spent an hour generating memes from family photos, testing different styles, laughing at the AI's attempts at Gen Z slang. That's when I knew I'd built something special, not because it was technically impressive, but because it was fun.
Implementation: Prompt Versioning Meets Vision
The meme generator showcases Mneme's prompt template system in action:
- Prompt Templates: Each meme style is a versioned prompt in MongoDB
- Style Selection: User picks a style from a dropdown
- Image Analysis: Image LLM describes the photo ("A confused-looking cat with wide eyes")
- Text Generation: Selected prompt template renders with image description, LLM generates meme text
- Text Overlay: PIL adds Impact font text (top/bottom) with outline
- Result: New meme saved to project, displayed in thumbnail grid
Each meme style can be updated independently: I can A/B test Gen Z prompts, improve Wholesome tone, or add new styles without touching code. Version control means I can roll back if a prompt update makes things worse.
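The publish-and-roll-back behavior can be illustrated with an in-memory stand-in. Mneme stores its prompts in MongoDB; the class below, its method names, and the `meme.gen_z` key are hypothetical and only show the versioning logic, not the storage layer.

```python
from dataclasses import dataclass, field


@dataclass
class PromptTemplate:
    """One prompt key with its full version history (in-memory stand-in)."""
    key: str
    versions: list[str] = field(default_factory=list)
    active: int = -1  # index of the version currently served

    def publish(self, text: str) -> int:
        """Append a new version, make it active, return its 1-based number."""
        self.versions.append(text)
        self.active = len(self.versions) - 1
        return self.active + 1

    def rollback(self) -> int:
        """Re-activate the previous version; history is kept, not deleted."""
        if self.active <= 0:
            raise ValueError("no earlier version to roll back to")
        self.active -= 1
        return self.active + 1

    def render(self, **variables: str) -> str:
        """Fill the active version's placeholders."""
        if self.active < 0:
            raise ValueError(f"prompt {self.key!r} has no published version")
        return self.versions[self.active].format(**variables)
```

Because a rollback only moves the active pointer, a prompt edit that made things worse remains in the history for later comparison.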
Lessons: Build for Joy, Not Just Function
Image Creator taught me something I didn't expect: utility features become magical when they're playful. I built it for e-books and tutorials (utility), but it became beloved because of memes (play).
Some takeaways:
- Reuse infrastructure liberally: The Image LLM role was built for Comic Creator, but it powers photo analysis, meme generation, and future features
- Make capabilities discoverable: Photo upload wasn't part of the original plan, but once users could upload images, they wanted to do things with them
- Prompt engineering scales: 7 meme styles with distinct personalities, all from prompt templates, with no code changes and no model retraining
- User delight is data: When my daughters spent an hour making memes, that validated the design more than any metric
What's Next
Image Creator is still evolving. Current roadmap:
- Image editing: Inpainting/outpainting via ComfyUI for touch-ups
- Style transfer: Apply artistic styles to photos (oil painting, watercolor, etc.)
- Batch meme generation: Generate memes for entire image sets with one click
- Export options: Download as Instagram-ready formats, social media sizes
- More meme styles: Boomer humor, Dark humor, Corporate cringe; the possibilities are endless
- Image to Image: Image editing
- ROBOTS that roast you!
Final Thoughts
Image Creator started as a means to an end: generating assets for other modules. But by reusing the vision LLM from Comic Creator, adding photo uploads, and building a meme generator with personality, it became one of the most delightful parts of Mneme. The technical foundation (ComfyUI workflows, vision LLMs, prompt versioning) made the fun stuff possible, but the joy came from seeing my kids actually want to use it.
Sometimes the best features aren't the ones you plan; they're the ones that make people laugh.