This is the king of Caption Booru. "TF Captions" involve a subject changing into something else:
Caption Booru is most valuable when users contribute thoughtful, detailed captions. If you're using it for AI training, remember: garbage in, garbage out – always verify caption quality before training. For casual browsing, it's also a great place to study how visual details translate into language.
If the specific Caption Booru instance you're referring to has different rules or features, always defer to its local guidelines.

"Caption Booru" primarily refers to the specialized practice of using Booru-style tags (comma-separated keywords) to caption image datasets for training artificial intelligence models like Stable Diffusion. Unlike natural language descriptions (e.g., "a girl sitting on a park bench"), Booru captioning uses concise, standardized labels (e.g., 1girl, sitting, bench, park, outdoors) to help AI models associate specific visual elements with precise terms.

The Core of Booru Captioning

The "Booru" system originated from imageboard sites like Danbooru, where users tag images with metadata including artists, characters, and stylistic choices. In the context of AI training:

Using exact Booru tags helps models maintain consistency, especially for anime-style illustrations.
Captions typically follow a hierarchy: character tags first, then clothing, then background and environmental details.
Words are usually lowercase, and multi-word tags are joined with underscores.

Key Tools and Extensions

To manage these large datasets, creators use specific software designed for "Booru" formatting:

BooruDatasetTagManager: A popular GitHub tool (starik222/BooruDatasetTagManager) for bulk-editing and managing tag-based captions.
WD14 Tagger: An automated tool often used to scan images and generate initial Booru-style tags.
Tag Autocomplete: An extension for the AUTOMATIC1111 Web UI that suggests recognized Danbooru tags while you type prompts.

Comparison: Natural Language vs. Booru Tags

Example: Booru tags read 1girl, solo, red_hair, smile; natural language reads "A smiling girl with red hair."
Model Type: Booru tags are preferred for anime/illustration models (e.g., PonyXL); natural language is preferred for photorealistic or Flux-based models.
Control over individual elements: High with Booru tags, since it is easy to isolate specific elements; lower with natural language, since it is harder to ensure the AI understands individual components.
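The formatting conventions described above (lowercase words, underscores inside multi-word tags, commas between tags) can be sketched in a few lines of Python. This is only an illustration: the function names `normalize_tag` and `to_booru_caption` are hypothetical and do not come from any of the tools mentioned here.

```python
def normalize_tag(tag: str) -> str:
    """Lowercase a tag and join its words with underscores."""
    return "_".join(tag.strip().lower().split())


def to_booru_caption(tags: list[str]) -> str:
    """Normalize tags, drop blanks and duplicates, and join with commas."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in tags:
        tag = normalize_tag(raw)
        if tag and tag not in seen:
            seen.add(tag)
            result.append(tag)
    return ", ".join(result)


# "Red Hair" and "red hair " collapse into the single tag red_hair.
print(to_booru_caption(["1girl", "Red Hair", "smile", "red hair "]))
# prints: 1girl, red_hair, smile
```

Keeping the first occurrence of each tag preserves whatever ordering the input already had, which matters when captions follow the character-first hierarchy described above.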
Booru captioning is a specific style of image tagging used primarily for training AI models, such as Stable Diffusion and Pony Diffusion, based on the structured, comma-separated metadata found on imageboard sites like Danbooru. Unlike natural language descriptions, Booru captions use a flat list of standardized tags (e.g., 1girl, solo, long_hair, blue_eyes) to help AI models precisely identify and replicate specific visual elements.

Why Use Booru Captions?
Checkpoint Alignment: Many popular AI checkpoints are trained using Booru tags. Using the same format for your own LoRA training ensures the model understands your prompts more effectively.
Granular Control: Tags allow you to specify exact details—such as camera angles, lighting, and specific character traits—without the "noise" of complex grammar.
Consistency: Standardized tags like looking_at_viewer or sitting provide a consistent language that the AI can easily categorize across thousands of images.

Popular Tools for Booru Captioning
If you are managing a dataset, these tools help automate or streamline the tagging process:
Booru Dataset Tag Manager is widely considered the best tool for reviewing and editing booru-style captions. It is specifically designed to handle the comma-separated tag format used for training Stable Diffusion models.

Why It Is Highly Rated

Active Maintenance: Users report that it is updated very regularly, keeping it compatible with newer tagging workflows.
Bulk Editing Power: It allows you to load entire folders of images and their corresponding tag files. You can find and replace tags across the entire dataset simultaneously (e.g., globally changing "white shirt" to "gray shirt").
Non-Destructive Workflow: Newer alternatives like Caption Foundry also emphasize non-destructive management, ensuring your original source files remain untouched until you are ready to export.
Tag Accuracy: It helps fix common issues from auto-taggers like WD14 Tagger, which can sometimes misidentify subject matter or fail to detect NSFW content.

How to Produce a "Good" Review

A "good" review in the context of Booru captioning isn't just about the software; it's about the quality of the tags.
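The dataset-wide find and replace mentioned earlier (changing "white shirt" to "gray shirt" everywhere) can be approximated with a short script. This is a minimal sketch, not how Booru Dataset Tag Manager is implemented: the helper names are hypothetical, and the `.txt` suffix is an assumption about how caption files are stored.

```python
from pathlib import Path


def split_tags(caption: str) -> list[str]:
    """Split a comma-separated caption into stripped, non-empty tags."""
    return [t.strip() for t in caption.split(",") if t.strip()]


def replace_tag_in_caption(caption: str, old: str, new: str) -> str:
    """Replace tags that exactly equal `old`; partial matches are left alone."""
    return ", ".join(new if t == old else t for t in split_tags(caption))


def replace_tag_in_folder(folder: str, old: str, new: str,
                          suffix: str = ".txt") -> int:
    """Rewrite caption files containing `old`; return how many were rewritten.

    ASSUMPTION: captions live in files ending with `suffix` (".txt" here is a
    guess, not something stated in the text above).
    """
    rewritten = 0
    for path in sorted(Path(folder).glob(f"*{suffix}")):
        caption = path.read_text(encoding="utf-8")
        if old in split_tags(caption):
            path.write_text(replace_tag_in_caption(caption, old, new),
                            encoding="utf-8")
            rewritten += 1
    return rewritten


print(replace_tag_in_caption("1girl, white_shirt, smile",
                             "white_shirt", "gray_shirt"))
# prints: 1girl, gray_shirt, smile
```

Unlike the non-destructive tools described above, this sketch overwrites files in place, so it is safer to run it on a copy of the dataset. Matching whole tags instead of substrings keeps a tag such as off_white_shirt from being altered by accident.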
When you prepare a post for a Booru-style imageboard (like Danbooru, Gelbooru, or a private image dataset), the "caption" consists of a comma-separated list of tags rather than a traditional sentence. These tags describe the subject, style, and metadata to ensure the image is searchable and useful for AI training.

1. Essential Tag Categories
To prepare a high-quality post, include tags in this specific order:
Subject/Character: The name of the character(s) or the primary subject (e.g., hatsune_miku, 1girl, solo).
Physical Features: Hair color, eye color, and unique traits (e.g., blue_hair, twin_tails, green_eyes).
Clothing & Pose: Specific outfits and what the subject is doing (e.g., school_uniform, standing, looking_at_viewer).
Setting & Background: Where the image takes place (e.g., outdoors, blue_sky, classroom).
Technical/Meta Tags: Art medium, artist name, and quality (e.g., illustration, sketch, digital_media, artist_name, highres).

2. Tools for Automatic Tagging

If you have many images to prepare, manual tagging is slow. You can use these tools to generate "Booru-style" captions automatically:
WD14 Tagger: A common extension for Stable Diffusion that uses the same tagging system as Danbooru.
Booru Dataset Tag Manager: An interface that allows you to bulk edit and view tags alongside your images.
JoyCaption: A newer vision model that can generate both descriptive natural language and Booru-style tag lists.

3. Posting Best Practices
Consistency: Use underscores instead of spaces (e.g., long_hair not "long hair") to match standard Booru formatting.
Avoid Over-tagging: Only include what is actually visible. If you are preparing a dataset for training, adding tags for things that are always true (like "nose" on a face) can actually weaken the model's accuracy.
Verify Character Names: Check the specific Booru's "tag wiki" to ensure you are using the correct spelling or version of a character's name.
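The category order from section 1 and the practices from section 3 can be combined in one small helper. The sketch below is illustrative only: `CATEGORY_ORDER`, `build_caption`, and the `always_true` parameter are hypothetical names, and the choice of which tags count as "always true" is left to the person preparing the dataset.

```python
# Category keys mirror the order of the "Essential Tag Categories" list.
CATEGORY_ORDER = ["subject", "physical", "clothing_pose", "setting", "meta"]


def build_caption(tags_by_category: dict[str, list[str]],
                  always_true: frozenset[str] = frozenset()) -> str:
    """Emit tags in category order with underscores, no repeats, no pruned tags."""
    seen: set[str] = set()
    ordered: list[str] = []
    for category in CATEGORY_ORDER:
        for raw in tags_by_category.get(category, []):
            # Lowercase and use underscores instead of spaces.
            tag = "_".join(raw.strip().lower().split())
            # Skip blanks, duplicates, and tags flagged as always true.
            if not tag or tag in seen or tag in always_true:
                continue
            seen.add(tag)
            ordered.append(tag)
    return ", ".join(ordered)


caption = build_caption(
    {
        "setting": ["outdoors", "blue sky"],
        "subject": ["hatsune_miku", "1girl", "solo"],
        "physical": ["blue hair", "nose"],
        "clothing_pose": ["school uniform", "standing"],
        "meta": ["highres"],
    },
    always_true=frozenset({"nose"}),
)
print(caption)
# prints: hatsune_miku, 1girl, solo, blue_hair, school_uniform, standing, outdoors, blue_sky, highres
```

The input dictionary can list categories in any order; the output order is fixed by `CATEGORY_ORDER`. Character-name spelling is not checked here and still needs to be verified against the tag wiki of the target Booru.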
While the platform hosts a variety of content, certain categories dominate the front page consistently.
For writers, Caption Booru serves as an unconventional but effective workshop. The format forces creators to practice extreme economy of language. With only the space provided by an image (often 500–2000 characters), a writer must establish setting, character, conflict, and resolution. This constraint breeds creativity. Browsing the site’s top-rated content reveals masterclasses in pacing and implication—how to tell a chilling story using only a mundane photo of a suburban street and two paragraphs of first-person narration.