Scripts for generating synthetic datasets for deep learning. Includes random augmentation and supports dynamic overlays, automatic class mapping, and even distribution of target images across all backgrounds. A step-by-step demo is at the bottom.
This code expects:
data_generation/ #repo root/project root
├── backgrounds/ # Folder containing background images (.jpg/.png)
├── targets_original/ # Folder containing real target images (.jpg/.png)
├── targets_fake/ # Folder containing fake target images (.jpg/.png)
├── compositor/
├── generate.py
├── requirements.txt
...
backgrounds - Put all the background images you want target images overlaid on here.
targets_original - Put all the cropped target images you want to lay on backgrounds here.
targets_fake - Put all the fake target images here.
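The "even distribution of target images across all backgrounds" mentioned above can be pictured with a short sketch. This is not the repository's actual code: `assign_targets` and the file names are hypothetical, and the round-robin strategy is an assumption about one simple way to keep target usage balanced.

```python
import itertools
import random


def assign_targets(backgrounds, targets, seed=0):
    """Pair every background with one target so each target is used about equally often.

    Hypothetical helper for illustration; the real scripts may distribute differently.
    """
    rng = random.Random(seed)
    shuffled = list(targets)
    rng.shuffle(shuffled)
    # Cycling through the shuffled targets keeps usage counts within one of each other.
    return list(zip(backgrounds, itertools.cycle(shuffled)))


pairs = assign_targets(
    [f"bg_{i}.jpg" for i in range(5)],
    ["ball_a.png", "ball_b.png"],
)
for background, target in pairs:
    print(background, "<-", target)
```

With 5 backgrounds and 2 targets, one target is used three times and the other twice, which is as even as the counts allow.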
Run from the project root:
Without fake images (generates 10 images):
python generate.py \
--backgrounds_dir backgrounds \
--real_targets_dir targets_original \
--output_img_dir output_images \
--output_yolo_dir output_yolo \
--max_attempts 20 \
--num_backgrounds 10
To have multiple target images placed on each image, use generate_multiple.py instead:
python generate_multiple.py \
--backgrounds_dir backgrounds \
--real_targets_dir targets_original \
--output_img_dir output_images \
--output_yolo_dir output_yolo \
--max_attempts 20 \
--num_backgrounds 10
With fake images (adds a single fake image, randomly chosen from the set of fake images, alongside the real target images):
python generate.py \
--backgrounds_dir backgrounds \
--real_targets_dir targets_original \
--fake_targets_dir targets_fake \
--output_img_dir output_images \
--output_yolo_dir output_yolo \
--max_attempts 20 \
--num_backgrounds 10
torpedo_overlay.py run command (the output folders must already exist and must have been generated WITHOUT fake images):
python -m compositor.overlay.overlay_torpedo \
--images_dir output_images \
--yolo_dir output_yolo \
--output_dir torpedo_images \
--output_yolo_dir torpedo_yolo
To visualize labels: python visualize_labels.py
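When checking labels by hand, it can help to convert a label line back to pixel coordinates. The sketch below assumes the files in output_yolo use the standard YOLO text format (`class x_center y_center width height`, all normalized to the image size); `yolo_to_pixels` is a hypothetical helper, not part of visualize_labels.py.

```python
def yolo_to_pixels(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, left, top, right, bottom) in pixels.

    Assumes the standard normalized format: class x_center y_center width height.
    """
    class_id, xc, yc, w, h = line.split()
    # Scale the normalized values up to pixel units.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    left, top = xc - w / 2, yc - h / 2
    return int(class_id), round(left), round(top), round(left + w), round(top + h)


print(yolo_to_pixels("0 0.5 0.5 0.25 0.5", 640, 480))
# -> (0, 240, 120, 400, 360)
```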
Case: Train a YOLOv8 model to detect a beachball in a pool from a camera feed.
git clone https://github.com/berkeleyauv/data_generation.git
- Backgrounds: pool/underwater images matching your deployment environment
- Targets: 50–200 images of the beachball across varied orientations, lighting, and scales
- Fakes (optional): ~10–20 cropped images of beachball-like objects the model might confuse with the beachball
- Upload target images to a Roboflow project
- Manually label 10–20 images; auto-label the rest and review
- Export a version using Roboflow's crop tool
- Replace your targets_original/ folder with the downloaded dataset
Repeat for fake targets if needed.
Ensure your directories are structured as:
targets_original/ # cropped target images
targets_fake/ # cropped fake images (optional)
backgrounds/ # background images
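Before a long run, a quick check of the three folders can catch a missing or empty directory. This is a minimal sketch, not a script shipped with the repo: `count_images` is a hypothetical helper, and it only counts the .jpg/.png files named in the layout above.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}


def count_images(root, folders=("backgrounds", "targets_original", "targets_fake")):
    """Return {folder: number of image files}, with None for a missing folder."""
    counts = {}
    for name in folders:
        folder = Path(root) / name
        if not folder.is_dir():
            counts[name] = None  # folder missing (fine for the optional targets_fake)
            continue
        counts[name] = sum(
            1 for p in folder.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


if __name__ == "__main__":
    # Run from the project root.
    for name, n in count_images(".").items():
        print(f"{name}: {'missing' if n is None else n}")
```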
Then run:
python generate.py \
--backgrounds_dir backgrounds \
--real_targets_dir targets_original \
--output_img_dir output_images \
--output_yolo_dir output_yolo \
--max_attempts 20 \
--num_backgrounds 15000
- Swap generate.py → generate_multiple.py to place ~10 beachballs per image
- Adjust --num_backgrounds to change output count
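Placing several targets on one background usually needs some retry logic so they do not overlap. The sketch below is an assumption about how a limit like `--max_attempts` might be used, not the behavior of generate_multiple.py itself; `place_targets` and `boxes_overlap` are hypothetical names.

```python
import random


def boxes_overlap(a, b):
    """True if two (left, top, right, bottom) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def place_targets(bg_size, target_sizes, max_attempts=20, seed=0):
    """Place each target at a random non-overlapping spot.

    A target is skipped after max_attempts failed tries (assumed semantics).
    """
    rng = random.Random(seed)
    bg_w, bg_h = bg_size
    placed = []
    for w, h in target_sizes:
        if w > bg_w or h > bg_h:
            continue  # target cannot fit on this background at all
        for _ in range(max_attempts):
            left = rng.randint(0, bg_w - w)
            top = rng.randint(0, bg_h - h)
            box = (left, top, left + w, top + h)
            if not any(boxes_overlap(box, other) for other in placed):
                placed.append(box)
                break
    return placed


boxes = place_targets((640, 480), [(80, 80)] * 10)
print(len(boxes), "of 10 targets placed")
```

Raising the attempt limit makes it more likely that every target finds a free spot on crowded backgrounds, at the cost of more random tries per image.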
python3 visualize_labels.py