disler

Generate More Training Data

Data community intermediate

Description

This command analyzes patterns in existing data files (CSV, JSONL) and generates additional synthetic training data based on those patterns. It uses Bash commands or inline uv Python to append data efficiently without loading large files into memory.

Installation

Terminal
claude install-skill https://github.com/disler/agentic-drop-zones

README


name: Generate More Training Data
allowed-tools: Bash, Read, Write
description: Analyze data patterns and generate additional synthetic training data

Generate More Training Data

This command analyzes patterns in existing data files (CSV, JSONL) and generates additional synthetic training data based on those patterns. It uses Bash commands or inline uv Python to append data efficiently without loading large files into memory.
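As a rough illustration of appending without loading the file, the sketch below opens the target in append mode and inspects only its final byte to decide whether a newline is needed first. The helper names (`append_jsonl`, `append_csv`, `_missing_trailing_newline`) are hypothetical and not part of the command itself; this is one possible approach under the assumption that new rows are already available as Python dictionaries.

```python
import csv
import json
import os


def _missing_trailing_newline(path):
    """Return True if a non-empty file does not end with a newline.

    Only the last byte is read, so the file size does not matter.
    """
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)
        return f.read(1) != b"\n"


def append_jsonl(path, rows):
    """Append dict rows to a JSONL file; returns the number of rows written."""
    fix = _missing_trailing_newline(path)
    count = 0
    with open(path, "a", encoding="utf-8", newline="") as f:
        if fix:
            f.write("\n")  # keep the last existing record on its own line
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count


def append_csv(path, rows, fieldnames):
    """Append dict rows to a CSV file that already has a header row."""
    fix = _missing_trailing_newline(path)
    count = 0
    with open(path, "a", encoding="utf-8", newline="") as f:
        if fix:
            f.write("\n")
        # Assumption: the existing file uses "\n" line endings.
        writer = csv.DictWriter(f, fieldnames=fieldnames, lineterminator="\n")
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

Because the existing content is never read back, memory use depends only on the rows being added, not on the size of the file.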

Instructions


Variables

DROPPED_FILE_PATH: [[FILE_PATH]]
DROPPED_FILE_PATH_ARCHIVE: agentic_drop_zone/training_data_zone/drop_zone_file_archive/
DATA_OUTPUT_DIR: agentic_drop_zone/training_data_zone/data_output//
  - This is the directory where all generated data will be saved
  - The date_time is the current date and time in the format YYYY-MM-DD_HH-MM-SS
NUM_NEW_ROWS: 25
  - Default number of new data rows to generate
  - Can be overridden if specified in the dropped file
SAMPLE_SIZE: 50
  - Number of rows to sample for pattern analysis (keeps context window small)
  - Use Read with only a specific number of rows to keep the context window small
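A minimal sketch of how two of these variables might be applied in inline Python: reading only the first SAMPLE_SIZE lines of a dropped file, and formatting a timestamp as YYYY-MM-DD_HH-MM-SS. The function names are hypothetical, and placing the timestamp as a subdirectory under the output root is an assumption made for illustration, not something the variable definitions state.

```python
import itertools
from datetime import datetime
from pathlib import Path

SAMPLE_SIZE = 50
# Output root taken from DATA_OUTPUT_DIR, without the date_time part.
DATA_OUTPUT_ROOT = Path("agentic_drop_zone/training_data_zone/data_output")


def sample_head(path, n=SAMPLE_SIZE):
    """Return the first n lines of a text file without reading the rest.

    For CSV input the header row counts as one of the n lines.
    """
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in itertools.islice(f, n)]


def timestamped_output_dir(root=DATA_OUTPUT_ROOT, now=None):
    """Build a path ending in a YYYY-MM-DD_HH-MM-SS timestamp.

    Assumption: the timestamp is a subdirectory of the output root.
    The directory is not created here.
    """
    now = now or datetime.now()
    return Path(root) / now.strftime("%Y-%m-%d_%H-%M-%S")
```

`itertools.islice` stops after `n` lines, so the sample costs the same regardless of how large the dropped file is.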

Workflow


Pattern Analysis Phase
