# Google Magenta Documentation

**Source:** https://magenta.tensorflow.org/

Google Magenta is a research project exploring the role of machine learning in the process of creating art and music. It develops deep learning and reinforcement learning algorithms for generating songs, images, drawings, and other creative materials, while also building smart tools and interfaces that allow artists and musicians to extend their processes using these models.

## Overview

Magenta is built on TensorFlow and provides:

- **Music generation models** (MusicVAE, MelodyRNN, DrumsRNN, Performance RNN, etc.)
- **Audio synthesis models** (DDSP, NSynth, GANSynth)
- **Image generation and style transfer** (Arbitrary Image Stylization, Sketch-RNN)
- **JavaScript library** (Magenta.js) for browser-based ML
- **Real-time music models** (Lyria RealTime, Magenta RT)
- **Professional tools** (Ableton Live plugins, VST plugins, DAW integrations)

## Key Features

### Music Models

- **MusicVAE** - Variational autoencoder for music generation and morphing
- **MelodyRNN** - RNN-based melody generation
- **DrumsRNN** - Drum pattern generation
- **Performance RNN** - Piano performance modeling
- **Music Transformer** - Transformer-based long-form music generation
- **Polyphony RNN** - Polyphonic music generation
- **GANSynth** - GAN-based audio synthesis
- **DDSP** - Differentiable digital signal processing

### Creative Tools

- **Magenta Studio** - Ableton Live plugin suite
- **DDSP-VST** - VST plugin for neural audio synthesis
- **Magenta.js** - JavaScript API for browser-based generation
- **Interactive Demos** - Web-based interfaces for music creation

### Real-Time Generation

- **Lyria RealTime** - Live music generation via API
- **Magenta RT** - Open-weights live music generation model
- **DAW Integration** - Real-time plugins for music production

## Getting Started

### Python Installation

Install Magenta using pip:

```bash
pip install magenta
```

Or with Anaconda:

```bash
curl https://raw.githubusercontent.com/tensorflow/magenta/main/magenta/tools/magenta-install.sh > /tmp/magenta-install.sh
bash /tmp/magenta-install.sh
```

### System Dependencies (Linux)

On Ubuntu:

```bash
sudo apt-get install build-essential libasound2-dev libjack-dev portaudio19-dev
```

On Fedora:

```bash
sudo dnf group install "C Development Tools and Libraries"
sudo dnf install alsa-lib-devel jack-audio-connection-kit-devel portaudio-devel
```

### JavaScript Installation

For browser-based music generation:

```bash
npm install @magenta/music
```
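Most Magenta models read and write `NoteSequence` protocol buffers through the `note_seq` library, which the `magenta` pip package installs as a dependency. A minimal sketch of building a sequence by hand and exporting it to MIDI; the pitches and timings are arbitrary example values:

```python
import note_seq
from note_seq.protobuf import music_pb2

# Build a two-note sequence by hand (times are in seconds, pitches are MIDI numbers)
seq = music_pb2.NoteSequence()
seq.tempos.add(qpm=120)
seq.notes.add(pitch=60, start_time=0.0, end_time=0.5, velocity=80)  # C4
seq.notes.add(pitch=64, start_time=0.5, end_time=1.0, velocity=80)  # E4
seq.total_time = 1.0

# Write it out as a standard MIDI file
note_seq.sequence_proto_to_midi_file(seq, '/tmp/two_notes.mid')
```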
## Core Models and Tools

### Music VAE (Variational Autoencoder)

MusicVAE provides a neural network approach to learning the space of musical sequences. It can:

- Generate new melodies and drum patterns
- Create smooth interpolations between existing melodies
- Support multi-instrument composition

**Features:**

- Supports melody, drum, and multi-track generation
- Pre-trained checkpoints available
- Simple API for generation and morphing

### Melody RNN

MelodyRNN is an RNN-based model for generating single-voice melodies. It learns to generate monophonic sequences that sound like natural music.

**Configuration Options:**

- Basic RNN
- Attention RNN (with attention mechanism)
- Lookback RNN (feeds events from one and two bars back into the model so it can repeat phrases more easily)

### Performance RNN

Models piano performance data including timing, velocity, and sustain pedal information for more expressive generation.

### DDSP (Differentiable Digital Signal Processing)

Neural synthesis using differentiable DSP components:

- Additive harmonic synthesis plus filtered noise with learned controls
- Real-time audio generation
- Enables creative audio synthesis and manipulation

### Music Transformer

Transformer-based model for generating longer, more coherent musical sequences with better structure than previous RNN approaches.

### NSynth (Neural Synth)

WaveNet-based audio synthesis that learns timbral characteristics from instrument samples.

### GANSynth

GAN-based approach to audio synthesis with better phase coherence and audio quality.

## Magenta.js API

Magenta.js provides a JavaScript library for using Magenta models in the browser with TensorFlow.js.

### Basic Usage

```javascript
// Load a model
const checkpointURL = 'https://storage.googleapis.com/magenta-js/checkpoints/music_vae/...';
const musicVAE = new mm.MusicVAE(checkpointURL);
await musicVAE.initialize();

// Generate notes
const notes = await musicVAE.sample(1);

// Convert to MIDI
const midi = mm.sequenceProtoToMidi(notes[0]);
```

### Available Models in Magenta.js

- **@magenta/music** - Core music models
- **@magenta/image** - Image generation models (Sketch-RNN, etc.)
- Includes implementations of:
  - MusicVAE
  - MelodyRNN
  - DrumsRNN
  - ImprovRNN
  - Sketch-RNN

## Datasets

Magenta provides several open datasets for training and research:

### Bach Doodle Dataset

- 21.6 million harmonizations collected from the Bach Doodle web experiment
- Includes the user-entered melodies and the generated harmonizations
- Metadata such as user ratings and region

### Groove MIDI Dataset (GMD)

- 13.6 hours of human drum performances with aligned MIDI and audio
- Around 1,150 MIDI files recorded by drummers on an electronic kit
- Velocity annotations for each note
- Loadable via TensorFlow Datasets (see the sketch after these descriptions)

### Expanded Groove MIDI Dataset (E-GMD)

- Enhanced version of GMD with more diverse performances
- ~444 hours of audio covering 43 drum kits
- Velocity information for all notes
- One of the largest public datasets of human drum performances
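The Groove MIDI Dataset is published through TensorFlow Datasets under the name `groove`. A minimal loading sketch, assuming the `full-midionly` config and the `midi` feature key shown in the TFDS catalog entry (verify both against the catalog for your TFDS version):

```python
import tensorflow_datasets as tfds

# Download and load the MIDI-only variant of the Groove MIDI Dataset
dataset = tfds.load('groove/full-midionly', split='train', try_gcs=True)

for example in dataset.take(1):
    print(list(example.keys()))            # MIDI bytes plus tempo/style metadata
    midi_bytes = example['midi'].numpy()   # raw contents of one MIDI file
    print(len(midi_bytes), 'bytes of MIDI')
```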
### Quick, Draw! Dataset

- Over 50 million creative drawings across 345 categories, collected from the Quick, Draw! game
- Used for training image generation models such as Sketch-RNN

### NSynth Dataset

- 305,979 annotated musical notes from 1,006 instruments sourced from commercial sample libraries
- Monophonic 16kHz audio snippets, ~4 seconds per note
- Multiple pitches and velocities per instrument
- Pitch, timbre, and envelope annotations

## Interactive Demos

### Web Demos

- **MidiMe** - Train models on your playing style in the browser
- **Neural Drum Machine** - Drum pattern generation with UI
- **Beat Blender** - Blend drum patterns
- **Piano Transformer** - Long-form piano composition
- **Performance RNN** - Piano performance generation

### Professional Tools

- **Magenta Studio** - Ableton Live plugins
- **DDSP-VST** - Standalone VST for audio synthesis
- **Space DJ** - Interactive music exploration

### Hardware Integrations

- Real-time music generation on specialized hardware
- Raspberry Pi deployments

## Real-Time Music Generation

### Lyria RealTime

- State-of-the-art live music generation via API
- Integrated into DAW plugins
- Music FX DJ integration
- Available through Google AI Studio

### Magenta RT

- Open-weights alternative to Lyria RealTime
- 800M parameter autoregressive transformer
- Trained on ~190k hours of stock music
- Runs on consumer hardware (free-tier Colab TPUs)
- Available on GitHub and Hugging Face

## Training and Fine-tuning

### Training New Models

Each model has associated training scripts:

```bash
# Melody RNN training
melody_rnn_create_dataset --config=... --input_dir=... --output_file=...
melody_rnn_train --config=... --run_dir=... --sequence_example_file=...
melody_rnn_generate --config=... --checkpoint=... --output_dir=...
```

### Available Console Scripts

The Magenta pip package installs these command-line tools (a quick way to check that they are on your PATH follows this list):

- `melody_rnn_generate` / `melody_rnn_train` / `melody_rnn_create_dataset`
- `drums_rnn_generate` / `drums_rnn_train` / `drums_rnn_create_dataset`
- `improv_rnn_generate` / `improv_rnn_train` / `improv_rnn_create_dataset`
- `performance_rnn_generate` / `performance_rnn_train` / `performance_rnn_create_dataset`
- `polyphony_rnn_generate` / `polyphony_rnn_train` / `polyphony_rnn_create_dataset`
- `music_vae_generate` / `music_vae_train`
- `nsynth_generate` / `nsynth_save_embeddings`
- `gansynth_generate` / `gansynth_train`
- `image_stylization_transform` / `image_stylization_train`
- `arbitrary_image_stylization_evaluate` / `arbitrary_image_stylization_train`
- `sketch_rnn_train`
- `rl_tuner_train`
- `onsets_frames_transcription_*` (various transcription tools)
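To confirm the package installed correctly and that the console scripts above are available, a quick check from the shell (any of the listed tools accepts `--help`):

```bash
# Show the installed Magenta version and its dependencies
pip show magenta

# Print the flags of one of the bundled generators
melody_rnn_generate --help
```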
## Architecture Details

### Magenta Repository Structure

The main Python library includes:

- **Models** (`magenta/models/`) - All ML model implementations
- **Libraries** (`magenta/lib/`) - Utility functions and common code
- **Interfaces** (`magenta/interfaces/`) - MIDI and music interfaces
- **Scripts** (`magenta/scripts/`) - Data processing utilities

### Key Dependencies

- TensorFlow 2.9.1+
- Python 3.5+
- NumPy, SciPy, scikit-image
- librosa (audio processing)
- pretty_midi (MIDI manipulation)
- mido (MIDI I/O)
- Sonnet (neural network library)

## Project Status

**Note:** The main magenta GitHub repository is currently inactive for new development and serves as a supplement to papers. New projects are maintained in individual repositories within the Magenta organization:

- [Magenta.js](https://github.com/magenta/magenta-js) - JavaScript library
- [Magenta RT](https://github.com/magenta/magenta-realtime) - Real-time music model
- Individual repositories for specific models and applications

For current work and latest developments, visit [the Magenta website](https://magenta.tensorflow.org/) and the [Magenta GitHub Organization](https://github.com/magenta).

## Community and Resources

### Official Channels

- **Website:** https://magenta.tensorflow.org/
- **GitHub Organization:** https://github.com/magenta
- **Discussion Forum:** Google Groups (magenta-discuss)
- **Blog:** https://magenta.tensorflow.org/blog

### Tutorials and Notebooks

- **Colab Notebooks:** Available for all major models
- **Interactive Demos:** https://magenta.tensorflow.org/demos
- **Video Tutorials:** Community contributions and examples

### Datasets

- **Datasets Hub:** https://magenta.tensorflow.org/datasets
- Open data for training and research
- Community contributed datasets

## Applications

### Music Production

- Ableton Live integration via Magenta Studio plugins
- VST plugins for DAWs
- Real-time generation for live performance

### Creative Tools

- Interactive music exploration interfaces
- AI-assisted composition tools
- Style transfer and interpolation

### Research

- Academic papers on music and audio generation
- Benchmarks for audio synthesis
- Novel deep learning architectures for creative AI

### Education

- Learning resources about ML for music
- Examples and tutorials for developers
- Open-source implementations for research

## Licensing

Magenta is released under the Apache License 2.0. Models and datasets have their own licenses, typically permissive open-source licenses.

## Getting Help

- Check the [official documentation](https://magenta.tensorflow.org/)
- Review [Colab notebook tutorials](https://magenta.tensorflow.org/demos/colab/)
- Visit the [discussion forum](https://groups.google.com/a/tensorflow.org/forum/#!forum/magenta-discuss)
- Browse [GitHub issues and discussions](https://github.com/magenta/)
- Read [blog posts](https://magenta.tensorflow.org/blog) for technical details

---

# Magenta.js Guide

**Source:** https://magenta.tensorflow.org/js-announce

Magenta.js is a JavaScript library suite for generating music and art with Magenta models. Built on TensorFlow.js, it runs directly in the browser with WebGL acceleration.
## Overview

Magenta.js provides a simple API for:

- Music generation with pre-trained models
- Image and drawing generation
- Real-time synthesis
- Interactive music creation
- Browser-based creative applications

## Installation

### Via npm

```bash
npm install @magenta/music @magenta/image
```

### Via CDN (Browser)

```html
<!-- Example CDN build; pin the version you need -->
<script src="https://cdn.jsdelivr.net/npm/@magenta/music@^1.0.0"></script>
```

## Core Modules

### @magenta/music

The main music generation library including:

- MusicVAE
- MelodyRNN
- DrumsRNN
- ImprovRNN
- PerformanceRNN
- MusicRNN (base class)
- Utility functions

### @magenta/image

For generative images:

- Sketch-RNN
- PGAN (Progressive GAN)
- Other image models

## Basic Concepts

### Model Loading

Models are loaded from checkpoints (weights and configuration files):

```javascript
const checkpoint = 'https://storage.googleapis.com/magenta-js/checkpoints/music_vae/...';
const model = new mm.MusicVAE(checkpoint);
await model.initialize();
```

### Note Sequences

Music is represented as note sequences - arrays of note events. Unquantized sequences use `startTime`/`endTime` in seconds; quantized sequences use step indices plus quantization info:

```javascript
{
  notes: [
    {
      pitch: 60,              // MIDI note number
      quantizedStartStep: 0,  // step index (quantized sequences)
      quantizedEndStep: 4,
      velocity: 100           // 0-127, optional
    },
    // ... more notes
  ],
  totalQuantizedSteps: 16,
  quantizationInfo: { stepsPerQuarter: 4 }
}
```

### MIDI Conversion

Convert between sequences and MIDI:

```javascript
// Sequence to MIDI
const midiData = mm.sequenceProtoToMidi(noteSequence);

// Download MIDI
const link = document.createElement('a');
link.href = URL.createObjectURL(new Blob([midiData], {type: 'audio/midi'}));
link.download = 'generated.mid';
link.click();

// Load MIDI
const file = document.getElementById('midi-input').files[0];
const arrayBuffer = await file.arrayBuffer();
const sequence = mm.midiToSequenceProto(arrayBuffer);
```

## MusicVAE

Variational Autoencoder for music generation.

### Basic Generation

```javascript
const checkpoint = 'https://storage.googleapis.com/magenta-js/checkpoints/music_vae/mel_2bar_big';
const model = new mm.MusicVAE(checkpoint);
await model.initialize();

// Generate single sequence
const samples = await model.sample(1);
const sequence = samples[0];
```

### Interpolation

```javascript
// Load and quantize two melodies (interpolation inputs must be quantized)
const melody1 = mm.sequences.quantizeNoteSequence(mm.midiToSequenceProto(midi1), 4);
const melody2 = mm.sequences.quantizeNoteSequence(mm.midiToSequenceProto(midi2), 4);

// Interpolate between them in latent space
const numSteps = 10;
const interpolated = await model.interpolate([melody1, melody2], numSteps);

// For manual control over the latent vectors, encode()/decode()
// expose them as tf.Tensors instead of plain arrays.
```

### Available Checkpoints

Music VAE has multiple checkpoints for different:

- **Duration** - 2-bar, 4-bar, 16-bar
- **Content** - Melodies, drums, multitrack
- **Style** - Genre-specific models
- **Size** - Smaller/larger variants for speed/quality tradeoff

```javascript
// Melody VAE (2-bar)
'https://storage.googleapis.com/magenta-js/checkpoints/music_vae/mel_2bar_big'

// Drums VAE
'https://storage.googleapis.com/magenta-js/checkpoints/music_vae/drums_2bar_small'

// Multitrack VAE
'https://storage.googleapis.com/magenta-js/checkpoints/music_vae/multitrack'
```
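To audition generated sequences directly in the browser, `@magenta/music` also ships a `Player` built on Tone.js (plus a `SoundFontPlayer` for sampled instrument sounds). A minimal sketch; browsers require a user gesture before audio can start, hence the click handler:

```javascript
const player = new mm.Player();

document.getElementById('play').addEventListener('click', async () => {
  const samples = await model.sample(1);  // `model` is the initialized mm.MusicVAE above
  if (player.isPlaying()) {
    player.stop();
  }
  player.start(samples[0]);               // resolves when playback finishes
});
```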
## MelodyRNN

Recurrent neural network for melody generation.

### Configuration

In Magenta.js the RNN models are exposed through the `MusicRNN` class; the configuration (basic, lookback, or attention) is determined by the checkpoint you load:

```javascript
// Load a melody checkpoint (other melody checkpoints are listed in the
// magenta-js hosted checkpoint index)
const melodyRnn = new mm.MusicRNN(
  'https://storage.googleapis.com/magenta-js/checkpoints/music_rnn/basic_rnn'
);
await melodyRnn.initialize();

// Generate a melody conditioned on a primer (primers must be quantized)
const primer = mm.sequences.quantizeNoteSequence(
  mm.midiToSequenceProto(primerMidi), 4
);
const generated = await melodyRnn.continueSequence(
  primer,
  32,   // steps to generate
  1.2   // temperature
);
```

### Temperature Control

```javascript
// Conservative generation (higher note probability)
const conservative = await melodyRnn.continueSequence(
  primer,
  32,
  0.5  // low temperature
);

// Creative generation (more random)
const creative = await melodyRnn.continueSequence(
  primer,
  32,
  1.5  // high temperature
);
```

## DrumsRNN

Drum pattern generation.

### Basic Usage

```javascript
// Drum patterns use the same MusicRNN class with a drum checkpoint
const drumsRnn = new mm.MusicRNN(
  'https://storage.googleapis.com/magenta-js/checkpoints/music_rnn/drum_kit_rnn'
);
await drumsRnn.initialize();

// A one-step primer with a single kick drum (MIDI 36); primers must be quantized
const primerDrums = {
  notes: [{ pitch: 36, quantizedStartStep: 0, quantizedEndStep: 1, isDrum: true }],
  tempos: [{ qpm: 120 }],
  totalQuantizedSteps: 1,
  quantizationInfo: { stepsPerQuarter: 4 }
};

// Generate drums
const drums = await drumsRnn.continueSequence(primerDrums, 16, 1.0); // 16 steps (1 bar)
```

### Priming from a Backing Track

The RNN models generate continuations of a primer rather than conditioning on a full backing track directly; a common pattern is to quantize the drum part of the backing MIDI and continue it:

```javascript
const backingTrack = mm.sequences.quantizeNoteSequence(
  mm.midiToSequenceProto(backingMidi), 4
);

// Generate drums that continue the backing track's drum part
const drums = await drumsRnn.continueSequence(backingTrack, 16, 1.0);
```

## Sketch-RNN

For drawing and image generation.

### Loading Model

```javascript
const checkpoint = 'https://storage.googleapis.com/magenta-js/checkpoints/sketch_rnn/cat';
const model = new mm.SketchRNN(checkpoint);
await model.initialize();
```

### Generating Sketches

```javascript
// Generate random sketch
const sketch = await model.sample();

// Sketch is array of points:
// [{x, y, penDown}, ...]
```

## Audio Playback

### Using Tone.js

```html
<!-- Example CDN URL for Tone.js -->
<script src="https://unpkg.com/tone"></script>
```

```javascript
const synth = new Tone.Synth({
  oscillator: { type: 'square' },
  envelope: { attack: 0.005, decay: 0.1, sustain: 0.3, release: 0.1 }
}).toDestination();

// Audio must be unlocked by a user gesture before scheduling notes
await Tone.start();

// Play sequence (note times are in seconds)
const sequence = samples[0];
const now = Tone.now();
sequence.notes.forEach(note => {
  const noteStr = Tone.Frequency(note.pitch, 'midi').toNote();
  const duration = note.endTime - note.startTime;
  synth.triggerAttackRelease(noteStr, duration, now + note.startTime);
});
```

### MIDI Output

Use Web MIDI API to send to hardware:

```javascript
// Request MIDI access
const midiAccess = await navigator.requestMIDIAccess();
const outputs = Array.from(midiAccess.outputs.values());

// Send notes
const output = outputs[0];
const noteOnMsg = [0x90, 60, 100]; // Note on, pitch 60, velocity 100
output.send(noteOnMsg, performance.now());
```
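The snippet above only sends a note-on; a hardware synth will hold the note until a matching note-off arrives. A small follow-up sketch that schedules the note-off half a second later on the same `output`:

```javascript
// Note off for the same pitch (0x80 = note off, channel 1); velocity 64 is a neutral release
const noteOffMsg = [0x80, 60, 64];
output.send(noteOffMsg, performance.now() + 500);  // 500 ms after "now"
```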
## Building Interactive Applications

### Complete Music Creation App

```javascript
const mm = window.mm;
let lastSequence = null;

async function initializeApp() {
  // Load models
  const vae = new mm.MusicVAE(
    'https://storage.googleapis.com/magenta-js/checkpoints/music_vae/mel_2bar_big'
  );
  await vae.initialize();

  const drums = new mm.MusicVAE(
    'https://storage.googleapis.com/magenta-js/checkpoints/music_vae/drums_2bar_small'
  );
  await drums.initialize();

  // UI Elements
  document.getElementById('generate').addEventListener('click', async () => {
    // Generate melody
    const melodies = await vae.sample(1);
    const melody = melodies[0];

    // Generate drums
    const drumPatterns = await drums.sample(1);
    const drumPattern = drumPatterns[0];

    // Append the drum pattern after the melody (concatenate joins sequences in time)
    const combined = mm.sequences.concatenate([melody, drumPattern]);
    lastSequence = combined;

    // Convert to MIDI and play
    const midiData = mm.sequenceProtoToMidi(combined);
    playSoundFromMidi(midiData);
  });

  document.getElementById('download').addEventListener('click', () => {
    const midiData = mm.sequenceProtoToMidi(lastSequence);
    downloadMidi(midiData, 'generated.mid');
  });
}

function playSoundFromMidi(midiData) {
  // Use Web Audio API or Tone.js
  // ...
}

function downloadMidi(data, filename) {
  const blob = new Blob([data], {type: 'audio/midi'});
  const link = document.createElement('a');
  link.href = URL.createObjectURL(blob);
  link.download = filename;
  link.click();
}

// Start when page loads
document.addEventListener('DOMContentLoaded', initializeApp);
```

## Utility Functions

### Sequence Manipulation

```javascript
// Combine sequences end-to-end
const combined = mm.sequences.concatenate([seq1, seq2]);

// Shift pitch
const shifted = mm.sequences.shiftSequence(sequence, 5); // Up 5 semitones

// Quantize to 4 steps per quarter note
const quantized = mm.sequences.quantizeNoteSequence(sequence, 4);

// Trim to a step range
const trimmed = mm.sequences.trim(sequence, 0, 16);

// Clone
const clone = mm.sequences.clone(sequence);
```

### Validation

```javascript
// Check whether the sequence is quantized with absolute step times
if (!mm.sequences.isAbsoluteQuantizedSequence(sequence)) {
  console.log('Sequence is not absolute-quantized');
}

// Get sequence length
const length = mm.sequences.getSequenceLength(sequence);
```

## Performance Optimization

### Model Caching

```javascript
// Cache models for reuse
const modelCache = {};

async function getModel(checkpoint) {
  if (!modelCache[checkpoint]) {
    const model = new mm.MusicVAE(checkpoint);
    await model.initialize();
    modelCache[checkpoint] = model;
  }
  return modelCache[checkpoint];
}
```

### Batch Generation

```javascript
// Generate multiple samples efficiently
const vae = await getModel(checkpoint);
const samples = await vae.sample(10); // Generate 10 at once
```

### Worker Threads

```javascript
// Use Web Workers for heavy computation
const worker = new Worker('magenta-worker.js');

worker.postMessage({
  command: 'generate',
  checkpoint: checkpointUrl,
  count: 5
});

worker.onmessage = (event) => {
  const sequences = event.data;
  // Process results
};
```

## Troubleshooting

### Model Loading Issues

```javascript
// Check if model loaded successfully
try {
  await model.initialize();
  console.log('Model loaded');
} catch (e) {
  console.error('Model loading failed:', e);
}
```

### Browser Compatibility

- Requires WebGL for GPU acceleration
- Falls back to CPU if WebGL unavailable
- Tested on Chrome, Firefox, Safari, Edge
- Mobile browsers supported (iOS Safari, Chrome Android)

### Memory Management

```javascript
// Dispose of models when done
model.dispose();

// Clear GPU memory
tf.disposeVariables();
```

## Examples and Demos

Official examples available at:

- **GitHub Repository** - https://github.com/magenta/magenta-js
- **Demo Gallery** - https://magenta.tensorflow.org/demos/web/
- **Interactive Examples** - Browser-based applications

## API Reference

### MusicVAE Methods

- `initialize()` - Load model weights
- `sample(numSamples)` - Generate random sequences
- `encode(sequences)` - Encode to latent space
- `decode(z)` - Decode from latent vectors
- `interpolate(sequences, steps)` - Interpolate between sequences

### MelodyRNN Methods

- `continueSequence(primer, length, temperature)` - Generate continuation of a primer

### Utility Namespaces

- `mm.sequences` - Sequence manipulation
- `mm.performance` - Performance modeling
- `mm.chords` - Chord recognition and generation
- `mm.drums` - Drum-specific utilities

## Additional Resources

- **Official Documentation** - https://github.com/magenta/magenta-js
- **API Docs** - https://magenta.github.io/magenta-js/
- **GitHub Issues** - Report bugs and request features
- **Discussion Forum** - Community support and examples
---

# Magenta Music Models

**Source:** https://magenta.tensorflow.org/

Complete guide to music generation models in Google Magenta.

## MusicVAE

**Variational Autoencoder for Music Generation**

MusicVAE is a neural network model that learns a continuous, learnable representation of musical sequences. It enables both generation of new music and smooth interpolation between melodies.

### Architecture

- Encoder-decoder variational autoencoder (VAE)
- Learns a latent space of musical sequences
- Trained on MIDI data
- Supports various music representations

### Capabilities

- **Generation** - Create new melodies from random latent vectors
- **Interpolation** - Smooth morphing between two melodies
- **Reconstruction** - Reconstruct input sequences
- **Multi-track** - Generate multiple instrument parts simultaneously

### Models Available

- **Melody** - Single melody line generation
- **Drums** - Drum pattern generation
- **Multitrack** - Multiple simultaneous instruments
- **Bass + Drums** - Drum and bass accompaniment

### Example Usage (Python)

```python
import note_seq
from magenta.models.music_vae import configs
from magenta.models.music_vae.trained_model import TrainedModel

# Load a pre-trained 2-bar melody checkpoint
config = configs.CONFIG_MAP['cat-mel_2bar_big']
model = TrainedModel(config, batch_size=4,
                     checkpoint_dir_or_path='/path/to/mel_2bar_big.ckpt')

# Generate from random latent vectors
samples = model.sample(n=2, length=32, temperature=1.0)

# Create a smooth interpolation between two melodies
seq1 = note_seq.midi_file_to_note_sequence('melody1.mid')
seq2 = note_seq.midi_file_to_note_sequence('melody2.mid')
interpolated = model.interpolate(seq1, seq2, num_steps=10, length=32)
```

### Checkpoints

Pre-trained checkpoints available at: `goo.gl/magenta/musicvae-colab`

Includes configurations for:

- 2-bar melodies
- 16-bar melodies
- Drum patterns
- Multitrack music

## MelodyRNN

**Recurrent Neural Network for Melody Generation**

MelodyRNN uses LSTM recurrent neural networks to generate melodies conditioned on musical history.

### Configurations

#### Basic RNN

- Simple LSTM without attention
- Generates melodies based on previous notes
- Fastest inference

#### Attention RNN

- LSTM with attention mechanism
- Attends to recent sequence history
- Better long-range dependencies
- Better melodic coherence

#### Lookback RNN

- Feeds events from one and two bars earlier back into the model
- Makes it easier to repeat and vary recent phrases
- Balance between Basic and Attention RNN

### Temperature Parameter

Controls randomness in generation:

- **Low (0.5-0.7)** - More conservative, safer choices
- **Medium (1.0)** - Natural randomness
- **High (1.5+)** - More creative, chaotic output

### Training

```bash
# Create dataset from MIDI files
melody_rnn_create_dataset \
  --config=attention_rnn \
  --input_dir=/path/to/midi/files \
  --output_file=output.tfrecord \
  --eval_split=0.1

# Train model
melody_rnn_train \
  --config=attention_rnn \
  --run_dir=/path/to/run \
  --sequence_example_file=output.tfrecord \
  --hparams="batch_size=32"

# Generate melodies
melody_rnn_generate \
  --config=attention_rnn \
  --checkpoint=/path/to/checkpoint \
  --output_dir=/tmp/melody_rnn_output \
  --temperature=1.0 \
  --num_outputs=10
```
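Training from scratch isn't required just to generate; the same command-line tool also accepts a pre-trained `.mag` bundle file. A minimal sketch, assuming you have downloaded the `attention_rnn.mag` bundle (the flag names mirror the training example above; confirm with `melody_rnn_generate --help` for your installed version):

```bash
melody_rnn_generate \
  --config=attention_rnn \
  --bundle_file=/path/to/attention_rnn.mag \
  --output_dir=/tmp/melody_rnn/generated \
  --num_outputs=10 \
  --num_steps=128 \
  --primer_melody="[60]"   # start from middle C
```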
## DrumsRNN

**Recurrent Neural Network for Drum Pattern Generation**

Generates drum patterns, optionally continuing a primer drum track. Models polyphonic drum interactions.

### Features

- Multi-instrument drum support
- Learns drum pattern timing and dynamics
- Can condition on tempo and style
- Pre-trained models for various genres

### Generating Patterns

Generation is exposed through the `drums_rnn_generate` command-line tool, which takes a pre-trained bundle and an optional drum primer:

```bash
drums_rnn_generate \
  --config=drum_kit \
  --bundle_file=/path/to/drum_kit_rnn.mag \
  --output_dir=/tmp/drums_rnn/generated \
  --num_outputs=10 \
  --num_steps=128 \
  --primer_drums="[(36,)]"   # prime with a single bass drum hit
```

## Performance RNN

**Piano Performance Modeling**

Models piano performance characteristics including:

- Note timing and dynamics (when notes are struck and how hard)
- Sustain pedal usage
- Expressive variations

### Capabilities

- Generate expressive piano performances
- Model rubato and dynamics
- Learn performance style from examples
- Condition generation on melodies
## Music Transformer

**Transformer-Based Long-Form Music Generation**

Uses transformer architecture for generating longer, more structured musical pieces.

### Advantages Over RNN

- Better long-range dependencies
- More coherent large-scale structure
- Attention mechanisms capture harmonic relationships
- Can generate pieces 4-5 minutes long

### Features

- Self-attention over full sequence
- Positional encodings for temporal relationships
- Generates multi-track compositions
- Better maintains harmonic structure

## Polyphony RNN

**Polyphonic Music Generation**

Generates multi-voice, multi-instrument compositions with proper voice leading and harmony.

### Architecture

- LSTM with polyphonic encoding
- Handles up to 16 simultaneous notes
- Learns harmonic relationships
- Models voice independence

## Improv RNN

**Interactive Improvisation Model**

Generates melodic continuations of an initial idea over a backing chord progression. Useful for:

- AI-assisted composition
- Interactive music creation
- Real-time musical response

### Interactive Generation

Generation over a chord progression is exposed through the `improv_rnn_generate` command-line tool with a pre-trained bundle:

```bash
improv_rnn_generate \
  --config=chord_pitches_improv \
  --bundle_file=/path/to/chord_pitches_improv.mag \
  --output_dir=/tmp/improv_rnn/generated \
  --num_outputs=10 \
  --primer_melody="[60]" \
  --backing_chords="C G Am F C G Am F" \
  --render_chords
```

## NSynth (Neural Synth)

**WaveNet-Based Audio Synthesis**

Neural network approach to synthesizing audio from learned instrument timbres.

### Features

- Learn timbral characteristics from audio samples
- Generate novel sounds by interpolating in timbre space
- Supports pitch transposition
- Real-time synthesis capability

### Architecture

- WaveNet vocoder
- Learned embeddings for instruments/timbres
- Autoregressive audio generation
- Trained on diverse instrument samples

## GANSynth

**Generative Adversarial Network for Audio Synthesis**

GAN-based approach to audio synthesis with improved phase coherence compared to WaveNet.

### Advantages

- Better phase coherence (fewer artifacts)
- Cleaner, more natural sounding audio
- Faster generation than WaveNet
- Better perceptual quality

### Architecture

- Generator network for audio synthesis
- Discriminator for quality assessment
- Progressive training for stability
- Learned timbre/instrument embeddings

## Onsets and Frames Transcription

**Automatic Music Transcription**

Converts audio recordings to MIDI notation using deep learning.

### Features

- Note onset detection
- Frame-level note probability estimation
- Automatic velocity estimation
- Multi-instrument capable

### Usage

```bash
# Transcribe audio to MIDI
onsets_frames_transcription_transcribe \
  --checkpoint=/path/to/checkpoint \
  --input_audio=/path/to/audio.wav \
  --output_midi=/tmp/output.mid
```

## Model Training Guidelines

### Data Preparation

1. **MIDI Files**
   - Collect high-quality MIDI in your domain
   - Remove overly long or short sequences
   - Clean up timing and velocity data
   - Split into train/validation/test

2. **Audio Files**
   - Resample to a consistent rate (16kHz typical)
   - Normalize volume levels
   - Segment into training windows
   - Align with MIDI when needed

### Hyperparameters

Common settings:

- **Batch size** - 32-128 depending on GPU memory
- **Learning rate** - 0.001-0.01 (Adam optimizer)
- **Embedding size** - 128-512 dimensions
- **RNN units** - 256-1024 for LSTM
- **Dropout** - 0.2-0.5 for regularization

### Evaluation Metrics

- **Perplexity** - How surprising the model finds test data
- **Accuracy** - For classification tasks
- **BLEU** - Sequence similarity to ground truth
- **Human listening tests** - Essential for music quality

## Pre-trained Checkpoints

All models provide pre-trained checkpoints:

```python
import os

checkpoint_dir = os.path.join(
    os.path.expanduser('~'),
    'magenta-checkpoints'
)

# Checkpoints automatically downloaded on first use
```

## Performance Considerations

### Inference Speed

Approximate generation times per 16 notes (4 seconds of music):

- **RNN models** - 50-200ms CPU, 10-50ms GPU
- **Transformer** - 100-500ms CPU, 20-100ms GPU
- **Audio synthesis** - 1-5 seconds per 4-second clip

### Memory Requirements

Model sizes:

- **RNN models** - 50-200MB
- **Transformer** - 200-500MB
- **Audio synthesis** - 300-1000MB

### Optimization

- Use GPU for real-time generation
- Batch generation for multiple samples
- Cache model checkpoints locally
- Use smaller models for resource-constrained environments

## Community Models

Community members have trained models on:

- Different musical genres
- Cultural music traditions
- Specific instruments
- Custom datasets

Check the Magenta discussion forum and GitHub issues for community contributions.

---

# Magenta Real-Time Music Generation

**Source:** https://magenta.tensorflow.org/magenta-realtime and https://magenta.tensorflow.org/lyria-realtime

Real-time music generation models and APIs for live music creation and interactive applications.

## Overview

Real-time music generation opens unique opportunities for:

- Live music performance and improvisation
- Interactive composition tools
- Dynamic soundtrack generation
- AI-assisted music creation in DAWs
- Responsive musical applications

## Lyria RealTime

**State-of-the-art live music generation via API**

Lyria RealTime is the production-ready live music generation model powering:

- Music FX DJ
- Google AI Studio real-time music API
- Commercial DAW integrations
- Interactive music applications

### Features

- **Low Latency** - Sub-second music generation
- **Text Control** - Generate music from text prompts
- **Continuous Streaming** - Seamless music generation
- **Quality** - State-of-the-art audio quality
- **Flexibility** - Works with various music styles and genres

### Access

Lyria RealTime is available through:

1. **Google AI Studio** - https://aistudio.google.com/
2. **Gemini API** - https://ai.google.dev/
3. **Commercial integrations** - DAW plugins and applications
### API Usage (Gemini API)

Lyria RealTime is exposed through the Gemini API's live music interface in the `google-genai` Python SDK. The model name and method surface shown below follow the Google AI Studio documentation and may change, so treat this as a sketch and check the current API reference:

```python
import asyncio
from google import genai
from google.genai import types

# The API key is read from the GOOGLE_API_KEY environment variable.
client = genai.Client()

async def main():
    # Open a live music session (model name per the AI Studio docs; may change).
    async with client.aio.live.music.connect(model='models/lyria-realtime-exp') as session:
        # Steer the stream with weighted text prompts.
        await session.set_weighted_prompts(prompts=[
            types.WeightedPrompt(text='upbeat electronic music with synth leads', weight=1.0),
        ])
        await session.set_music_generation_config(
            config=types.LiveMusicGenerationConfig(bpm=124, temperature=1.0)
        )
        await session.play()

        # Audio arrives as a stream of PCM chunks; see the API reference for
        # the exact message fields before relying on them.
        async for message in session.receive():
            ...

asyncio.run(main())
```

### Web Integration

```javascript
// Illustrative sketch: call your own backend, which in turn uses the Lyria RealTime API
async function generateMusic(prompt) {
  const response = await fetch('/api/generate-music', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      prompt: prompt,
      style: 'electronic'
    })
  });

  const audioBlob = await response.blob();
  return URL.createObjectURL(audioBlob);
}

// Usage
const musicUrl = await generateMusic('upbeat ambient music');
const audio = new Audio(musicUrl);
audio.play();
```

## Magenta RT

**Open-Weights Live Music Generation Model**

Magenta RT is the open-source alternative to Lyria RealTime, providing direct access to code and models for researchers and developers.

### Model Details

- **Architecture** - 800 million parameter autoregressive transformer
- **Training Data** - ~190,000 hours of stock music from multiple sources
- **Mostly Instrumental** - Focused on instrumental music generation
- **Open Weights** - Available under permissive licenses
- **Local Execution** - Can run on consumer hardware

### Key Advantages

1. **Open Source** - Full code and weights available
2. **Customizable** - Fine-tune on custom datasets
3. **Local Control** - No API dependencies
4. **Research Friendly** - Accessible for academic work
5. **Commercial Use** - Permissive licensing

### Model Availability

- **GitHub** - https://github.com/magenta/magenta-realtime
- **Hugging Face** - Community models and fine-tunes
- **Google Cloud Storage** - Official weights

### Installation

```bash
# Clone repository
git clone https://github.com/magenta/magenta-realtime.git
cd magenta-realtime

# Install dependencies
pip install -r requirements.txt

# Download model weights
python scripts/download_weights.py
```

### Basic Usage (Python)

The snippets below sketch the intended workflow; the exact class and function names may differ between releases, so consult the repository README for the current API.

```python
import torch
from magenta_rt import MagentaRT

# Load model
model = MagentaRT.from_pretrained('magenta-rt-base')
model = model.to('cuda')  # Use GPU

# Generate music
with torch.no_grad():
    # Generate 30 seconds of music
    audio = model.generate(
        duration=30,
        temperature=1.0
    )

# Save to file
import torchaudio
torchaudio.save('output.wav', audio, sample_rate=16000)
```

### Inference in Colab

```python
# Free-tier Colab TPU inference
from magenta_rt import MagentaRT

model = MagentaRT.from_pretrained('magenta-rt-base')
model = model.to('tpu')

# Generate
audio = model.generate(duration=30)
```

### Fine-tuning

```python
# Fine-tune on custom music dataset
from magenta_rt import MagentaRT, train

model = MagentaRT.from_pretrained('magenta-rt-base')

# Prepare dataset (CustomMusicDataset stands in for your own torch Dataset)
train_dataset = CustomMusicDataset('/path/to/music')
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=8,
    shuffle=True
)

# Train
model = train.fine_tune(
    model,
    train_loader,
    epochs=5,
    learning_rate=1e-5
)

# Save
model.save_pretrained('./fine-tuned-model')
```

### Known Limitations

- Optimized for instrumental music
- Best quality with 30+ second generation
- Variable quality on out-of-distribution prompts
- Requires GPU for reasonable latency
- ~30 seconds of audio per 5-10 minutes of generation (quality/speed tradeoff)
## Real-Time DAW Integration

### DDSP-VST

Neural audio synthesis VST plugin using Differentiable Digital Signal Processing.

#### Features

- Real-time parameter control
- Learned harmonic and noise synthesis
- Modular DSP components
- Support for major DAWs

#### Installation

1. Download DDSP-VST from the Magenta website
2. Copy the plugin file to your DAW plugins folder
3. Scan for new plugins in the DAW
4. Use it like any other instrument

#### Usage in DAW

1. Create a new audio/MIDI track
2. Insert the DDSP-VST plugin
3. Play MIDI notes
4. Adjust synthesis controls:
   - Harmonic controls
   - Noise controls
   - Effects (reverb, chorus)
5. Record the output

### Lyria RealTime VST

The Infinite Crate - DAW plugin for Lyria RealTime.

#### Features

- Text-prompt-based music generation
- Mix multiple prompts
- Adjustable generation parameters
- Live audio input integration
- Real-time playback in DAW

#### Workflow

1. Type a musical description, e.g. "upbeat electronic with synth leads"
2. Adjust parameters:
   - Temperature (randomness)
   - Duration
   - Style
3. Generate music in real time
4. Mix prompts for blended results
5. Feed the audio to a DAW input:
   - Use as backing track
   - Sample for composition
   - Process with effects

## Interactive Applications

### Space DJ

**Interactive Music Exploration**

Web application using the Lyria RealTime API for music generation.

```javascript
// Pseudo-code for a Space DJ style app
class SpaceDJ {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.currentGenres = ['electronic', 'ambient'];
  }

  async generateMusic() {
    const prompt = this.currentGenres.join(' ');
    const response = await fetch('/api/generate-music', {
      method: 'POST',
      body: JSON.stringify({ prompt })
    });
    return response.blob();
  }

  async exploreMusicSpace() {
    // Update genres based on user interaction
    this.currentGenres = this.getSelectedGenres();

    // Generate music for new location
    const audio = await this.generateMusic();
    this.playAudio(audio);
  }
}

const dj = new SpaceDJ(API_KEY);
document.addEventListener('mousemove', () => dj.exploreMusicSpace());
```

### Lyria Camera

**Music Generation from Visual Input**

Application combining image understanding with Lyria RealTime. The sketch below uses the Gemini API (via the `google-genai` SDK) to turn a camera frame into a music prompt; the final music-generation step depends on the API you have access to:

```python
from google import genai
from google.genai import types


def generate_music_from_image(image_path: str) -> str:
    """Turn a camera frame into a music prompt for a real-time music API."""
    client = genai.Client()  # reads GOOGLE_API_KEY from the environment

    with open(image_path, 'rb') as f:
        image_bytes = f.read()

    # Analyze the image: mood, colors, atmosphere, emotions
    analysis = client.models.generate_content(
        model='gemini-2.0-flash',
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type='image/jpeg'),
            'Analyze this image and describe the mood, colors, atmosphere, '
            'and emotions it evokes. Be descriptive.',
        ],
    )

    # Turn the analysis into a short music prompt
    music_response = client.models.generate_content(
        model='gemini-2.0-flash',
        contents=(
            'Based on this image description, write a short music prompt that '
            f'captures its essence. Description: {analysis.text}'
        ),
    )

    # Feed the prompt to Lyria RealTime / Magenta RT
    # (implementation depends on the available music generation API)
    return music_response.text
```
## Performance Optimization

### Batch Generation

```python
# Generate multiple tracks efficiently
import torch
import torchaudio
from magenta_rt import MagentaRT

model = MagentaRT.from_pretrained('magenta-rt-base')
batch_size = 4

with torch.no_grad():
    # Process multiple requests at once
    audio_batch = model.generate(
        duration=30,
        num_samples=batch_size,
        temperature=1.0
    )

# Split batch results
for i, audio in enumerate(audio_batch):
    torchaudio.save(f'output_{i}.wav', audio, 16000)
```

### Streaming Generation

```python
# Stream audio chunks for real-time playback
from magenta_rt import MagentaRT

model = MagentaRT.from_pretrained('magenta-rt-base')

def stream_chunks(total_duration=30, chunk_duration=5):
    # Yield each 5-second chunk as soon as it is generated,
    # so playback can start without waiting for the full clip
    for chunk in model.generate_streaming(
        total_duration=total_duration,
        chunk_duration=chunk_duration
    ):
        yield chunk
```

### GPU Optimization

```python
import torch
from torch.cuda.amp import autocast

# Enable mixed precision for faster inference
model = model.half()  # Use float16

with torch.no_grad(), autocast():
    audio = model.generate(duration=30)
```

## Deployment

### Cloud Deployment (Google Cloud)

```bash
# Deploy Magenta RT on Cloud Run
gcloud run deploy magenta-rt \
  --source . \
  --platform managed \
  --region us-central1 \
  --memory 8Gi \
  --gpu 1 \
  --allow-unauthenticated
```

### Docker Deployment

```dockerfile
FROM pytorch/pytorch:latest

WORKDIR /app

# Install Magenta RT
RUN pip install magenta-rt torch torchaudio

# Copy model weights
COPY model_weights /app/weights

# Copy inference server
COPY server.py .

CMD ["python", "server.py"]
```

### Local Deployment (Ollama-style)

```bash
# Run Magenta RT model locally
magenta-rt serve \
  --model magenta-rt-base \
  --port 8000 \
  --device cuda
```

## API Reference

### MagentaRT Class

#### Methods

- `from_pretrained(model_name)` - Load pre-trained model
- `generate(duration, temperature, **kwargs)` - Generate audio
- `generate_streaming(total_duration, chunk_duration)` - Stream generation
- `to(device)` - Move to device (cuda, cpu, tpu)
- `save_pretrained(path)` - Save fine-tuned model

#### Parameters

- `duration` (float) - Length of audio in seconds
- `temperature` (float) - 0.5-2.0, controls randomness
- `seed` (int) - For reproducible generation
- `sample_rate` (int) - Audio sample rate (default: 16000)

## Best Practices

1. **Cache Models** - Load once, reuse for multiple generations
2. **Batch Processing** - Generate multiple samples simultaneously
3. **Hardware** - Use GPU for latency-sensitive applications
4. **Error Handling** - Handle network/generation failures gracefully
5. **Rate Limiting** - Implement backoff for API calls
6. **Memory Management** - Dispose of models when done
7. **Testing** - Validate generation quality with users

## Troubleshooting

### High Latency

- Use GPU instead of CPU
- Reduce generation duration
- Use smaller batch sizes
- Enable quantization for compression

### Out of Memory

- Reduce batch size
- Use smaller model variant
- Enable gradient checkpointing
- Use mixed precision training

### Poor Audio Quality

- Increase model capacity
- Fine-tune on domain-specific data
- Adjust temperature parameter
- Use ensemble of models

## Additional Resources

- **GitHub** - https://github.com/magenta/magenta-realtime
- **Paper** - https://arxiv.org/abs/2404.XXXXX (check website)
- **Demos** - https://magenta.tensorflow.org/
- **Community Discussions** - GitHub Issues and Google Groups