🎵VIETNAMESE RVC🎵
Simple, high-quality, high-performance voice conversion and training.
Music Separation
A simple music separation system that splits audio into four stems: instruments, vocals, main vocals, and backup vocals
Separated output
Convert Audio
Convert audio using a trained voice model
Speaker ID for the multi-voice model
Enter the path to the audio file
Extracting pitch using the ONNX model can help improve speed
Unlock all pitch extraction methods
Combine two or more different pitch extraction methods
Blend multiple embedding layers together to achieve better audio quality.
Converted audio
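Combining several pitch extraction methods, as offered above, can be sketched as taking a per-frame median over multiple F0 contours. This is an illustrative guess at the combining logic, not the project's actual implementation; `combine_f0` is a hypothetical name.

```python
import statistics

def combine_f0(tracks):
    """Combine equal-length F0 contours; 0.0 marks unvoiced frames."""
    combined = []
    for frame in zip(*tracks):
        voiced = [f for f in frame if f > 0.0]
        # Median over the methods that detected voicing in this frame,
        # otherwise keep the frame unvoiced.
        combined.append(statistics.median(voiced) if voiced else 0.0)
    return combined
```

A median is a common choice here because it discards single-method outliers (e.g. octave errors) without biasing the overall contour.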
Convert Audio With Voice Activity Detector
Use Voice Activity Detection (VAD) combined with the SpeechBrain model to automatically identify speakers in an audio file, segment the audio into smaller individual clips, and then apply voice conversion with a voice model.
Speaker ID for the multi-voice model
Enter the path to the audio file
Speaker ID for the multi-voice model
Extracting pitch using the ONNX model can help improve speed
Unlock all pitch extraction methods
Combine two or more different pitch extraction methods
Blend multiple embedding layers together to achieve better audio quality.
Audio input, output
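The segment-and-split step described above can be sketched as cutting the waveform at the speech boundaries a VAD reports. The function name and the `(start, end)` region format are illustrative assumptions, not the project's actual API.

```python
def split_by_vad(samples, sample_rate, regions):
    """Return one clip (list of samples) per VAD speech region.

    regions: iterable of (start_seconds, end_seconds) pairs.
    """
    clips = []
    for start_s, end_s in regions:
        # Convert time boundaries to sample indices and slice the waveform.
        start = int(start_s * sample_rate)
        end = int(end_s * sample_rate)
        clips.append(samples[start:end])
    return clips

audio = list(range(16000))  # one second of fake 16 kHz audio
clips = split_by_vad(audio, 16000, [(0.0, 0.25), (0.5, 0.75)])
```

Each clip would then be passed through the voice model separately, which keeps conversion focused on speech and skips silence.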
Convert Text to Speech
Convert text to speech and read aloud using the trained voice model
Speaker ID for the multi-voice model
Extracting pitch using the ONNX model can help improve speed
Unlock all pitch extraction methods
Combine two or more different pitch extraction methods
Blend multiple embedding layers together to achieve better audio quality.
Unconverted and converted audio
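Blending multiple embedding layers, as mentioned in several of the options above, can be sketched as a weighted average of per-layer vectors. The function name and the averaging strategy are assumptions for illustration; the project's actual blending method may differ.

```python
def blend_embeddings(layers, weights):
    """Weighted average of several equal-length embedding vectors."""
    assert len(layers) == len(weights)
    assert abs(sum(weights) - 1.0) < 1e-9  # weights should sum to 1
    dim = len(layers[0])
    return [sum(w * layer[i] for layer, w in zip(layers, weights))
            for i in range(dim)]

e = blend_embeddings([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])
```

Averaging lets deeper layers (more phonetic detail) and shallower layers (more acoustic detail) both contribute to the final representation.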
Add Additional Audio Effects
Add effects to audio
Enter the path to the audio file
Enter the path to the audio file
Create a continuous echo effect when this mode is enabled
Audio output
Weird Effects for Audio
Apply quirky effects to your audio to make it sound weird and wacky.
Enter the path to the audio file
Realtime Conversion
Realtime voice conversion
Realtime not started
Input audio device; WASAPI or ASIO is recommended for low latency
Output audio device (e.g. speakers, headphones, ...)
Second output audio device for playback audio
Speaker ID for the multi-voice model
Extracting pitch using the ONNX model can help improve speed
Unlock all pitch extraction methods
Blend multiple embedding layers together to achieve better audio quality.
Create a continuous echo effect when this mode is enabled
Train Model
Train and build a voice model with a set of voice data
The name assigned to the output reference set.
Combine two or more different pitch extraction methods
Blend multiple embedding layers together to achieve better audio quality.
Store the model in GPU cache memory
Train the model with RMS energy
Check for overtraining during model training
Custom dataset folder for training data
Save only the latest D and G models
Save all models after each epoch
Clean up and retrain from scratch
Do not use pre-trained models
Customize pre-training settings
Compares the Mel spectrograms of real and generated audio at multiple scales. Helps the model learn timbral details, brightness, and frequency structure more effectively, thereby improving output speech quality and naturalness.
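The multi-scale comparison described above can be sketched in simplified form: compute differences between real and generated features at several resolutions and sum them. Real implementations compare mel spectrograms at several FFT sizes; here plain lists stand in for spectrogram frames, and both helper names are hypothetical.

```python
def downsample(x, factor):
    """Average consecutive groups of `factor` values (coarser scale)."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]

def multiscale_l1(real, fake, scales=(1, 2, 4)):
    """Sum of mean absolute differences at several resolutions."""
    loss = 0.0
    for s in scales:
        r, f = downsample(real, s), downsample(fake, s)
        loss += sum(abs(a - b) for a, b in zip(r, f)) / len(r)
    return loss
```

Fine scales penalize local timbral errors while coarse scales penalize errors in overall brightness and frequency balance, which is why the combination improves naturalness.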
Enabling Cosine Annealing for learning rate decay can help produce clearer and more natural pronunciation.
When enabled, highly deterministic algorithms are used, ensuring that repeated runs on the same input data yield identical results.
When disabled, faster algorithms may be selected, but they are not guaranteed to be deterministic, so training results can differ between runs.
When enabled, it benchmarks and selects the algorithm best optimized for the specific hardware and input size. This can help speed up training.
When disabled, this benchmarking is skipped; training may be slower, but every run uses the same algorithm, which is useful if you want exact reproducibility.
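The two pairs of settings above map naturally onto PyTorch's cuDNN flags. This is a minimal sketch assuming the project toggles flags like these; the helper name is illustrative.

```python
import torch

def configure_cudnn(deterministic: bool, benchmark: bool) -> None:
    # Deterministic: same input -> same result on every run, possibly slower.
    torch.backends.cudnn.deterministic = deterministic
    # Benchmark: autotune the fastest algorithm for this hardware and
    # input size; disable it when exact reproducibility matters.
    torch.backends.cudnn.benchmark = benchmark

# Reproducible training run:
configure_cudnn(deterministic=True, benchmark=False)
```

Enabling both at once is contradictory in spirit: benchmarking may pick different algorithms across runs, undoing the determinism guarantee.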
Create Dataset training from YouTube
Process and create training datasets using YouTube links
Create Training Reference
Create a reference for quality assurance when training models via TensorBoard.
Enter the path to the audio file
Combine two or more different pitch extraction methods
Blend multiple embedding layers together to achieve better audio quality.
Download Model
Download voice models, pre-trained models
Choose a pre-trained model to download
Model sample rate
Fuse Two Models
Combine two voice models into a single model
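Fusing two models as described above is commonly done by linearly interpolating their weights. This sketch assumes that approach, with plain floats standing in for tensors; `fuse` and `alpha` are hypothetical names.

```python
def fuse(state_a, state_b, alpha=0.5):
    """Blend two state dicts: alpha * A + (1 - alpha) * B per parameter."""
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k]
            for k in state_a}

merged = fuse({"w": 1.0}, {"w": 3.0}, alpha=0.5)
```

Interpolation only makes sense when both models share the same architecture and parameter shapes, which is why fusion tools typically require matching model versions.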
Read Model Information
Retrieve recorded information within the model
Convert PyTorch Model to ONNX Model
Convert an RVC model from PyTorch to ONNX to optimize audio conversion
Convert SVC model to a readable project format
Convert SVC models trained from Sovits SVC 4.1 into a format readable by the project. Currently, only models with the original configuration are supported.
Pitch Extraction
F0 pitch extraction is intended for use in audio conversion inference
Create SRT File From Audio File
Use Whisper to transcribe an audio file to text and create an SRT file
Set the input language to avoid the model detecting the wrong language
Enter the path to the audio file
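When writing the SRT file, each cue needs `HH:MM:SS,mmm` timestamps. A minimal sketch of that formatting step; the helper name is illustrative.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)   # hours
    m, ms = divmod(ms, 60_000)      # minutes
    s, ms = divmod(ms, 1000)        # seconds / milliseconds
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

srt_timestamp(3661.5)  # "01:01:01,500"
```

Note that SRT uses a comma (not a period) before the milliseconds; many players reject files that get this wrong.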
Additional Settings
Customize additional features of the project
The display language in the project (When making changes, a system restart is required for them to take effect)
Theme type displayed in the interface (When making changes, a system restart is required for them to take effect)
Click here if you want to be Rick Roll :) ---> RickRoll
Please do not use the project for any purposes that are unethical, illegal, or harmful to individuals or organizations...
If users fail to comply with or violate these terms, I will not be responsible for any claims, damages, or other liabilities, whether in contract, negligence, or other causes of action, arising from, out of, or in connection with the software, its use, or other dealings associated with it.