Configuration

LLM Settings (Script Generation)
URL for your OpenAI-compatible server (LM Studio, Ollama, etc.)
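As an illustration of what the app talks to here, a minimal OpenAI-compatible chat request could be built like this. The base URL, model name, and prompt are assumptions (LM Studio commonly serves at http://localhost:1234/v1, Ollama's OpenAI-compatible endpoint at http://localhost:11434/v1); only the /chat/completions path and payload shape follow the OpenAI convention.

```python
import json
import urllib.request

# Assumed default -- set this to whatever your local server reports.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(prompt: str, model: str = "local-model") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Convert this text to a podcast script.")
```

Sending the request is then a matter of `urllib.request.urlopen(req)` (or any HTTP client); the sketch stops short of that so it works without a server running.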
TTS Settings (Voice Generation)
Local loads the model in-process; External connects to a Gradio TTS server
URL for the Qwen3-TTS Gradio server
Language for audio synthesis only. To generate scripts in another language, edit the prompts under Prompt Settings (Advanced).
Maximum number of concurrent TTS requests
Seed for reproducibility
Silence inserted between entries from different speakers when merging audio
Silence inserted when the same speaker continues across entries when merging audio
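The two silence settings above interact like this when segments are merged: the gap before each segment depends on whether the speaker changed. A minimal sketch, where the gap durations and speaker names are illustrative, not the app's actual defaults:

```python
from typing import Optional

# Illustrative values only -- configure these in the TTS settings above.
SPEAKER_CHANGE_GAP_MS = 500   # silence between different speakers on merge
SAME_SPEAKER_GAP_MS = 200     # silence when the same speaker continues

def gap_before(prev_speaker: Optional[str], speaker: str) -> int:
    """Milliseconds of silence to insert before a segment when merging."""
    if prev_speaker is None:
        return 0  # first segment gets no leading silence
    if prev_speaker == speaker:
        return SAME_SPEAKER_GAP_MS
    return SPEAKER_CHANGE_GAP_MS

segments = [("NARRATOR", "intro"), ("NARRATOR", "continued"), ("GUEST", "reply")]
gaps, prev = [], None
for speaker, _text in segments:
    gaps.append(gap_before(prev, speaker))
    prev = speaker
# gaps == [0, 200, 500]
```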
Generation Settings
Size of text chunks sent to LLM (characters). Smaller = more precise, larger = more context.
Maximum tokens for LLM response. Increase if output is being truncated.
LLM Sampling Parameters
Randomness (0 = deterministic)
Nucleus sampling threshold
Top-K token filtering (0 = disabled)
Minimum probability cutoff (0 = disabled)
Penalize repeated topics
Comma-separated tokens the LLM is forbidden from generating. Use to disable thinking mode (e.g. <think>).
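Taken together, the sampling parameters above typically travel in the request body. The values below are common defaults, not recommendations, and the key names follow the OpenAI-compatible convention; which keys a given server (LM Studio, Ollama, etc.) honors varies.

```python
# Illustrative values; key support is server-dependent.
sampling = {
    "temperature": 0.7,       # randomness: 0 = deterministic
    "top_p": 0.9,             # nucleus sampling threshold
    "top_k": 40,              # 0 would disable top-k filtering
    "min_p": 0.05,            # 0 would disable the minimum-probability cutoff
    "presence_penalty": 0.5,  # penalize repeated topics
    "stop": ["<think>"],      # forbidden tokens, e.g. to suppress thinking mode
}
```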
During script review, combine adjacent NARRATOR entries that share the same instruct into a single longer entry. Disable this for finer per-line control of voice direction.

Prompt Customization
Instructions for the LLM on how to convert text to script. Defines output format, rules, and style guidelines.
Template for each chunk. Use {context} for chunk context and {chunk} for the text content.
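The {context} and {chunk} placeholders behave like standard Python format fields. A sketch with an invented template and invented text, just to show the substitution:

```python
# Hypothetical template -- the real one is set under Prompt Customization.
CHUNK_TEMPLATE = (
    "Context so far:\n{context}\n\n"
    "Convert the following text to script form:\n{chunk}"
)

prompt = CHUNK_TEMPLATE.format(
    context="Chapter 1 introduced the narrator.",
    chunk="It was a dark and stormy night.",
)
```

The review template works the same way, with {context} and {batch} as its placeholders.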

Review Prompt Customization
Instructions for the LLM during the script review pass. Defines what errors to fix and how.
Template for each review batch. Use {context} for batch context and {batch} for the script entries.