Format name: Letter‑Aligned Lyric Timing Specification
Short name: LLTS
File extension: .llts
LLTS defines a copyright‑safe interchange format for precise lyric timing, including sub‑word (letter‑range) alignment, without containing any lyric text. The format is intended for karaoke engines, DAWs, captioning tools, and research systems that require deterministic alignment while relying on user‑supplied lyrics obtained separately.
LLTS files contain only:
- Timing metadata
- Numeric character ranges
- A cryptographic hash identifying the intended lyric text
No copyrighted expression is included.
- No lyrical content: No letters, words, or substrings are stored.
- Deterministic alignment: Character ranges reference an external lyric string.
- Version safety: A cryptographic hash ensures the correct lyric variant is used.
- Irreversibility: Data cannot be used to reconstruct lyrics.
- Implementation neutrality: Usable across platforms and languages.
An LLTS file is valid only when paired with an external lyric string supplied by the user or host application.
The host application MUST:
- Normalize the external lyric string using the algorithm in §6
- Compute its hash using the parameters in §7
- Compare the result to the lyrics_hash field
If the hashes do not match, the LLTS file MUST be rejected or treated as unaligned.
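The comparison step above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the normalized string is hashed as UTF-8 bytes with SHA-256 and that hex digits are compared case-insensitively, neither of which is stated in this section. The function name lyrics_match is hypothetical.

```python
import hashlib


def lyrics_match(normalized_lyrics: str, llts_doc: dict) -> bool:
    """Return True when the normalized lyrics hash to the file's lyrics_hash."""
    if llts_doc.get("hash_algorithm") != "SHA-256":
        raise ValueError("this sketch only handles SHA-256")
    # Assumption: the hash input is the UTF-8 encoding of the normalized string.
    digest = hashlib.sha256(normalized_lyrics.encode("utf-8")).hexdigest()
    # Assumption: hex digits are compared case-insensitively.
    expected = str(llts_doc.get("lyrics_hash", "")).lower()
    return digest == expected
```

A host that receives False would reject the file or treat it as unaligned, as required above.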
LLTS is a UTF‑8 encoded text file using JSON syntax.
Top‑level object:
- llts_version
- track
- lyrics_hash
- hash_algorithm
- normalization
- timing
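As a rough sketch, a reader could parse the file and check for the top-level fields listed above. Treating every listed field as required is an assumption made for illustration, and the names load_llts and TOP_LEVEL_FIELDS are hypothetical.

```python
import json

# Top-level fields listed above; treating all of them as required is an assumption.
TOP_LEVEL_FIELDS = (
    "llts_version",
    "track",
    "lyrics_hash",
    "hash_algorithm",
    "normalization",
    "timing",
)


def load_llts(text: str) -> dict:
    """Parse LLTS JSON text and fail if a listed top-level field is absent."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("top-level value must be a JSON object")
    missing = [name for name in TOP_LEVEL_FIELDS if name not in doc]
    if missing:
        raise ValueError("missing top-level fields: " + ", ".join(missing))
    return doc
```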
llts_version: String. Semantic version of the specification.
Example:
"llts_version": "1.0"
track: Object containing optional identification metadata.
Allowed fields (all OPTIONAL):
- title (string)
- artist (string)
- duration_ms (integer)
These fields are informational and not used for validation.
lyrics_hash: String. Hex-encoded cryptographic hash of the normalized external lyrics.
Example:
"lyrics_hash": "9f2c8b7e3a0d…"
hash_algorithm: String identifier of the hash function.
RECOMMENDED: SHA-256
Other algorithms MAY be supported if explicitly declared.
normalization: Object declaring the normalization rules applied prior to hashing.
This object describes the algorithm but does not include the normalized text.
Example:
"normalization": {
  "unicode": "NFC",
  "case": "lower",
  "whitespace": "collapse",
  "line_endings": "lf",
  "punctuation": "remove"
}
The following steps MUST be applied in order to the external lyric string prior to hashing:
- Convert text to Unicode NFC form
- Convert all letters to lowercase
- Normalize all line endings (CR, CRLF, LF) to LF (U+000A)
- Collapse multiple consecutive line breaks into a single LF
- Replace any remaining sequence of whitespace characters other than LF (including tabs) with a single ASCII space (U+0020)
- Remove punctuation characters (Unicode General Category P*)
- Trim leading and trailing whitespace
The resulting normalized string:
- Has deterministic handling of line returns
- Preserves line boundaries via single LF characters
- Is used only as input to the hash function
The normalized string MUST NOT be stored or transmitted.
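The normalization steps can be sketched in Python as shown here. This is one possible reading, not a reference implementation: it assumes Python's str.lower() is an acceptable lowercase mapping, and that the whitespace-collapsing step leaves LF characters alone so that line boundaries survive. The function name normalize_lyrics is hypothetical.

```python
import re
import unicodedata


def normalize_lyrics(text: str) -> str:
    """Apply the normalization steps in the order listed above."""
    text = unicodedata.normalize("NFC", text)               # Unicode NFC form
    text = text.lower()                                     # lowercase (assumes str.lower)
    text = text.replace("\r\n", "\n").replace("\r", "\n")   # CRLF and CR become LF
    text = re.sub(r"\n+", "\n", text)                       # collapse repeated line breaks
    # Assumption: LF is excluded here so single line breaks are preserved.
    text = re.sub(r"[^\S\n]+", " ", text)                   # other whitespace runs become one space
    text = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )                                                       # drop General Category P*
    return text.strip()                                     # trim leading and trailing whitespace
```

Because the result is only ever fed to the hash function, a host would normally keep it in memory just long enough to hash it and resolve character ranges.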
timing is an array of timing objects. Each object represents a time‑aligned character range.
Required fields:
- start_char
- end_char
- start_ms
- end_ms
Optional fields:
- line_id
- voice
- layer
Example:
{
  "start_char": 134,
  "end_char": 138,
  "start_ms": 12340,
  "end_ms": 12520,
  "line_id": 5,
  "voice": "lead"
}
- Indexing is zero‑based
- Indices refer to the normalized lyric string
- The normalized string includes LF characters that mark line boundaries
- Ranges are [start_char, end_char) (end exclusive)
- Ranges MUST NOT overlap unless explicitly supported by the host application
- Times are expressed in milliseconds relative to audio start
- end_ms MUST be greater than start_ms
- Timing entries MAY be contiguous or gapped
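The indexing rules above might be checked per entry roughly as follows. The helper names timing_entry_problems and ranges_overlap are hypothetical, and text_length stands for the length of the normalized lyric string computed by the host.

```python
def timing_entry_problems(entry: dict, text_length: int) -> list:
    """Return a list of rule violations for one timing entry (empty if none)."""
    problems = []
    start, end = entry["start_char"], entry["end_char"]
    # Zero-based, end-exclusive range that must fit inside the normalized string.
    if not (0 <= start <= end <= text_length):
        problems.append("character range falls outside the normalized string")
    if entry["end_ms"] <= entry["start_ms"]:
        problems.append("end_ms must be greater than start_ms")
    return problems


def ranges_overlap(a: dict, b: dict) -> bool:
    """True when two end-exclusive character ranges share at least one index."""
    return a["start_char"] < b["end_char"] and b["start_char"] < a["end_char"]
```

With end-exclusive ranges, an entry ending at index 4 and another starting at index 4 are contiguous rather than overlapping.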
LLTS defines lyric lines explicitly using a lines array.
Each line object defines a continuous character span in the normalized lyric string.
Required fields:
- line_id (integer)
- start_char
- end_char
Optional fields:
- voice
- description
Example:
{
  "line_id": 5,
  "start_char": 120,
  "end_char": 160,
  "voice": "duet_a"
}
Line spans MUST align with LF boundaries in the normalized lyric string.
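One way to read the LF-alignment rule is sketched here. The interpretation is an assumption: a span is treated as aligned when it starts at index 0 or immediately after an LF, ends at the end of the string or immediately before an LF, and contains no LF itself. The function name line_span_is_aligned is hypothetical.

```python
def line_span_is_aligned(normalized: str, start_char: int, end_char: int) -> bool:
    """Check that [start_char, end_char) covers exactly one LF-delimited line."""
    if not (0 <= start_char <= end_char <= len(normalized)):
        return False
    # The span begins at the start of the string or right after a line break.
    starts_line = start_char == 0 or normalized[start_char - 1] == "\n"
    # The span stops at the end of the string or right before a line break.
    ends_line = end_char == len(normalized) or normalized[end_char] == "\n"
    # The span itself does not cross a line break.
    single_line = "\n" not in normalized[start_char:end_char]
    return starts_line and ends_line and single_line
```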
The optional voice field classifies who performs the lyric segment.
RECOMMENDED enumerations:
- lead
- duet_a
- duet_b
- group
- backing
- spoken
Custom values MAY be used if documented by the application.
The optional layer field MAY be used to express overlapping voices or harmonies.
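To illustrate how a player might consume layers, the sketch below groups the timing entries that are active at a given playback position by layer. It assumes an entry is active on the half-open interval [start_ms, end_ms) and that entries without a layer field belong to layer 0; neither default is defined by this specification. The function name active_entries is hypothetical.

```python
def active_entries(timing: list, position_ms: int) -> dict:
    """Group the timing entries active at position_ms by their layer."""
    active = {}
    for entry in timing:
        # Assumption: an entry is active on the half-open interval [start_ms, end_ms).
        if entry["start_ms"] <= position_ms < entry["end_ms"]:
            # Assumption: entries without a layer belong to layer 0.
            active.setdefault(entry.get("layer", 0), []).append(entry)
    return active
```

A karaoke renderer could then highlight one character range per layer, for example showing both duet voices at once.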
An LLTS file is valid if:
- The file parses as valid JSON
- llts_version is supported
- lyrics_hash matches the hash of the normalized external lyrics
- All line spans fall within the lyric string length
- All timing ranges fall within their referenced line spans
- Timing values are monotonically non-decreasing per layer
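The span and ordering conditions above could be checked along these lines. This is a sketch under stated assumptions: "monotonically non-decreasing per layer" is read as start_ms never decreasing in file order within a layer, entries without a layer field are placed on layer 0, and text_length is the length of the normalized lyric string. The function name structural_problems is hypothetical; JSON parsing, version support, and the hash comparison are left to the host.

```python
def structural_problems(doc: dict, text_length: int) -> list:
    """Collect span and ordering problems; an empty list means none were found."""
    problems = []
    lines_by_id = {}
    for line in doc.get("lines", []):
        lines_by_id[line["line_id"]] = line
        if not (0 <= line["start_char"] <= line["end_char"] <= text_length):
            problems.append(f"line {line['line_id']} falls outside the lyric string")

    last_start_ms = {}  # layer -> start_ms of the previous entry on that layer
    for index, entry in enumerate(doc.get("timing", [])):
        line = lines_by_id.get(entry.get("line_id"))
        if line is not None and not (
            line["start_char"] <= entry["start_char"]
            and entry["end_char"] <= line["end_char"]
        ):
            problems.append(f"timing[{index}] falls outside line {line['line_id']}")
        # Assumption: entries without a layer belong to layer 0.
        layer = entry.get("layer", 0)
        previous = last_start_ms.get(layer)
        if previous is not None and entry["start_ms"] < previous:
            problems.append(f"timing[{index}] start_ms decreases on layer {layer}")
        last_start_ms[layer] = entry["start_ms"]
    return problems
```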
- LLTS files contain no copyrighted text
- Hashes are one‑way and non‑reversible
- Character ranges are meaningless without the external lyric string
- The format does not enable lyric reconstruction
Implementations SHOULD require users to supply lyrics obtained lawfully.
Future versions MAY add optional fields, including:
- Phoneme class labels (non‑textual)
- Confidence scores
- Multiple language tracks
No extension may include lyric text unless explicitly licensed.
The following example demonstrates all major LLTS features:
- Explicit line boundaries
- Deterministic normalization assumptions
- Multiple voices (lead, duet, backing)
- Overlapping layers
- Per-letter timing
This example assumes the user supplies the correct external lyrics whose normalized form hashes to the value shown.
{
  "llts_version": "1.0",
  "track": {
    "title": "Example Song",
    "artist": "Example Artist",
    "duration_ms": 215000
  },
  "lyrics_hash": "4d7c2a9e6f3b8c1a0e9f4d2c7b1a6e5f8c9d0a1b2c3d4e5f6a7b8c9d0e1f",
  "hash_algorithm": "SHA-256",
  "normalization": {
    "unicode": "NFC",
    "case": "lower",
    "whitespace": "collapse",
    "line_endings": "lf",
    "punctuation": "remove"
  },
  "lines": [
    {
      "line_id": 0,
      "start_char": 0,
      "end_char": 22,
      "voice": "lead",
      "description": "Verse 1 – lead vocal"
    },
    {
      "line_id": 1,
      "start_char": 23,
      "end_char": 47,
      "voice": "duet_a",
      "description": "Chorus – first singer"
    },
    {
      "line_id": 2,
      "start_char": 23,
      "end_char": 47,
      "voice": "duet_b",
      "description": "Chorus – second singer"
    },
    {
      "line_id": 3,
      "start_char": 48,
      "end_char": 72,
      "voice": "backing",
      "description": "Backing vocals"
    }
  ],
  "timing": [
    {
      "start_char": 0,
      "end_char": 4,
      "start_ms": 1200,
      "end_ms": 1450,
      "line_id": 0,
      "voice": "lead",
      "layer": 0
    },
    {
      "start_char": 4,
      "end_char": 9,
      "start_ms": 1450,
      "end_ms": 1800,
      "line_id": 0,
      "voice": "lead",
      "layer": 0
    },
    {
      "start_char": 23,
      "end_char": 28,
      "start_ms": 32000,
      "end_ms": 33500,
      "line_id": 1,
      "voice": "duet_a",
      "layer": 0
    },
    {
      "start_char": 23,
      "end_char": 28,
      "start_ms": 32000,
      "end_ms": 33500,
      "line_id": 2,
      "voice": "duet_b",
      "layer": 1
    },
    {
      "start_char": 48,
      "end_char": 52,
      "start_ms": 60000,
      "end_ms": 62000,
      "line_id": 3,
      "voice": "backing",
      "layer": 0
    },
    {
      "start_char": 52,
      "end_char": 58,
      "start_ms": 62000,
      "end_ms": 65000,
      "line_id": 3,
      "voice": "backing",
      "layer": 0
    }
  ]
}
This specification is licensed under the Apache License, Version 2.0 (the "License").
You may not use this specification except in compliance with the License. You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, this specification is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Copyright © The LLTS Contributors.