n-studio/LLTS_SPECS.md

## LLTS_SPECS.md

      
Letter‑Aligned Lyric Timing Specification (LLTS)

Format name: Letter‑Aligned Lyric Timing Specification
Short name: LLTS
File extension: .llts
1. Purpose and Scope

LLTS defines a copyright‑safe interchange format for precise lyric timing, including sub‑word (letter‑range) alignment, without containing any lyric text. The format is intended for karaoke engines, DAWs, captioning tools, and research systems that require deterministic alignment while relying on user‑supplied lyrics obtained separately.
LLTS files contain only:

Timing metadata
Numeric character ranges
A cryptographic hash identifying the intended lyric text

No copyrighted expression is included.

2. Design Principles


No lyrical content: No letters, words, or substrings are stored.
Deterministic alignment: Character ranges reference an external lyric string.
Version safety: A cryptographic hash ensures the correct lyric variant is used.
Irreversibility: Data cannot be used to reconstruct lyrics.
Implementation neutrality: Usable across platforms and languages.


3. External Lyric Requirement

An LLTS file is valid only when paired with an external lyric string supplied by the user or host application.
The host application MUST:

Normalize the external lyric string using the algorithm in §6
Compute its hash using the algorithm declared in hash_algorithm (§5.4)
Compare the result to the lyrics_hash field

If the hashes do not match, the LLTS file MUST be rejected or treated as unaligned.
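The three steps above can be sketched in Python as follows. The function name lyrics_match is hypothetical, the normalization routine is passed in as a parameter (it stands for the §6 algorithm), and encoding the normalized string as UTF-8 before hashing is an assumption, since the byte encoding fed to the hash is not stated in this section.

```python
import hashlib
from typing import Callable


def lyrics_match(llts: dict, lyrics: str, normalize: Callable[[str], str]) -> bool:
    """Return True when the supplied lyrics hash to llts["lyrics_hash"].

    `normalize` is a caller-supplied function standing in for the
    normalization algorithm; it is a parameter so this sketch stays
    self-contained.
    """
    # Only SHA-256 is handled here; any other identifier counts as a mismatch.
    if llts.get("hash_algorithm") != "SHA-256":
        return False
    # Assumption: the normalized string is encoded as UTF-8 before hashing.
    digest = hashlib.sha256(normalize(lyrics).encode("utf-8")).hexdigest()
    # Assumption: hex digests are compared case-insensitively.
    return digest == str(llts.get("lyrics_hash", "")).lower()
```

A host would treat a False result as "reject the file or treat it as unaligned", as required above.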

4. Data Model Overview

LLTS is a UTF‑8 encoded text file using JSON syntax.
Top‑level object:

llts_version
track
lyrics_hash
hash_algorithm
normalization
lines
timing
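
A minimal loading sketch in Python, assuming the file contents have already been read into a string. The function name load_llts and the choice of which top-level fields to treat as mandatory are assumptions made for illustration; this section lists the fields without marking them as required or optional.

```python
import json

# Assumption: these fields are treated as mandatory; `track` is left optional
# because all of its sub-fields are optional.
REQUIRED_KEYS = {"llts_version", "lyrics_hash", "hash_algorithm", "normalization", "timing"}


def load_llts(text: str) -> dict:
    """Parse LLTS JSON text and check that the expected top-level fields exist."""
    doc = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(doc, dict):
        raise ValueError("top-level value must be a JSON object")
    missing = REQUIRED_KEYS - doc.keys()
    if missing:
        raise ValueError(f"missing top-level fields: {sorted(missing)}")
    return doc
```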


5. Header Fields

5.1 llts_version

String. Version of the LLTS specification that the file conforms to.
Example:
"llts_version": "1.0"


5.2 track

Object containing optional identification metadata.
Allowed fields (all OPTIONAL):

title (string)
artist (string)
duration_ms (integer)

These fields are informational and are not used for validation.

5.3 lyrics_hash

String. Hex‑encoded cryptographic hash of the normalized external lyrics.
Example:
"lyrics_hash": "9f2c8b7e3a0d…"


5.4 hash_algorithm

String identifier of the hash function.
RECOMMENDED:

SHA-256

Other algorithms MAY be supported if explicitly declared.

5.5 normalization

Object declaring the normalization rules applied prior to hashing.
This object describes the algorithm but does not include the normalized text.
Example:
"normalization": {
  "unicode": "NFC",
  "case": "lower",
  "whitespace": "collapse",
  "line_endings": "lf",
  "punctuation": "remove"
}


6. Normalization Algorithm (Normative)

The following steps MUST be applied in order to the external lyric string prior to hashing:

Convert text to Unicode NFC form
Convert all letters to lowercase
Normalize all line endings (CR, CRLF, LF) to LF (U+000A)
Collapse multiple consecutive line breaks into a single LF
Replace each remaining run of whitespace characters other than LF (including tabs) with a single ASCII space (U+0020)
Remove punctuation characters (Unicode General Category P*)
Trim leading and trailing whitespace

The resulting normalized string:

Has deterministic handling of line breaks
Preserves line boundaries via single LF characters
Is used only as input to the hash function

The normalized string MUST NOT be stored or transmitted.
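
The seven steps can be sketched in Python as below. This is a literal reading of the list, not a reference implementation: the function name normalize_lyrics is hypothetical, str.lower() is assumed to be an acceptable lowercase conversion, and step 5 is assumed to leave LF characters in place so that line boundaries survive.

```python
import re
import unicodedata


def normalize_lyrics(text: str) -> str:
    """Apply the seven normalization steps in the order they are listed."""
    text = unicodedata.normalize("NFC", text)              # 1. Unicode NFC
    text = text.lower()                                    # 2. lowercase
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # 3. CRLF and CR become LF
    text = re.sub(r"\n+", "\n", text)                      # 4. collapse repeated line breaks
    text = re.sub(r"[^\S\n]+", " ", text)                  # 5. other whitespace runs become one space
    text = "".join(                                        # 6. drop General Category P* characters
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    return text.strip()                                    # 7. trim both ends
```

For example, `normalize_lyrics("Hello,  World!\r\n\r\nSecond\tLine. ")` yields `"hello world\nsecond line"`. Because punctuation is removed after whitespace is collapsed, this literal ordering turns `"a - b"` into `"a  b"` with two adjacent spaces; implementations need to agree on such cases for hashes to match.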

7. Timing Entries

7.1 Structure

timing is an array of timing objects. Each object represents a time‑aligned character range.
Required fields:

start_char
end_char
start_ms
end_ms

Optional fields:

line_id
voice
layer

Example:
{
  "start_char": 134,
  "end_char": 138,
  "start_ms": 12340,
  "end_ms": 12520,
  "line_id": 5,
  "voice": "lead"
}


7.2 Character Indexing Rules


Indexing is zero‑based
Indices refer to the normalized lyric string
The normalized string includes LF characters that mark line boundaries
Ranges are [start_char, end_char) (end exclusive)
Ranges MUST NOT overlap unless explicitly supported by the host application
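
The indexing rules map directly onto Python slicing, which is also zero-based and end-exclusive. In the sketch below the helper names resolve_range and ranges_overlap are hypothetical, and rejecting empty ranges (start_char equal to end_char) is an assumption, since the rules above do not address them.

```python
from __future__ import annotations


def resolve_range(normalized: str, start_char: int, end_char: int) -> str:
    """Return the characters covered by the half-open range [start_char, end_char)."""
    # Assumption: empty ranges are rejected along with out-of-bounds ones.
    if not 0 <= start_char < end_char <= len(normalized):
        raise ValueError("range outside the normalized lyric string")
    return normalized[start_char:end_char]


def ranges_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Two half-open ranges overlap when each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]
```

With end-exclusive ranges, (0, 4) and (4, 9) are adjacent rather than overlapping, so contiguous timing entries can share a boundary index.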


7.3 Timing Rules


Times are expressed in milliseconds relative to audio start
end_ms MUST be greater than start_ms
Timing entries MAY be contiguous or gapped
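
As an illustration of how a player might use these rules, the Python sketch below returns the entries that are active at a given playback position. The function name active_entries is hypothetical, and treating each interval as half-open, [start_ms, end_ms), is an assumption; the rules above only require end_ms to be greater than start_ms.

```python
from __future__ import annotations


def active_entries(timing: list[dict], position_ms: int) -> list[dict]:
    """Return timing entries whose [start_ms, end_ms) interval contains position_ms."""
    return [
        entry
        for entry in timing
        if entry["start_ms"] <= position_ms < entry["end_ms"]
    ]
```

Under the half-open assumption, a position that falls exactly on the shared boundary of two contiguous entries selects only the later one, and a position inside a gap selects none.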


8. Line Structure

8.1 Line Table

LLTS defines lyric lines explicitly using a top‑level lines array.
Each line object defines a continuous character span in the normalized lyric string.
Required fields:

line_id (integer)
start_char
end_char

Optional fields:

voice
description

Example:
{
  "line_id": 5,
  "start_char": 120,
  "end_char": 160,
  "voice": "duet_a"
}

Line spans MUST align with LF boundaries in the normalized lyric string.
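
One possible reading of the alignment requirement, sketched in Python: a span starts at index 0 or immediately after an LF, ends at the end of the string or immediately before an LF, and contains no LF itself. The function name line_span_is_aligned and this exact interpretation are assumptions.

```python
def line_span_is_aligned(normalized: str, start_char: int, end_char: int) -> bool:
    """Check that [start_char, end_char) covers exactly one LF-delimited line."""
    if not 0 <= start_char < end_char <= len(normalized):
        return False
    # The span must begin at the start of the string or right after an LF.
    starts_line = start_char == 0 or normalized[start_char - 1] == "\n"
    # The span must stop at the end of the string or right before an LF.
    ends_line = end_char == len(normalized) or normalized[end_char] == "\n"
    return starts_line and ends_line and "\n" not in normalized[start_char:end_char]
```

Under this reading, consecutive lines leave a one-character gap for the LF between them, for instance spans ending at 22 and starting at 23.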

9. Voice and Role Annotation

The optional voice field classifies who performs the lyric segment.
RECOMMENDED enumerations:

lead
duet_a
duet_b
group
backing
spoken

Custom values MAY be used if documented by the application.
The optional layer field MAY be used to express overlapping voices or harmonies.
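
A small Python sketch of how an application might organize entries by layer and spot custom voice values. The helper names are hypothetical, and defaulting a missing layer to 0 is an assumption; the specification does not define a default.

```python
from collections import defaultdict

# The recommended enumeration listed above.
RECOMMENDED_VOICES = {"lead", "duet_a", "duet_b", "group", "backing", "spoken"}


def entries_by_layer(timing):
    """Group timing entries by their layer, preserving array order within each layer."""
    layers = defaultdict(list)
    for entry in timing:
        # Assumption: entries without a layer belong to layer 0.
        layers[entry.get("layer", 0)].append(entry)
    return dict(layers)


def custom_voices(timing):
    """Return voice values that are not in the recommended enumeration."""
    return {entry["voice"] for entry in timing if "voice" in entry} - RECOMMENDED_VOICES
```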

10. Validation Rules (Normative)

An LLTS file is valid only if all of the following hold:

The file parses as valid JSON
llts_version is supported
lyrics_hash matches the hash of the normalized external lyrics
All line spans fall within the lyric string length
All timing ranges fall within their referenced line spans
Timing values are monotonically non‑decreasing per layer
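
The last three rules, together with the end_ms rule from the timing section, can be checked structurally once the length of the normalized lyric string is known. The Python sketch below is one way to do so; the function name validate_structure is hypothetical, a missing layer is assumed to mean layer 0, and "monotonically non‑decreasing" is interpreted as start_ms not decreasing in array order within a layer. Entries without a line_id are not checked against any line span here.

```python
from __future__ import annotations


def validate_structure(llts: dict, lyric_length: int) -> list[str]:
    """Collect human-readable violations of the span, range, and ordering rules."""
    errors: list[str] = []
    lines = {ln["line_id"]: ln for ln in llts.get("lines", [])}

    for ln in lines.values():
        if not 0 <= ln["start_char"] < ln["end_char"] <= lyric_length:
            errors.append(f"line {ln['line_id']}: span outside the lyric string")

    previous_start: dict[int, int] = {}
    for index, entry in enumerate(llts.get("timing", [])):
        if entry["end_ms"] <= entry["start_ms"]:
            errors.append(f"timing[{index}]: end_ms must be greater than start_ms")

        # Range must sit inside the line it references, if it references one.
        line = lines.get(entry.get("line_id"))
        if line is not None and not (
            line["start_char"] <= entry["start_char"] < entry["end_char"] <= line["end_char"]
        ):
            errors.append(f"timing[{index}]: range outside its line span")

        # Assumption: entries without a layer belong to layer 0.
        layer = entry.get("layer", 0)
        if layer in previous_start and entry["start_ms"] < previous_start[layer]:
            errors.append(f"timing[{index}]: start_ms decreases within layer {layer}")
        previous_start[layer] = entry["start_ms"]

    return errors
```

An empty list means no structural violation was found; JSON parsing, version support, and the hash comparison are separate checks.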


11. Security and Copyright Considerations


LLTS files contain no copyrighted text
Cryptographic hashes are one‑way: the lyric text cannot practically be recovered from the hash value
Character ranges are meaningless without the external lyric string
The format does not enable lyric reconstruction

Implementations SHOULD require users to supply lyrics obtained lawfully.

12. Extensibility

Future versions MAY add optional fields, including:

Phoneme class labels (non‑textual)
Confidence scores
Multiple language tracks

Extensions MUST NOT include lyric text unless it is explicitly licensed.

13. Example (Complex, Normative)

The following example demonstrates all major LLTS features:

Explicit line boundaries
Deterministic normalization assumptions
Multiple voices (lead, duet, backing)
Overlapping layers
Letter‑range timing

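This example assumes the user supplies the correct external lyrics. The lyrics_hash shown is an illustrative placeholder (it is shorter than the 64 hexadecimal characters of a real SHA-256 digest); in a real file it is the hash of the normalized lyrics.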
{
  "llts_version": "1.0",
  "track": {
    "title": "Example Song",
    "artist": "Example Artist",
    "duration_ms": 215000
  },
  "lyrics_hash": "4d7c2a9e6f3b8c1a0e9f4d2c7b1a6e5f8c9d0a1b2c3d4e5f6a7b8c9d0e1f",
  "hash_algorithm": "SHA-256",
  "normalization": {
    "unicode": "NFC",
    "case": "lower",
    "whitespace": "collapse",
    "line_endings": "lf",
    "punctuation": "remove"
  },

  "lines": [
    {
      "line_id": 0,
      "start_char": 0,
      "end_char": 22,
      "voice": "lead",
      "description": "Verse 1 – lead vocal"
    },
    {
      "line_id": 1,
      "start_char": 23,
      "end_char": 47,
      "voice": "duet_a",
      "description": "Chorus – first singer"
    },
    {
      "line_id": 2,
      "start_char": 23,
      "end_char": 47,
      "voice": "duet_b",
      "description": "Chorus – second singer"
    },
    {
      "line_id": 3,
      "start_char": 48,
      "end_char": 72,
      "voice": "backing",
      "description": "Backing vocals"
    }
  ],

  "timing": [
    {
      "start_char": 0,
      "end_char": 4,
      "start_ms": 1200,
      "end_ms": 1450,
      "line_id": 0,
      "voice": "lead",
      "layer": 0
    },
    {
      "start_char": 4,
      "end_char": 9,
      "start_ms": 1450,
      "end_ms": 1800,
      "line_id": 0,
      "voice": "lead",
      "layer": 0
    },
    {
      "start_char": 23,
      "end_char": 28,
      "start_ms": 32000,
      "end_ms": 33500,
      "line_id": 1,
      "voice": "duet_a",
      "layer": 0
    },
    {
      "start_char": 23,
      "end_char": 28,
      "start_ms": 32000,
      "end_ms": 33500,
      "line_id": 2,
      "voice": "duet_b",
      "layer": 1
    },
    {
      "start_char": 48,
      "end_char": 52,
      "start_ms": 60000,
      "end_ms": 62000,
      "line_id": 3,
      "voice": "backing",
      "layer": 0
    },
    {
      "start_char": 52,
      "end_char": 58,
      "start_ms": 62000,
      "end_ms": 65000,
      "line_id": 3,
      "voice": "backing",
      "layer": 0
    }
  ]
}


14. License

This specification is licensed under the Apache License, Version 2.0 (the "License").
You may not use this specification except in compliance with the License. You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, this specification is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Copyright © The LLTS Contributors.