TheArmagan/fulltextsearch.md

## fulltextsearch.md

      
    Raw
  

              fulltextsearch.md
            
          
    AVTR.ZIP: Full‑Text Avatar Search Guide

Cheat Sheet (TL;DR)

Free text            cyberpunk cat_ears
Exclude              -sex -gore
aiTag (exact)        aiTag:cat_ears | aiTag:any(cat_ears, neon) | aiTag:all(cat_ears, glowing_eyes)
User tag (exact)     tag:any(dragon, fantasy) | tag:all(dragon, fire)
Author               author:alice | by bob
Fuzzy threshold      fuzzy:0.5  (0=loose, 1=near-exact) — Default OFF (substring only)
Sort                 sort:updatedAt(desc) sort:name(asc)
Logical OR           cyberpunk or steampunk
Negated array        -aiTag:any(gore, blood)
Combine              aiTag:any(cyberpunk, neon) -gore fuzzy:0.4 sort:updatedAt(desc)
Platforms (presence) platforms:any(Android, IOS)
Platform level       platforms:Android(Poor) | android:poor
Negated level        -android:poor
Range (levels)       android:>=Medium | platforms:Android(≥Medium)
Performance (any)    performance:Good | performance:>=Medium
Similarity seed      similar to <avatarId>
Random               random:true

Quick Start Examples


Query
Meaning


cyberpunk cat
Must contain both tokens (fast substring match by default; add fuzzy:<v> for trigram fuzzy).


-sex cute
Contains fuzzy “cute” but excludes fuzzy “sex”.


aiTag:any(cyberpunk, neon)
Avatar has at least one of those aiTags.


aiTag:all(mecha, robot)
Avatar has BOTH mecha AND robot aiTags.


tag:any(cat, feline)
Avatar has at least one of those user tags.


author:alice
Author name OR author id fuzzy‑matches alice.


by alice
Same as above using positional by syntax.


cyberpunk or steampunk
Results matching either segment (logical OR).


aiTag:any(cyberpunk, neon) -gore fuzzy:0.4
Has neon/cyberpunk aiTags, excludes gore, fuzzy level ≥ 0.4.


sort:updatedAt(desc) sort:name(asc)
Order first by latest update, then by name ascending.


fuzzy:0.75 cat_ears glowing_eyes
Strict fuzzy (closer to exact) match for both tokens.


platforms:any(Android, IOS)
Avatar mentions either Android or IOS (any level).


android:poor
Requires Android: Poor.


-android:poor
Excludes Android: Poor.


platforms:Android(≥Medium)
Android level is Medium, Good, or Excellent.


android:>=Medium
Same as above (ASCII comparator).


similar to 123e4567-uuid
Results similar to the given avatar id (hybrid similarity).


Tip: You can control fuzzy strictness directly in the query with fuzzy:<0-1>.

Grammar Overview

The query is tokenized on whitespace, then recomposed into logical segments separated by the keywords and / or (case‑insensitive).
Within each segment you can mix:

Positive free‑text tokens: cyberpunk cat_ears
Negative tokens (prefixed with -): -nsfw -gore
Field terms: field:value or field:any(v1,v2,...) / field:all(v1,v2,...)
Inline directives: sort:field(dir) and fuzzy:<0-1>
Author clause using by <author> (must appear once per segment; consumes the remainder after by).

Segments are combined with connectors:

A and B or C → Segment1 AND Segment2 OR Segment3 (left‑to‑right evaluation with connectors applied in order; no parentheses).
Missing connectors between segments default to AND.

Precedence & Evaluation

There is no custom precedence; evaluation is linear across segments with explicit connectors. For complex logic prefer explicit or. Example:
aiTag:any(cyberpunk, neon) and -gore or cute

Interpretation: (Segment1 AND Segment2) OR Segment3.

Fields


Field
Description
Fuzzy?
Array Ops (any/all)


name
Avatar name
Yes
N/A


description
Description text
Yes
N/A


tags
Raw tags text blob
Yes
N/A


aiTags / aiTagsText
Raw aiTags blob
Yes
N/A


author
Matches authorName OR authorId
Yes
N/A


authorName
Author name only
Yes
N/A


authorId
Author id only
Yes
N/A


platform / platforms
Canonical list Platform: Level pairs (e.g. Android: Good, IOS: None)
Yes
N/A


performance
Any-platform performance level (matches if any platform has the level/range)
No
N/A


tag
Uses normalized tagsArray
No fuzzy text; exact via array ops
any/all


aiTag
Uses normalized aiTagsArray
No fuzzy text; exact via array ops
any/all


tag: and aiTag: support three modes:

Single: aiTag:cat_ears → array contains that element.
Any: aiTag:any(cat_ears, cyberpunk, glowing_eyes) → overlaps at least one.
All: aiTag:all(cat_ears, glowing_eyes) → contains both.

All matches are case‑insensitive (you can type upper or lower case freely).
Fallback / Unknown Field

If you specify an unknown field (foo:bar) the system treats the value like a general free‑text token applied across core textual columns (name, description, tags, aiTags, authorName). This is intentional for resilience.

Free‑Text Tokens

Positive tokens must all match (logical AND within the segment). Negative tokens must not match.
Match behavior:

With a fuzzy level (fuzzy:X), higher values mean stricter matches; lower values are more forgiving. When fuzzy is set, the trigram operator (%) is used and accelerated via pg_trgm GIN indexes.
Without a fuzzy level, matching uses fast substring (ILIKE) only for performance (no trigram).
Very short tokens (<3 chars) behave more like exact text; the fuzzy level has limited effect.

Fields searched: name, description, user tags, AI tags, and author name.

Platform Filters

The platforms field stores canonical pairs like StandaloneWindows: Good, Android: Medium, IOS: None.
You can query it in several ways:
• Presence (any level):
- platforms:any(Android, IOS) → requires those platform names to appear.
• Exact level (per platform):
- platforms:Android(Poor)
- android:poor (shorthand pseudo‑field)
• Negated level (exclude a specific level):
- -android:poor
• Range by level (ordered: None < VeryPoor < Poor < Medium < Good < Excellent):
- android:>=Medium
- platforms:Android(≥Medium)
- Comparators supported: >, >=, <, <=, =, and their Unicode forms ≥, ≤.
Notes:
- Ranges expand to acceptable levels (e.g., ≥Medium → Medium | Good | Excellent).
- Presence checks in platforms:any(...) match by platform name only and ignore level.
Performance Filter (Any Platform)

Filter by performance level regardless of platform:

Exact level: performance:Good
Range by level: performance:>=Medium

This matches if any of the platforms (Android, IOS, StandaloneWindows) has the requested level (or falls within the specified range).

Similarity Seed (similar to <avatarId>)

Use this to pivot the full‑text search off an existing avatar's embedding + tags:
similar to 123e4567-uuid aiTag:any(cyberpunk, neon) -gore sort:updatedAt(desc)

Behavior:

Fetches a similarity candidate pool (hybrid vector + tag scoring) for the given id.
Constrains subsequent filters (aiTag:, free tokens, negations, platforms, etc.) to that candidate set.
If no explicit sort: is provided, results are ordered by similarity rank, otherwise your custom sorts apply.
You can still use fuzzy: to refine textual matching within the candidate set.

Notes:

Combine with other segments using and / or like: similar to ... and cute or aiTag:any(steampunk).
If the seed id is not found or yields no candidates, the result is empty.
Returned rows include an optional score field (hybrid similarity) when using this directive.


Negation

Prefix with - to exclude tokens or field terms:

-nsfw → avatar must not fuzzy‑match “nsfw”.
-aiTag:any(gore, blood) → none of those aiTags may appear.

Negated field terms invert their entire condition: NOT ( ... ).

Author Filters

Two styles:

Inline field: author:alice or author:any(alice, bob)
Positional: by alice (everything after by becomes the author term string for that segment).

Author filters match either the author name or the author ID; the fuzzy level applies if you set one.
Examples:

by 1234-uuid — fuzzy match on id (exact if high fuzzy like 0.9).
author:all(alice, smith) — requires both tokens to match (both must satisfy fuzzy or fallback logic).


Tag & aiTag Filters (Exact Matches)

These are exact tag filters designed for precision and speed.
Examples (Danbooru style):

aiTag:cat_ears — requires cat_ears.
aiTag:any(cat_ears, cyberpunk, glowing_eyes) — at least one.
aiTag:all(cat_ears, glowing_eyes) — both present.
tag:any(fantasy, dragon) — user tag overlap.

Because these are exact operations they ignore the fuzzy threshold. If you want fuzzy text matching on AI tags, search the broader text with a free token like cat_ear (without the aiTag: prefix).

Fuzzy Threshold Control

Specify strictness with fuzzy:<value> where <value> ∈ [0,1].

fuzzy:0.25 — loose, recall‑oriented, accepts broader matches.
fuzzy:0.50 — balanced.
fuzzy:0.75 — strict, near exact surface form.
fuzzy:0.90+ — almost exact; great for IDs or unique names.

If omitted, fuzzy is OFF. Matching defaults to substring (ILIKE) only. Set fuzzy:<v> to enable trigram‑based fuzzy matching (% operator) on indexed columns.
Short Tokens

Very short tokens (under 3 characters) act more like exact matching; the fuzzy level has limited effect on them.

Sorting

Inline directive: sort:<field>(asc|desc).

Multiple directives allowed; applied in the order they appear left→right.
Example: aiTag:any(cyberpunk, neon) sort:updatedAt(desc) sort:name(asc).
If no sort provided, default ordering: updatedAt DESC, createdAt DESC.

Sortable fields: name, authorName, authorId, platforms, createdAt, updatedAt, avatarCreatedAt, avatarUpdatedAt.

Pagination

Results may be paginated by your app. Use sort: directives to control ordering; then navigate pages using the interface controls.
Random results (inline or API)

If you include random:true in the query or pass random: true via API options, results are randomized while still respecting all query rules (segments, fields, negations, platforms, even similar to). Random pages are drawn from a cached per‑query pool (up to 5,000 ids) that refreshes roughly every 5–10 minutes to keep results fresh and fast.
Notes:

The random pool is query‑scoped (includes your current filters), so random results won’t be starved by unrelated ids.
Pagination with random is not stable across calls (no guaranteed repeatability without a seed).
totalCount reflects the number of matches within the current random pool, not the entire database.


Logical Grouping with AND / OR

Split on and / or tokens. Example:
aiTag:any(cyberpunk, neon) -gore and cute or aiTag:any(steampunk)

→ ((Segment1 AND Segment2) OR Segment3).
You can simulate parentheses by structuring segments carefully, but native parentheses are not yet supported.

Advanced Examples


Strict author & tag combo:
author:alice aiTag:all(cyberpunk, glowing_eyes) fuzzy:0.8 sort:updatedAt(desc)


Hybrid OR logic:
aiTag:any(cyberpunk, neon) -gore or aiTag:any(steampunk, brass)


Excluding NSFW while requiring style tags:
aiTag:any(cyberpunk, city_lights) -nsfw -gore fuzzy:0.55


Mixed arrays + free text:
aiTag:all(cat_ears, glowing_eyes) cute cyberpunk sort:name(asc)


Deep author search:
by johndoe aiTag:any(neon, rain) fuzzy:0.4 sort:updatedAt(desc)


Complex chaining:
aiTag:any(dragon, fire) -gore and author:any(alex, alice) or aiTag:any(phoenix)


Exact array with fallback fuzzy name:
aiTag:all(mecha, robot) name:any(heavy, armored) fuzzy:0.5


Tips


Typical fuzzy values: 0.35–0.55 (balanced). Use ≥0.75 for strict, near‑exact matches.
Prefer exact array matching with aiTag:any(...) / aiTag:all(...) for Danbooru‑style tags (fast and precise).
Combine multiple sort: directives to fine‑tune ordering.


Edge Cases & Safeguards


Scenario
Behavior


Unknown field
Treat value as general free‑text token set


Empty segment
Ignored (defaults to no-op)


Short token (<3 chars) with fuzzy
Behaves more exact; fuzzy has limited effect


Duplicate inline sort directives
Applied in appearance order; duplicates allowed


Multiple fuzzy directives
The later one wins


Negated array with all
Ensures NOT all present (full condition inverted)


Mixed case tags
Treated the same (search is case‑insensitive)
Query	Meaning
`cyberpunk cat`	Must contain both tokens (fast substring match by default; add `fuzzy:<v>` for trigram fuzzy).
`-sex cute`	Contains fuzzy “cute” but excludes fuzzy “sex”.
`aiTag:any(cyberpunk, neon)`	Avatar has at least one of those aiTags.
`aiTag:all(mecha, robot)`	Avatar has BOTH `mecha` AND `robot` aiTags.
`tag:any(cat, feline)`	Avatar has at least one of those user tags.
`author:alice`	Author name OR author id fuzzy‑matches `alice`.
`by alice`	Same as above using positional `by` syntax.
`cyberpunk or steampunk`	Results matching either segment (logical OR).
`aiTag:any(cyberpunk, neon) -gore fuzzy:0.4`	Has neon/cyberpunk aiTags, excludes gore, fuzzy level ≥ 0.4.
`sort:updatedAt(desc) sort:name(asc)`	Order first by latest update, then by name ascending.
`fuzzy:0.75 cat_ears glowing_eyes`	Strict fuzzy (closer to exact) match for both tokens.
`platforms:any(Android, IOS)`	Avatar mentions either Android or IOS (any level).
`android:poor`	Requires Android: Poor.
`-android:poor`	Excludes Android: Poor.
`platforms:Android(≥Medium)`	Android level is Medium, Good, or Excellent.
`android:>=Medium`	Same as above (ASCII comparator).
`similar to 123e4567-uuid`	Results similar to the given avatar id (hybrid similarity).
Field	Description	Fuzzy?	Array Ops (any/all)
`name`	Avatar name	Yes	N/A
`description`	Description text	Yes	N/A
`tags`	Raw tags text blob	Yes	N/A
`aiTags` / `aiTagsText`	Raw aiTags blob	Yes	N/A
`author`	Matches authorName OR authorId	Yes	N/A
`authorName`	Author name only	Yes	N/A
`authorId`	Author id only	Yes	N/A
`platform` / `platforms`	Canonical list `Platform: Level` pairs (e.g. `Android: Good, IOS: None`)	Yes	N/A
`performance`	Any-platform performance level (matches if any platform has the level/range)	No	N/A
`tag`	Uses normalized `tagsArray`	No fuzzy text; exact via array ops	any/all
`aiTag`	Uses normalized `aiTagsArray`	No fuzzy text; exact via array ops	any/all
Scenario	Behavior
Unknown field	Treat value as general free‑text token set
Empty segment	Ignored (defaults to no-op)
Short token (<3 chars) with fuzzy	Behaves more exact; fuzzy has limited effect
Duplicate inline sort directives	Applied in appearance order; duplicates allowed
Multiple fuzzy directives	The later one wins
Negated array with `all`	Ensures NOT all present (full condition inverted)
Mixed case tags	Treated the same (search is case‑insensitive)