@arenagroove
Last active March 9, 2026 10:00
WordPress MU plugin — unified llms.txt + per-page .md endpoint for LLM content serving. Lazy-generated Markdown with HTML→MD converter, shared exclusions, YAML frontmatter, Polylang support, rate limiting, and a single settings page. Built for Classic Editor + ACF sites using the Less Rain framework.
<?php
/**
* Plugin Name: LR LLMs Generator
* Description: Generates a multilingual llms.txt for LLM indexing and serves clean .md endpoints for every public post/page. Unified settings under Tools.
* Version: 3.4.2
* Author: Luis Martinez
* Author URI: https://www.lessrain.com
* Requires at least: 5.9
* Tested up to: 6.5
* Requires PHP: 7.4
*
* Changelog:
* 3.4.2 — 2026-03-09
* - Homepage URLs in llms.txt now correctly output as index.md / en/index.md
* instead of site.com.md — detected by comparing against home_url() and
* Polylang language home URLs
* - get_option() comparisons hardened with (string) cast throughout — fixes
* strict === '1' check failing when WP stores option as integer
* 3.4 — 2026-03-09
* - llms.txt entry URLs now point to .md endpoints when the .md feature is
* enabled — AIs follow links directly to clean Markdown instead of HTML.
* Falls back to canonical HTML URLs when .md is disabled.
* 3.3 — 2026-03-09
* - Self-closing block elements (<div />, <section />, etc.) now normalized to explicit
* open+close pairs before DOMDocument parsing — root cause of content duplication and
* missing sections on pages with swiper/page-builder markup
* - Include-class whitelist check moved from parent-node lookup to per-text-node ancestor
* walk — text inside deeply nested or structurally irregular markup now correctly
* included if any ancestor carries the include class (e.g. text-container)
* - $is_included closure removed; logic inlined in DOMText handler
* - form removed from $skip_tags — form containers often wrap meaningful text; form
* controls (input, textarea, select, button, etc.) suppressed via separate $skip_form_tags
* - lr_llms_md_skip_tags filter added — skip tags list is now overridable per-site
* - Title encoding fixed: get_post_field('post_title', $id, 'raw') replaces get_the_title()
* to bypass Polylang filter chain re-encoding (& → &#038;)
* - lr_llms_clean_text() helper: html_entity_decode with ENT_QUOTES|ENT_HTML5 + soft
* hyphen (U+00AD) stripping
* - Multilingual .md canonical URL: lr_llms_md_get_requested_content_url() builds frontend
* URL from request path, preserving language prefixes (/en/, /de/, etc.)
* - index.md resolution extended to language-prefixed paths (/en/index.md, /de/index.md)
* 3.2 — 2026-03-08
* - Fixed broken cron purge LIKE query — was searching wrong prefix, silently deleting
* nothing; now correctly targets _transient_ and _transient_timeout_ rows
* - sanitize_url() → esc_url_raw() for canonical WP core alignment
* - apply_filters() wired up in lr_llms_md_get_skip_classes() and
* lr_llms_md_get_include_classes() — documented filters were never actually called
* - Added status_header(200) before Content-Type in lr_llms_md_serve() for correct
* behaviour under reverse proxies
* - Rate limit message updated to match actual 30s timeout
* 3.1 — 2026-03-08
* - Headings (h1–h6) now extracted regardless of include-class whitelist — picks up
* visually-hidden landmark headings outside text-container
* - Removed explicit # Title prepend — h1 is sourced from the DOM instead
* - YAML frontmatter title and description values are now properly quoted to handle
* colons and special characters in YAML parsers
* - Polylang null guard on cache key — pll_current_language('slug') falls back to
* get_locale() when called during early bootstrap
* - Added lr-no-extract-md CSS class as a dedicated .md-only skip signal (editor-side)
* - Added media-overlay-stack to default skip classes (hero overlay text)
* - SVG images skipped in converter (always decorative icons in LR sites)
* - Excluded/missing .md requests now return to WordPress for proper 404 template rendering
*
* 3.0 — Initial unified release
* - Merged lr-llms-txt-generator.php and lr-llms-md-generator.php into single plugin
* - Shared exclusions (post types, IDs, page templates) across both endpoints
* - Lazy .md generation with transient cache keyed by post_id + post_modified hash
* - YAML frontmatter on .md output (title, url, last_modified, type, description)
* - Language-aware cache keys for Polylang / WPML
* - Rate limiting with 30s timeout for paginated crawlers
* - Custom user-agent on internal wp_remote_get self-fetch
* - lr_llms_md_html_source filter for site-specific HTML pre-processing
* - lr_llms_md_output filter for final Markdown post-processing
* - Single admin settings page under Tools > LR LLMs Settings
*/
if ( ! defined( 'ABSPATH' ) ) {
exit;
}
// =============================================================================
// A. LLMS.TXT — CONFIGURATION
// =============================================================================
global $lr_llms_config;
$lr_llms_config = [
'prefix' => 'lr_llms',
// Shared option keys (also used by .md)
'setting_key_exclude_types' => 'lr_llms_exclude_post_types',
'setting_key_exclude_ids' => 'lr_llms_exclude_ids',
'setting_key_exclude_templates' => 'lr_llms_exclude_templates',
// llms.txt display options
'setting_key_show_headings' => 'lr_llms_show_headings',
'setting_key_show_descriptions' => 'lr_llms_show_descriptions',
// Caching
'cache_prefix' => 'lr_llms_cache_',
'cache_timeout' => HOUR_IN_SECONDS,
'transient_flag_flush' => 'lr_llms_flush_needed',
// Rate limiting
'rate_limit_timeout' => 30, // seconds — short enough to not block paginated crawlers, long enough to deter abuse
'rate_limit_http_status' => 429,
// Output control
'post_type_priority_order' => [ 'page', 'post' ],
'max_items' => 1000,
'max_length_chars' => 3000,
'default_page_size' => 200,
// Query parameters
'query_param_flush' => 'flush',
// REST API
'rest_namespace' => 'lr-llms/v1',
'rest_txt_route' => 'txt',
// Cron
'cron_event' => 'lr_llms_purge_transients_hook',
'cron_frequency' => 'daily',
// Dev flags
'dev_purge_enabled' => defined( 'LR_LLMS_DEV_PURGE' ) && LR_LLMS_DEV_PURGE,
'disable_rate_limit' => defined( 'LR_LLMS_DISABLE_RATE_LIMIT' ) && LR_LLMS_DISABLE_RATE_LIMIT,
// HTTP headers for llms.txt response
'headers_txt_response' => [
'Content-Type' => 'text/plain; charset=utf-8',
'X-Robots-Tag' => 'index, follow',
'Cache-Control' => 'no-store, must-revalidate',
'Pragma' => 'no-cache',
'Expires' => '0',
],
// Filter hooks
'filter_included_post_ids' => 'lr_llms_included_post_ids',
'filter_post_type_priority' => 'lr_llms_post_type_priority_order',
'filter_contact_details' => 'lr_llms_contact_details',
'filter_post_type_label' => 'lr_llms_post_type_label',
'filter_post_title' => 'lr_llms_post_title',
'filter_post_url' => 'lr_llms_post_url',
'filter_post_description' => 'lr_llms_post_description',
];
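// Example (illustrative only, not loaded by this plugin): the dev flags above
// and the llms.txt path override checked in lr_llms_is_llms_txt_request() are
// read from constants, so they would normally be set in wp-config.php.
// The suffix value below is a hypothetical local path.
//
// define( 'LR_LLMS_DISABLE_RATE_LIMIT', true ); // skip the per-IP 429 check
// define( 'LR_LLMS_DEV_PURGE', true );          // unschedule the purge cron event
// define( 'LR_LLMS_LOCAL_PATH_SUFFIX', 'my-site/llms.txt' );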
// =============================================================================
// B. MD ENDPOINT — CONFIGURATION
// =============================================================================
define( 'LR_LLMS_MD_OPT_ENABLED', 'lr_llms_md_enabled' );
define( 'LR_LLMS_MD_OPT_SKIP_CLASSES', 'lr_llms_md_skip_classes' );
define( 'LR_LLMS_MD_OPT_INC_CLASSES', 'lr_llms_md_include_classes' );
define( 'LR_LLMS_MD_OPT_LAZY_LOAD', 'lr_llms_md_lazy_load' );
define( 'LR_LLMS_MD_OPT_DEDUP_LINKS', 'lr_llms_md_dedup_links' );
// Default skip classes — tuned for LR framework sites
define( 'LR_LLMS_MD_DEFAULT_SKIP', implode( "\n", [
'lr-no-extract-text',
'lr-no-extract-md',
'navigation-bar',
'main-footer',
'offcanvas-footer',
'site-header',
'site-footer',
'breadcrumb',
'pagination',
'cookie-banner',
'button-container',
'call-to-action',
'img-link',
'meta',
'terms',
'subtitle',
'caption',
'media-overlay-stack',
] ) );
// Default include classes — text-container is the standard LR content wrapper
define( 'LR_LLMS_MD_DEFAULT_INC', 'text-container' );
// =============================================================================
// C. SHARED HELPERS
// =============================================================================
/**
* Get shared excluded post types (applies to both llms.txt and .md).
*/
function lr_llms_get_excluded_types(): array {
global $lr_llms_config;
return (array) get_option( $lr_llms_config['setting_key_exclude_types'], [] );
}
/**
* Get shared excluded post IDs (applies to both llms.txt and .md).
*/
function lr_llms_get_excluded_ids(): array {
global $lr_llms_config;
$raw = get_option( $lr_llms_config['setting_key_exclude_ids'], '' );
return array_map( 'intval', array_filter( array_map( 'trim', explode( ',', $raw ) ) ) );
}
/**
* Get excluded page templates (applies to llms.txt queries).
*/
function lr_llms_get_excluded_templates(): array {
global $lr_llms_config;
return (array) get_option( $lr_llms_config['setting_key_exclude_templates'], [] );
}
/**
* Get all registered page templates across all public post types.
* Returns [ 'template-file.php' => 'Template Name' ]
*/
function lr_llms_get_all_page_templates(): array {
$templates = [];
$post_types = get_post_types( [ 'public' => true ], 'names' );
$theme = wp_get_theme();
foreach ( $post_types as $pt ) {
$pt_templates = $theme->get_page_templates( null, $pt );
foreach ( $pt_templates as $file => $name ) {
if ( ! isset( $templates[ $file ] ) ) {
$templates[ $file ] = $name;
}
}
}
asort( $templates );
return $templates;
}
/**
* Parse a newline-separated textarea value into a clean array.
*/
function lr_llms_parse_lines( string $raw ): array {
return array_values( array_filter( array_map( 'trim', explode( "\n", $raw ) ) ) );
}
/**
* Decode HTML entities and strip soft hyphens from WP text fields.
* get_the_title() and get_the_excerpt() return HTML-encoded strings intended
* for HTML output — this produces clean plain text for llms.txt and .md.
*/
function lr_llms_clean_text( string $text ): string {
$decoded = html_entity_decode( $text, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
return str_replace( "\u{00AD}", '', $decoded ); // strip soft hyphens
}
// =============================================================================
// D. SHARED CACHE
// =============================================================================
/**
* Flush all llms.txt transient caches.
*/
function lr_llms_flush_txt_cache(): void {
global $wpdb, $lr_llms_config;
$val_like = $wpdb->esc_like( '_transient_' . $lr_llms_config['cache_prefix'] ) . '%';
$time_like = $wpdb->esc_like( '_transient_timeout_' . $lr_llms_config['cache_prefix'] ) . '%';
$wpdb->query( $wpdb->prepare( "DELETE FROM {$wpdb->options} WHERE option_name LIKE %s", $val_like ) );
$wpdb->query( $wpdb->prepare( "DELETE FROM {$wpdb->options} WHERE option_name LIKE %s", $time_like ) );
delete_transient( $lr_llms_config['transient_flag_flush'] );
}
/**
* Flush all .md transient caches.
*/
function lr_llms_flush_md_cache(): void {
global $wpdb;
$val_like = $wpdb->esc_like( '_transient_lr_llms_md_' ) . '%';
$time_like = $wpdb->esc_like( '_transient_timeout_lr_llms_md_' ) . '%';
$wpdb->query( $wpdb->prepare( "DELETE FROM {$wpdb->options} WHERE option_name LIKE %s", $val_like ) );
$wpdb->query( $wpdb->prepare( "DELETE FROM {$wpdb->options} WHERE option_name LIKE %s", $time_like ) );
}
/**
* Flush all caches (both llms.txt and .md).
*/
function lr_llms_flush_all_caches(): void {
lr_llms_flush_txt_cache();
lr_llms_flush_md_cache();
}
/**
* Flag llms.txt cache for flush after post save.
*/
function lr_llms_flag_cache_flush_on_save( int $post_id ): void {
if ( ( defined( 'DOING_AUTOSAVE' ) && DOING_AUTOSAVE ) || wp_is_post_revision( $post_id ) ) {
return;
}
global $lr_llms_config;
set_transient( $lr_llms_config['transient_flag_flush'], true, 5 * MINUTE_IN_SECONDS );
}
add_action( 'save_post', 'lr_llms_flag_cache_flush_on_save' );
/**
* Check flush flag on init and clear if set.
*/
add_action( 'init', function () {
global $lr_llms_config;
if ( get_transient( $lr_llms_config['transient_flag_flush'] ) ) {
lr_llms_flush_txt_cache();
}
} );
// =============================================================================
// E. LLMS.TXT — CORE
// =============================================================================
/**
* Generate a cache key from the current query string and active language.
*
* Language context is injected explicitly to prevent Polylang/WPML cookie-
* switched requests from sharing cache across languages when $_GET is identical.
*/
function lr_llms_get_cache_key( ?array $get = null ): string {
global $lr_llms_config;
if ( $get === null ) {
$get = $_GET;
}
unset(
$get[ $lr_llms_config['query_param_flush'] ],
$get['fbclid'],
$get['utm_source'],
$get['utm_medium'],
$get['utm_campaign']
);
// Include active language so cookie-switched Polylang/WPML requests
// don't collide with each other when query params are otherwise identical.
// pll_current_language() can return null early in bootstrap — fall back to get_locale().
$lang = function_exists( 'pll_current_language' ) ? pll_current_language( 'slug' ) : null;
$get['_lang'] = $lang ?: get_locale();
ksort( $get );
return md5( http_build_query( $get ) );
}
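// Worked example (assumes Polylang reports 'en' as the current language):
// for /llms.txt?page=2&utm_source=news&flush=1 the flush and utm_* keys are
// dropped, '_lang' is added and the keys are sorted, so the hashed string is
// '_lang=en&page=2'. The same request in German hashes '_lang=de&page=2' and
// therefore gets its own transient.
//
// $key = lr_llms_get_cache_key( [ 'page' => '2', 'utm_source' => 'news', 'flush' => '1' ] );
// // $key === md5( '_lang=en&page=2' )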
/**
* Check if the current request is for /llms.txt.
*/
function lr_llms_is_llms_txt_request(): bool {
$path = parse_url( $_SERVER['REQUEST_URI'] ?? '', PHP_URL_PATH );
if ( ! is_string( $path ) ) {
return false;
}
$path = untrailingslashit( strtolower( $path ) );
if ( defined( 'LR_LLMS_LOCAL_PATH_SUFFIX' ) ) {
return str_ends_with( $path, '/' . ltrim( LR_LLMS_LOCAL_PATH_SUFFIX, '/' ) );
}
return ( $path === '/llms.txt' );
}
/**
* Build the base URL for the Next Page hint.
*/
if ( ! function_exists( 'lr_llms_build_base_url' ) ) {
function lr_llms_build_base_url( array $cfg ): string {
$path = parse_url( $_SERVER['REQUEST_URI'] ?? '', PHP_URL_PATH );
if ( is_string( $path ) && substr( $path, -9 ) === '/llms.txt' ) {
return home_url( '/llms.txt' );
}
return rest_url( rtrim( $cfg['rest_namespace'] ?? '', '/' ) . '/' . ltrim( $cfg['rest_txt_route'] ?? 'txt', '/' ) );
}
}
/**
* Intercept /llms.txt requests.
*/
function lr_llms_handle_llms_txt_request(): void {
if ( ! lr_llms_is_llms_txt_request() || is_admin() ) {
return;
}
lr_llms_serve_txt_output();
exit;
}
add_action( 'init', 'lr_llms_handle_llms_txt_request', 99 );
/**
* Serve llms.txt output with caching and rate limiting.
*/
function lr_llms_serve_txt_output(): void {
global $lr_llms_config;
foreach ( $lr_llms_config['headers_txt_response'] as $key => $value ) {
header( "{$key}: {$value}" );
}
// Rate limiting
if ( ! $lr_llms_config['disable_rate_limit'] ) {
$ip = $_SERVER['REMOTE_ADDR'] ?? 'unknown';
$key = $lr_llms_config['prefix'] . '_rl_' . md5( $ip );
if ( get_transient( $key ) ) {
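status_header( $lr_llms_config['rate_limit_http_status'] );
header( 'Retry-After: ' . (int) $lr_llms_config['rate_limit_timeout'] );
echo 'Too many requests. Please retry in ' . (int) $lr_llms_config['rate_limit_timeout'] . ' seconds.';
exit;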
}
set_transient( $key, 1, $lr_llms_config['rate_limit_timeout'] );
}
$max_items = $lr_llms_config['max_items'];
$limit = isset( $_GET['limit'] ) ? min( max( 1, intval( $_GET['limit'] ) ), $max_items ) : $lr_llms_config['default_page_size'];
$page = isset( $_GET['page'] ) ? max( 1, intval( $_GET['page'] ) ) : 1;
$since = isset( $_GET['since'] ) ? sanitize_text_field( $_GET['since'] ) : null;
$tag = isset( $_GET['tag'] ) ? sanitize_text_field( $_GET['tag'] ) : null;
$lang = isset( $_GET['lang'] ) ? sanitize_text_field( $_GET['lang'] ) : null;
$offset = ( $page - 1 ) * $limit;
if ( ! empty( $_GET[ $lr_llms_config['query_param_flush'] ] ) ) {
lr_llms_flush_txt_cache();
}
$cache_key = $lr_llms_config['cache_prefix'] . lr_llms_get_cache_key();
$cached_output = get_transient( $cache_key );
if ( $cached_output ) {
echo $cached_output;
return;
}
lr_llms_generate_output_body( $limit, $page, $offset, $since, $tag, $lang, $cache_key );
}
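// Example requests (example.com is a placeholder). All parameters are optional:
// limit is clamped to 1..max_items and applies per post type, page is 1-based,
// since feeds a date_query 'after' clause on post_date_gmt, tag filters by tag
// slug, and lang restricts output to one configured language.
//
// https://example.com/llms.txt?limit=50&page=2
// https://example.com/llms.txt?lang=de&since=2026-01-01
// https://example.com/llms.txt?tag=news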
/**
* REST API fallback for llms.txt output.
*/
function lr_llms_rest_output(): WP_REST_Response {
ob_start();
lr_llms_serve_txt_output();
$output = ob_get_clean();
global $lr_llms_config;
if ( mb_strlen( $output ) > $lr_llms_config['max_length_chars'] ) {
$output = mb_substr( $output, 0, $lr_llms_config['max_length_chars'] );
}
return new WP_REST_Response( $output, 200, [
'Content-Type' => 'text/plain; charset=utf-8',
'X-Robots-Tag' => 'index, follow',
] );
}
add_action( 'rest_api_init', function () {
global $lr_llms_config;
register_rest_route( $lr_llms_config['rest_namespace'], $lr_llms_config['rest_txt_route'], [
'methods' => 'GET',
'callback' => 'lr_llms_rest_output',
'permission_callback' => '__return_true',
] );
} );
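// Note and sketch (not active): WP_REST_Server JSON-encodes the data a route
// returns, so /wp-json/lr-llms/v1/txt delivers the llms.txt body as a quoted
// JSON string even though the response sets a text/plain header. One option is
// the core rest_pre_serve_request filter, which runs after the response headers
// are sent; returning true tells the server the body was already output.
// Assumes the route registered above.
//
// add_filter( 'rest_pre_serve_request', function ( $served, $result, $request ) {
//     if ( $served || $request->get_route() !== '/lr-llms/v1/txt' ) {
//         return $served;
//     }
//     echo $result->get_data();
//     return true;
// }, 10, 3 );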
/**
* Get post types ordered by priority config.
*/
function lr_llms_get_ordered_post_types(): array {
global $lr_llms_config;
$priority_order = apply_filters(
$lr_llms_config['filter_post_type_priority'],
$lr_llms_config['post_type_priority_order'] ?? [ 'page', 'post' ]
);
$all = get_post_types( [ 'public' => true ], 'names' );
$ordered = array_unique( array_merge( $priority_order, $all ) );
usort( $ordered, function ( $a, $b ) use ( $priority_order ) {
$ai = array_search( $a, $priority_order );
$bi = array_search( $b, $priority_order );
return ( $ai !== false ? $ai : PHP_INT_MAX ) - ( $bi !== false ? $bi : PHP_INT_MAX );
} );
return $ordered;
}
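// Example (not active): list a custom post type first via the
// lr_llms_post_type_priority_order filter. 'event' is a hypothetical post type
// slug; public types missing from the list are appended after the listed ones.
//
// add_filter( 'lr_llms_post_type_priority_order', function ( array $order ): array {
//     return [ 'event', 'page', 'post' ];
// } );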
/**
* Generate and output the llms.txt body.
*/
function lr_llms_generate_output_body(
int $limit,
int $page,
int $offset,
?string $since,
?string $tag,
?string $lang,
string $cache_key
): void {
global $lr_llms_config;
$has_more = false;
// Shared exclusions
$excluded_types = lr_llms_get_excluded_types();
$excluded_ids = lr_llms_get_excluded_ids();
$excluded_templates = lr_llms_get_excluded_templates();
$show_headings = (string) get_option( $lr_llms_config['setting_key_show_headings'], '0' ) === '1';
$show_descriptions = (string) get_option( $lr_llms_config['setting_key_show_descriptions'], '0' ) === '1';
$post_types = array_diff( lr_llms_get_ordered_post_types(), $excluded_types );
// Language detection
if ( function_exists( 'pll_languages_list' ) ) {
$languages = pll_languages_list();
$lang_type = 'polylang';
} elseif ( function_exists( 'icl_get_languages' ) ) {
$wpml_langs = apply_filters( 'wpml_active_languages', null, [ 'skip_missing' => 0 ] );
$languages = array_keys( $wpml_langs ?: [] );
$lang_type = 'wpml';
} else {
$languages = [ null ];
$lang_type = null;
}
if ( $lang && in_array( $lang, $languages, true ) ) {
$languages = [ $lang ];
}
if ( $lang_type === null ) {
$locale = get_locale();
$default_lang = strstr( $locale, '_', true ) ?: $locale;
} else {
$default_lang = function_exists( 'pll_default_language' )
? pll_default_language()
: ( defined( 'LR_DEFAULT_LANGUAGE' ) ? LR_DEFAULT_LANGUAGE : 'en' );
}
ob_start();
// Plain-text response: decode stored entities instead of HTML-escaping them
echo '# LLMs.txt — Generated by ' . lr_llms_clean_text( get_bloginfo( 'name' ) ) . "\n";
echo '# Site: ' . esc_url_raw( home_url() ) . "\n";
echo '# Updated: ' . esc_html( current_time( 'c' ) ) . "\n";
echo '# Page: ' . esc_html( "{$page} / Per-Type Limit: {$limit}" ) . "\n";
echo '# Purpose: Lists public, indexable content for LLM indexing and retrieval.' . "\n";
echo '# Customize: WP Admin > Tools > LR LLMs Settings' . "\n\n";
$contact = apply_filters( $lr_llms_config['filter_contact_details'], '' );
if ( ! empty( $contact ) ) {
echo $contact;
}
echo "## Sitemap\n\n";
echo '- XML: ' . esc_url( home_url( '/sitemap.xml' ) ) . "\n";
foreach ( $languages as $lang ) {
$display_lang = $lang ?: $default_lang;
echo "\n## Language: " . strtoupper( $display_lang ) . "\n";
foreach ( $post_types as $type ) {
$type_obj = get_post_type_object( $type );
if ( ! $type_obj ) {
continue;
}
if ( $show_headings ) {
$label = apply_filters( $lr_llms_config['filter_post_type_label'], $type_obj->labels->name, $type, $lang );
echo "\n### {$label}\n\n";
}
$args = [
'post_type' => $type,
'post_status' => 'publish',
'has_password' => false,
'posts_per_page' => $limit,
'offset' => $offset,
'orderby' => 'date',
'order' => 'DESC',
'no_found_rows' => true,
'fields' => 'ids',
'update_post_meta_cache' => false,
'update_post_term_cache' => false,
'ignore_sticky_posts' => ( $type === 'post' ),
'suppress_filters' => false,
];
if ( $since ) {
$args['date_query'] = [ [
'after' => $since,
'inclusive' => true,
'column' => 'post_date_gmt',
] ];
}
if ( $tag ) {
$args['tag'] = $tag;
}
if ( $lang_type && $lang ) {
$args['lang'] = $lang;
}
$query = new WP_Query( $args );
// Probe for next page
$probe_args = $args;
$probe_args['posts_per_page'] = 1;
$probe_args['offset'] = $offset + $limit;
$probe = new WP_Query( $probe_args );
if ( ! empty( $probe->posts ) ) {
$has_more = true;
}
$ids = array_diff( $query->posts, $excluded_ids );
// Filter by excluded page templates
if ( ! empty( $excluded_templates ) ) {
$ids = array_filter( $ids, function ( $post_id ) use ( $excluded_templates ) {
$tpl = get_page_template_slug( $post_id );
return ! in_array( $tpl, $excluded_templates, true );
} );
}
$ids = apply_filters( $lr_llms_config['filter_included_post_ids'], $ids, $args, $lang );
$md_active = lr_llms_md_is_enabled();
foreach ( $ids as $post_id ) {
$title = apply_filters( $lr_llms_config['filter_post_title'], lr_llms_clean_text( get_post_field( 'post_title', $post_id, 'raw' ) ?: '(untitled)' ), $post_id, $lang );
$permalink = get_permalink( $post_id );
// Homepages must become index.md — detect by matching home_url() exactly
// e.g. https://site.com/ → index.md, https://site.com/en/ → en/index.md
$home_url = trailingslashit( home_url() );
$is_home = trailingslashit( $permalink ) === $home_url;
// Language-prefixed homepages: https://site.com/en/
if ( ! $is_home && function_exists( 'pll_languages_list' ) ) {
foreach ( pll_languages_list() as $l ) {
if ( trailingslashit( $permalink ) === trailingslashit( home_url( '/' . $l ) ) ) {
$is_home = true;
break;
}
}
}
$url = $md_active
? ( $is_home
? trailingslashit( $permalink ) . 'index.md'
: rtrim( $permalink, '/' ) . '.md' )
: $permalink;
$url = esc_url_raw( apply_filters( $lr_llms_config['filter_post_url'], $url, $post_id, $lang ) );
$last_modified = get_the_modified_time( 'c', $post_id );
echo '- [' . $title . '](' . $url . ')' . "\n";
echo 'Last-Modified: ' . $last_modified . "\n";
if ( $show_descriptions ) {
$description = apply_filters( $lr_llms_config['filter_post_description'], lr_llms_clean_text( get_the_excerpt( $post_id ) ), $post_id );
if ( ! empty( $description ) ) {
echo 'Description: ' . $description . "\n";
}
}
}
wp_reset_postdata();
}
}
$output = ob_get_clean();
if ( $has_more ) {
$base = lr_llms_build_base_url( $lr_llms_config );
$get_args = wp_unslash( $_GET );
unset( $get_args['page'], $get_args['paged'] );
// add_query_arg() expects values to be URL-encoded by the caller
$output .= "\nNext Page: " . add_query_arg( urlencode_deep( array_merge( $get_args, [ 'page' => $page + 1 ] ) ), $base ) . "\n";
}
set_transient( $cache_key, $output, $lr_llms_config['cache_timeout'] );
echo $output;
}
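// Example (not active): lr_llms_contact_details output is echoed verbatim
// between the header comments and the Sitemap section, so it should carry its
// own Markdown and trailing blank line. lr_llms_included_post_ids runs once per
// language and post type, with the query args and language slug (null on
// single-language sites). The address and post ID 42 are hypothetical.
//
// add_filter( 'lr_llms_contact_details', function ( string $contact ): string {
//     return "## Contact\n\n- Email: hello@example.com\n\n";
// } );
//
// add_filter( 'lr_llms_included_post_ids', function ( array $ids, array $args, $lang ): array {
//     return array_diff( $ids, [ 42 ] ); // drop one post from every listing
// }, 10, 3 );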
// Cron purge
add_action( 'init', function () {
global $lr_llms_config;
// Reuse the shared flush helper so the scheduled purge and the manual flush
// always target the same transient rows
add_action( $lr_llms_config['cron_event'], 'lr_llms_flush_txt_cache' );
if ( ! wp_next_scheduled( $lr_llms_config['cron_event'] ) ) {
wp_schedule_event( time(), $lr_llms_config['cron_frequency'], $lr_llms_config['cron_event'] );
}
if ( $lr_llms_config['dev_purge_enabled'] ) {
$timestamp = wp_next_scheduled( $lr_llms_config['cron_event'] );
if ( $timestamp ) {
wp_unschedule_event( $timestamp, $lr_llms_config['cron_event'] );
error_log( '[LR LLMs] Purged scheduled event: ' . $lr_llms_config['cron_event'] );
}
}
} );
// =============================================================================
// F. MD ENDPOINT — CORE
// =============================================================================
function lr_llms_md_is_enabled(): bool {
return (string) get_option( LR_LLMS_MD_OPT_ENABLED, '1' ) === '1';
}
function lr_llms_md_get_skip_classes(): array {
return apply_filters( 'lr_llms_md_skip_classes',
lr_llms_parse_lines( get_option( LR_LLMS_MD_OPT_SKIP_CLASSES, LR_LLMS_MD_DEFAULT_SKIP ) )
);
}
function lr_llms_md_get_include_classes(): array {
return apply_filters( 'lr_llms_md_include_classes',
lr_llms_parse_lines( get_option( LR_LLMS_MD_OPT_INC_CLASSES, LR_LLMS_MD_DEFAULT_INC ) )
);
}
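// Example (not active): the class lists can be extended in code as well as on
// the settings page. Skip classes match the element or any ancestor; include
// classes whitelist text nodes by ancestor (headings are extracted regardless);
// lr_llms_md_skip_tags, applied in the converter, controls whole tags.
// 'newsletter-signup' and 'entry-content' are hypothetical class names. Cached
// .md output only reflects changes after a cache flush or a post update.
//
// add_filter( 'lr_llms_md_skip_classes', function ( array $classes ): array {
//     $classes[] = 'newsletter-signup';
//     return $classes;
// } );
//
// add_filter( 'lr_llms_md_include_classes', function ( array $classes ): array {
//     $classes[] = 'entry-content';
//     return $classes;
// } );
//
// add_filter( 'lr_llms_md_skip_tags', function ( array $tags ): array {
//     return array_diff( $tags, [ 'header' ] ); // stop skipping <header> elements
// } );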
/**
* Resolve a .md request URI to a post ID.
* Handles trailing slashes and subdirectory installs.
*/
function lr_llms_md_resolve_post_id(): int {
$path = parse_url( $_SERVER['REQUEST_URI'] ?? '', PHP_URL_PATH );
if ( ! is_string( $path ) ) {
return 0;
}
$path = rtrim( $path, '/' );
// Strip subdirectory prefix
$home_path = rtrim( parse_url( home_url(), PHP_URL_PATH ) ?: '', '/' );
$relative = ( $home_path !== '' && str_starts_with( $path, $home_path ) )
? substr( $path, strlen( $home_path ) )
: $path;
if ( substr( $relative, -3 ) !== '.md' ) {
return 0;
}
// Homepage alias — handles /index.md and /{lang}/index.md
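if ( preg_match( '#^(?:([a-z]{2})/)?index\.md$#i', ltrim( $relative, '/' ), $m ) ) {
$page_on_front = (int) get_option( 'page_on_front' );
// Map /{lang}/index.md to the translated front page so each language resolves
// to its own post ID, and therefore its own .md cache entry
if ( $page_on_front > 0 && ! empty( $m[1] ) && function_exists( 'pll_get_post' ) ) {
$translated = (int) pll_get_post( $page_on_front, strtolower( $m[1] ) );
if ( $translated > 0 ) {
$page_on_front = $translated;
}
}
return $page_on_front > 0 ? $page_on_front : 0;
}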
$real_relative = trailingslashit( substr( $relative, 0, -3 ) );
return (int) url_to_postid( home_url( $real_relative ) );
}
/**
* Build the frontend URL from the requested .md path,
* preserving language prefixes like /en/.
*/
function lr_llms_md_get_requested_content_url(): string {
$path = parse_url( $_SERVER['REQUEST_URI'] ?? '', PHP_URL_PATH );
if ( ! is_string( $path ) ) {
return home_url( '/' );
}
$path = rtrim( $path, '/' );
// Strip WP subdirectory if installed in one
$home_path = rtrim( parse_url( home_url(), PHP_URL_PATH ) ?: '', '/' );
if ( $home_path && str_starts_with( $path, $home_path ) ) {
$path = substr( $path, strlen( $home_path ) );
}
// Root homepage alias: /index.md
if ( $path === '/index.md' || $path === 'index.md' ) {
return home_url( '/' );
}
// Language-prefixed homepage alias: /en/index.md → /en/
if ( str_ends_with( $path, '/index.md' ) ) {
return home_url( substr( $path, 0, -8 ) ); // strip 'index.md', dir already has leading slash
}
// Strip .md and append trailing slash
$path = preg_replace( '/\.md$/', '', $path );
return home_url( trailingslashit( $path ) );
}
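// Worked examples of the mapping above (example.com is a placeholder; the last
// line assumes WordPress is installed in /blog):
//
// /index.md        -> https://example.com/
// /en/index.md     -> https://example.com/en/
// /about/team.md   -> https://example.com/about/team/
// /blog/about.md   -> https://example.com/blog/about/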
/**
* Intercept .md requests.
*/
function lr_llms_md_handle_request(): void {
if ( is_admin() || ! lr_llms_md_is_enabled() ) {
return;
}
$post_id = lr_llms_md_resolve_post_id();
if ( $post_id === 0 ) {
return;
}
$post = get_post( $post_id );
if ( ! $post || $post->post_status !== 'publish' || ! empty( $post->post_password ) ) {
return; // Let WordPress render its own 404 template
}
// Shared exclusions
if ( in_array( $post->post_type, lr_llms_get_excluded_types(), true ) ) {
return;
}
if ( in_array( $post_id, lr_llms_get_excluded_ids(), true ) ) {
return;
}
// Template exclusions
$excluded_templates = lr_llms_get_excluded_templates();
if ( ! empty( $excluded_templates ) && in_array( get_page_template_slug( $post_id ), $excluded_templates, true ) ) {
return;
}
lr_llms_md_serve( $post );
exit;
}
add_action( 'init', 'lr_llms_md_handle_request', 98 );
/**
* Serve the .md response.
*/
function lr_llms_md_serve( WP_Post $post ): void {
$post_id = $post->ID;
$canonical_url = lr_llms_md_get_requested_content_url();
// Cache key auto-invalidates when post_modified changes
$cache_key = 'lr_llms_md_' . $post_id . '_' . md5( $post->post_modified );
$md = get_transient( $cache_key );
if ( $md === false ) {
$md = lr_llms_md_generate( $post_id, $canonical_url );
set_transient( $cache_key, $md, HOUR_IN_SECONDS * 24 );
}
$md = apply_filters( 'lr_llms_md_output', $md, $post_id );
status_header( 200 );
header( 'Content-Type: text/plain; charset=utf-8' );
header( 'X-Robots-Tag: noindex' );
header( 'Cache-Control: no-store, must-revalidate' );
header( 'Link: <' . esc_url_raw( $canonical_url ) . '>; rel="canonical"' );
echo $md;
}
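// Example (not active): lr_llms_md_output runs on every response after the
// transient lookup, so its result is not cached and can add per-request
// content. This sketch appends a source line using core get_permalink().
//
// add_filter( 'lr_llms_md_output', function ( string $md, int $post_id ): string {
//     return rtrim( $md ) . "\n\nSource: " . get_permalink( $post_id ) . "\n";
// }, 10, 2 );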
/**
* Fetch rendered HTML and generate Markdown for a post.
*/
function lr_llms_md_generate( int $post_id, string $url ): string {
$post = get_post( $post_id );
$args = [
'timeout' => 15,
// Identify the internal self-request so WAFs and access logs don't
// misinterpret it as an external scraper or trigger Cloudflare rules.
'user-agent' => 'LR-LLMs-Markdown-Extractor/1.0 (WordPress; +' . home_url() . ')',
];
if ( function_exists( 'lr_is_localhost' ) && lr_is_localhost() ) {
$args['sslverify'] = false;
}
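$response = wp_remote_get( $url, $args );
$code = is_wp_error( $response ) ? 0 : (int) wp_remote_retrieve_response_code( $response );
if ( $code !== 200 ) {
// Transport error or non-200 status (e.g. a 404/500 error page): fall back to
// rendering post_content directly instead of converting an error template
$reason = is_wp_error( $response ) ? $response->get_error_message() : 'HTTP ' . $code;
error_log( '[lr-llms-md] HTTP fetch failed for post ' . $post_id . ': ' . $reason );
$html = '<main>' . apply_filters( 'the_content', $post->post_content ) . '</main>';
} else {
$html = wp_remote_retrieve_body( $response );
}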
/**
* Filter the raw HTML before it is passed to the Markdown converter.
*
* Use this to strip site-specific elements (hero blocks, ad slots, widgets)
* that are not covered by the skip-classes list, or to pre-process markup
* before DOM parsing begins.
*
* @param string $html Full rendered HTML of the page.
* @param int $post_id
*/
$html = apply_filters( 'lr_llms_md_html_source', $html, $post_id );
// Build YAML frontmatter: the "---"-delimited header convention used by Jekyll and Hugo.
$title = lr_llms_clean_text( get_post_field( 'post_title', $post_id, 'raw' ) );
$last_modified = get_the_modified_time( 'c', $post_id );
$post_type = get_post_type( $post_id );
$excerpt = lr_llms_clean_text( wp_strip_all_tags( get_the_excerpt( $post_id ) ) );
$yaml_str = function ( string $val ): string {
return '"' . str_replace( [ '\\', '"' ], [ '\\\\', '\\"' ], $val ) . '"';
};
$frontmatter = [ '---' ];
$frontmatter[] = 'title: ' . $yaml_str( $title );
$frontmatter[] = 'url: ' . $url;
$frontmatter[] = 'last_modified: ' . $last_modified;
$frontmatter[] = 'type: ' . ( $post_type ?: 'page' );
if ( $excerpt ) {
$frontmatter[] = 'description: ' . $yaml_str( $excerpt );
}
$frontmatter[] = '---';
$lines = [];
$lines[] = implode( "\n", $frontmatter );
$lines[] = '';
$lines[] = lr_llms_md_html_to_markdown( $html );
return implode( "\n", $lines );
}
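// Example of the frontmatter built above, for a hypothetical page titled
// About "Us" with an excerpt (values are illustrative; description is omitted
// when the excerpt is empty):
//
// ---
// title: "About \"Us\""
// url: https://example.com/about/
// last_modified: 2026-03-09T10:00:00+00:00
// type: page
// description: "Who we are and how we work."
// ---
//
// The lr_llms_md_html_source filter sees the full page HTML before parsing.
// A sketch (not active) that removes a region wrapped in hypothetical marker
// comments:
//
// add_filter( 'lr_llms_md_html_source', function ( string $html, int $post_id ): string {
//     return preg_replace( '#<!-- llms:skip -->.*?<!-- /llms:skip -->#s', '', $html );
// }, 10, 2 );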
// =============================================================================
// G. HTML → MARKDOWN CONVERTER
// =============================================================================
function lr_llms_md_html_to_markdown( string $html ): string {
$skip_classes = lr_llms_md_get_skip_classes();
$include_classes = lr_llms_md_get_include_classes();
$use_lazy_load = (string) get_option( LR_LLMS_MD_OPT_LAZY_LOAD, '1' ) === '1';
$dedup_links = (string) get_option( LR_LLMS_MD_OPT_DEDUP_LINKS, '1' ) === '1';
// 'form' intentionally excluded — form containers often wrap meaningful text content
// (e.g. event registration sections with headings and instructions).
// Form controls are suppressed separately via $skip_form_tags.
$skip_tags = apply_filters( 'lr_llms_md_skip_tags', [
'script', 'style', 'noscript', 'nav', 'header', 'footer', 'iframe', 'svg',
] );
$skip_form_tags = [ 'input', 'textarea', 'select', 'option', 'button', 'label', 'fieldset', 'datalist' ];
// Normalize self-closing block elements — e.g. <div class="foo" /> produced by some
// page builders / ACF renderers. These are valid XHTML but break libxml's HTML parser:
// it treats them as open tags and then mismatches subsequent closing tags, corrupting
// the entire DOM tree. Convert to explicit open+close pairs before parsing.
$html = preg_replace(
'#<(div|section|article|aside|header|footer|nav|span|picture)\b([^>]*)/>#i',
'<$1$2></$1>',
$html
);
$dom = new DOMDocument();
libxml_use_internal_errors( true );
$dom->loadHTML( '<?xml encoding="utf-8" ?>' . $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
libxml_clear_errors();
$xpath = new DOMXPath( $dom );
$root = $xpath->query( '//main' )->item( 0 ) ?? $xpath->query( '//body' )->item( 0 );
if ( ! $root ) {
return wp_strip_all_tags( $html );
}
$get_classes = function ( DOMElement $el ): array {
return array_filter( array_map( 'trim', preg_split( '/\s+/', $el->getAttribute( 'class' ) ) ) );
};
$should_skip = function ( DOMNode $node ) use ( $skip_classes, $skip_tags, $skip_form_tags, $get_classes ): bool {
if ( ! $node instanceof DOMElement ) return false;
if ( in_array( strtolower( $node->tagName ), $skip_tags, true ) ) return true;
if ( in_array( strtolower( $node->tagName ), $skip_form_tags, true ) ) return true;
$current = $node;
while ( $current instanceof DOMElement ) {
if ( array_intersect( $skip_classes, $get_classes( $current ) ) ) return true;
$current = $current->parentNode instanceof DOMElement ? $current->parentNode : null;
}
return false;
};
$seen_hrefs = [];
$convert = null;
$children_md = function ( DOMNode $node ) use ( &$convert ): string {
$out = '';
foreach ( $node->childNodes as $child ) {
$out .= $convert( $child );
}
return $out;
};
$convert = function ( DOMNode $node ) use (
&$convert, &$children_md, &$seen_hrefs,
$should_skip, $include_classes, $get_classes,
$use_lazy_load, $dedup_links
): string {
if ( $node instanceof DOMText ) {
$text = preg_replace( '/\s+/', ' ', $node->nodeValue );
if ( ! empty( $include_classes ) ) {
$parent = $node->parentNode;
$included = false;
while ( $parent instanceof DOMElement ) {
if ( array_intersect( $include_classes, $get_classes( $parent ) ) ) {
$included = true;
break;
}
$parent = $parent->parentNode;
}
if ( ! $included ) return '';
}
return $text;
}
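// Illustration of the whitelist check above (hypothetical markup), assuming the
// include classes are [ 'text-container' ]:
//   <main><p>Teaser</p><div class="text-container"><p>Body</p></div></main>
// "Teaser" has no whitelisted ancestor, so its text node returns '' and the empty
// <p> is dropped. "Body" sits inside .text-container and is kept. The check runs per
// text node only, so headings (rendered from textContent) and images are not
// filtered by the whitelist.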
if ( ! $node instanceof DOMElement ) return '';
if ( $should_skip( $node ) ) return '';
$tag = strtolower( $node->tagName );
// Headings, <pre>, <img>, lists, <hr> and <br> do not use the generic $inner string,
// so they are handled before $children_md() runs. Converting their subtree up front
// would be a second, throwaway pass. Every link it visits lands in $seen_hrefs, and
// with dedup enabled the links that are actually output (e.g. inside <li>) would then
// lose their URLs.
switch ( $tag ) {
case 'h1':
case 'h2':
case 'h3':
case 'h4':
case 'h5':
case 'h6': {
// Collapse whitespace: a raw newline in the heading text would end the Markdown
// heading early and push the rest into a separate paragraph.
$ht = trim( preg_replace( '/\s+/', ' ', $node->textContent ) );
return $ht !== '' ? "\n\n" . str_repeat( '#', (int) $tag[1] ) . " {$ht}\n\n" : '';
}
case 'pre':
return "\n\n```\n" . $node->textContent . "\n```\n\n";
case 'img': {
$alt = trim( $node->getAttribute( 'alt' ) );
if ( $alt === '' ) return '';
$src = trim( $node->getAttribute( 'src' ) );
if ( $use_lazy_load ) {
foreach ( [ 'data-src', 'data-lazy-src', 'data-lazy', 'data-original' ] as $attr ) {
$lazy = trim( $node->getAttribute( $attr ) );
if ( $lazy !== '' && ! str_starts_with( $lazy, 'data:' ) ) {
$src = $lazy;
break;
}
}
}
if ( $src === '' || str_starts_with( $src, 'data:' ) ) return '';
// SVGs are always decorative icons in LR sites, never content images.
// parse_url() can return null or false; cast so pathinfo() always receives a string.
if ( strtolower( pathinfo( (string) parse_url( $src, PHP_URL_PATH ), PATHINFO_EXTENSION ) ) === 'svg' ) return '';
return "\n\n![{$alt}]({$src})\n\n";
}
case 'ul':
case 'ol': {
$items = '';
$i = 1;
foreach ( $node->childNodes as $child ) {
if ( $child instanceof DOMElement && strtolower( $child->tagName ) === 'li' ) {
// $convert() rather than $children_md(), so skip classes on the <li> itself apply.
$li = trim( $convert( $child ) );
if ( $li !== '' ) {
$items .= ( $tag === 'ol' ? "{$i}. " : '- ' ) . $li . "\n";
$i++;
}
}
}
return $items !== '' ? "\n\n{$items}\n" : '';
}
case 'hr':
return "\n\n---\n\n";
case 'br':
// Soft breaks become paragraph breaks. A Markdown hard break (two trailing
// spaces before "\n") would be stripped by the trim() pass at the end.
return "\n\n";
}
$inner = $children_md( $node );
$inner_t = trim( $inner );
switch ( $tag ) {
case 'p':
return $inner_t !== '' ? "\n\n{$inner_t}\n\n" : '';
case 'blockquote':
if ( $inner_t === '' ) return '';
return "\n\n" . implode( "\n", array_map( fn( $l ) => '> ' . $l, explode( "\n", $inner_t ) ) ) . "\n\n";
case 'code':
// <code> inside <pre> never reaches this point: <pre> is rendered from textContent above.
return $inner_t !== '' ? "`{$inner_t}`" : '';
case 'strong':
case 'b':
return $inner_t !== '' ? "**{$inner_t}**" : '';
case 'em':
case 'i':
return $inner_t !== '' ? "_{$inner_t}_" : '';
case 'a': {
$href = trim( $node->getAttribute( 'href' ) );
// Non-content hrefs — output text only
if (
$href === '' || $href === '#' ||
str_starts_with( $href, 'javascript:' ) ||
str_starts_with( $href, 'mailto:' ) ||
str_starts_with( $href, 'tel:' )
) {
return $inner_t;
}
if ( $dedup_links ) {
if ( isset( $seen_hrefs[ $href ] ) ) return $inner_t;
$seen_hrefs[ $href ] = true;
}
if ( $inner_t === '' || $inner_t === $href ) return $href;
return "[{$inner_t}]({$href})";
}
case 'li':
return $inner;
case 'table':
return "\n\n" . $inner . "\n\n";
case 'tr':
return $inner . "\n";
case 'td':
case 'th':
return $inner . ' ';
default:
return $inner;
}
};
$raw = $convert( $root );
$lines = explode( "\n", $raw );
$out = [];
$in_code = false;
$prev_blank = false;
foreach ( $lines as $line ) {
if ( str_starts_with( trim( $line ), '```' ) ) {
$in_code = ! $in_code;
}
if ( $in_code ) {
$out[] = $line;
$prev_blank = false;
continue;
}
$trimmed = trim( $line );
$is_blank = ( $trimmed === '' );
if ( $is_blank && $prev_blank ) continue;
$out[] = $trimmed;
$prev_blank = $is_blank;
}
return trim( implode( "\n", $out ) );
}
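// Example (hypothetical input). This assumes the include-class whitelist has been
// cleared in settings and that no skip class or skip tag matches:
//
// lr_llms_md_html_to_markdown(
// '<main><h2>Intro</h2><p>Read <a href="/docs">the docs</a>.</p><ul><li>One</li><li>Two</li></ul></main>'
// );
//
// returns:
//
// ## Intro
//
// Read [the docs](/docs).
//
// - One
// - Two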
// =============================================================================
// H. ADMIN — UNIFIED SETTINGS PAGE
// =============================================================================
/**
* Register the settings page under Tools. Skipped on edit.php, post.php and post-new.php.
*/
add_action( 'admin_menu', function () {
$basename = basename( $_SERVER['PHP_SELF'] ?? '' );
if ( in_array( $basename, [ 'post.php', 'post-new.php', 'edit.php' ], true ) ) {
return;
}
add_management_page(
'LR LLMs Settings',
'LR LLMs Settings',
'manage_options',
'lr-llms-settings',
'lr_llms_render_settings_page'
);
} );
/**
* Save all settings.
*/
add_action( 'admin_init', function () {
if (
! current_user_can( 'manage_options' ) ||
! isset( $_POST['lr_llms_save'] ) ||
! check_admin_referer( 'lr_llms_action', 'lr_llms_nonce' )
) {
return;
}
global $lr_llms_config;
// Shared exclusions. WordPress slashes $_POST, so values are unslashed before sanitizing.
// The (array) casts keep array_map() safe if a field arrives as a scalar.
update_option( $lr_llms_config['setting_key_exclude_types'],
array_map( 'sanitize_key', (array) wp_unslash( $_POST['lr_llms_exclude_post_types'] ?? [] ) )
);
update_option( $lr_llms_config['setting_key_exclude_ids'],
sanitize_text_field( wp_unslash( $_POST['lr_llms_exclude_ids'] ?? '' ) )
);
update_option( $lr_llms_config['setting_key_exclude_templates'],
array_map( 'sanitize_text_field', (array) wp_unslash( $_POST['lr_llms_exclude_templates'] ?? [] ) )
);
// llms.txt display
update_option( $lr_llms_config['setting_key_show_headings'], isset( $_POST['lr_llms_show_headings'] ) ? '1' : '0' );
update_option( $lr_llms_config['setting_key_show_descriptions'], isset( $_POST['lr_llms_show_descriptions'] ) ? '1' : '0' );
// .md settings
update_option( LR_LLMS_MD_OPT_ENABLED, isset( $_POST['lr_llms_md_enabled'] ) ? '1' : '0' );
update_option( LR_LLMS_MD_OPT_LAZY_LOAD, isset( $_POST['lr_llms_md_lazy_load'] ) ? '1' : '0' );
update_option( LR_LLMS_MD_OPT_DEDUP_LINKS, isset( $_POST['lr_llms_md_dedup_links'] ) ? '1' : '0' );
update_option( LR_LLMS_MD_OPT_SKIP_CLASSES, sanitize_textarea_field( wp_unslash( $_POST['lr_llms_md_skip_classes'] ?? '' ) ) );
update_option( LR_LLMS_MD_OPT_INC_CLASSES, sanitize_textarea_field( wp_unslash( $_POST['lr_llms_md_include_classes'] ?? '' ) ) );
lr_llms_flush_all_caches();
wp_safe_redirect( add_query_arg( 'lr_llms_saved', '1', admin_url( 'tools.php?page=lr-llms-settings' ) ) );
exit;
} );
/**
* Flush all caches handler.
*/
add_action( 'admin_init', function () {
if (
! current_user_can( 'manage_options' ) ||
! isset( $_POST['lr_llms_flush'] ) ||
! check_admin_referer( 'lr_llms_action', 'lr_llms_nonce' )
) {
return;
}
lr_llms_flush_all_caches();
wp_safe_redirect( add_query_arg( 'lr_llms_flushed', '1', admin_url( 'tools.php?page=lr-llms-settings' ) ) );
exit;
} );
/**
* Render the unified settings page.
*/
function lr_llms_render_settings_page(): void {
if ( ! current_user_can( 'manage_options' ) ) {
return;
}
global $lr_llms_config;
// Load all current values
$excluded_types = lr_llms_get_excluded_types();
$excluded_ids = get_option( $lr_llms_config['setting_key_exclude_ids'], '' );
$excluded_templates = lr_llms_get_excluded_templates();
$show_headings = (string) get_option( $lr_llms_config['setting_key_show_headings'], '0' ) === '1';
$show_descriptions = (string) get_option( $lr_llms_config['setting_key_show_descriptions'], '0' ) === '1';
$md_enabled = (string) get_option( LR_LLMS_MD_OPT_ENABLED, '1' ) === '1';
$md_lazy_load = (string) get_option( LR_LLMS_MD_OPT_LAZY_LOAD, '1' ) === '1';
$md_dedup_links = (string) get_option( LR_LLMS_MD_OPT_DEDUP_LINKS, '1' ) === '1';
$md_skip_classes = get_option( LR_LLMS_MD_OPT_SKIP_CLASSES, LR_LLMS_MD_DEFAULT_SKIP );
$md_inc_classes = get_option( LR_LLMS_MD_OPT_INC_CLASSES, LR_LLMS_MD_DEFAULT_INC );
$all_post_types = get_post_types( [ 'public' => true ], 'objects' );
$all_templates = lr_llms_get_all_page_templates();
$saved = isset( $_GET['lr_llms_saved'] );
$flushed = isset( $_GET['lr_llms_flushed'] );
?>
<div class="wrap">
<h1>LR LLMs Settings</h1>
<?php if ( $saved ) : ?>
<div class="notice notice-success is-dismissible"><p>Settings saved and cache flushed.</p></div>
<?php endif; ?>
<?php if ( $flushed ) : ?>
<div class="notice notice-success is-dismissible"><p>All caches flushed.</p></div>
<?php endif; ?>
<?php /* ============================================================
SECTION 1 — LLMs.txt
============================================================ */ ?>
<h2 class="title" style="margin-top:1.5em;">LLMs.txt</h2>
<p class="description" style="max-width:740px;">
Generates a dynamic <code>llms.txt</code> index for LLM crawlers.
Paginated, multilingual (Polylang &amp; WPML), and customizable via filters.
</p>
<table class="widefat" style="max-width:740px; margin:1em 0; border-collapse:collapse;">
<tbody>
<tr>
<td style="padding:8px 12px; width:160px; font-weight:600; color:#3c434a;">Public URL</td>
<td style="padding:8px 12px;">
<a href="<?php echo esc_url( home_url( '/llms.txt' ) ); ?>" target="_blank" rel="noopener">
<code><?php echo esc_html( home_url( '/llms.txt' ) ); ?></code>
</a>
</td>
</tr>
<tr style="background:#f9f9f9;">
<td style="padding:8px 12px; font-weight:600; color:#3c434a;">Force refresh</td>
<td style="padding:8px 12px;">
<code><?php echo esc_html( home_url( '/llms.txt?flush=1' ) ); ?></code>
</td>
</tr>
<tr>
<td style="padding:8px 12px; font-weight:600; color:#3c434a;">robots.txt hint</td>
<td style="padding:8px 12px;">
<code>LLMs: <?php echo esc_html( home_url( '/llms.txt' ) ); ?></code>
<span class="description"> (informal line, not a standard robots.txt directive)</span>
</td>
</tr>
</tbody>
</table>
<form method="post">
<?php wp_nonce_field( 'lr_llms_action', 'lr_llms_nonce' ); ?>
<table class="form-table" role="presentation">
<tr>
<th scope="row">Display options</th>
<td>
<label style="display:block; margin-bottom:6px;">
<input type="checkbox" name="lr_llms_show_headings" value="1" <?php checked( $show_headings ); ?>>
Show section heading for each content type
</label>
<label style="display:block;">
<input type="checkbox" name="lr_llms_show_descriptions" value="1" <?php checked( $show_descriptions ); ?>>
Show description (excerpt) for each entry
</label>
</td>
</tr>
</table>
<?php /* ============================================================
SECTION 2 — .md Endpoint
============================================================ */ ?>
<hr style="margin:2em 0; border:none; border-top:1px solid #dcdcde;">
<h2 class="title" style="display:flex; align-items:center; gap:10px;">
.md Endpoint
<span style="font-size:12px; font-weight:400; padding:2px 8px; border-radius:3px;
background:<?php echo $md_enabled ? '#d7f7c2' : '#f0f0f1'; ?>;
color:<?php echo $md_enabled ? '#1a6b1a' : '#757575'; ?>;">
<?php echo $md_enabled ? 'Active' : 'Inactive'; ?>
</span>
</h2>
<p class="description" style="max-width:740px;">
Serves a clean Markdown version of every public post and page at its URL with <code>.md</code> appended.
AI crawlers can request it directly; navigation, scripts, and layout markup are stripped.
Cache auto-invalidates on post save.
</p>
<table class="widefat" style="max-width:740px; margin:1em 0; border-collapse:collapse;">
<tbody>
<tr>
<td style="padding:8px 12px; width:160px; font-weight:600; color:#3c434a;">Example</td>
<td style="padding:8px 12px;">
<a href="<?php echo esc_url( home_url( '/your-post-slug.md' ) ); ?>" target="_blank" rel="noopener">
<code><?php echo esc_html( home_url( '/your-post-slug.md' ) ); ?></code>
</a>
</td>
</tr>
<tr style="background:#f9f9f9;">
<td style="padding:8px 12px; font-weight:600; color:#3c434a;">Homepage</td>
<td style="padding:8px 12px;">
<a href="<?php echo esc_url( home_url( '/index.md' ) ); ?>" target="_blank" rel="noopener">
<code><?php echo esc_html( home_url( '/index.md' ) ); ?></code>
</a>
<span class="description"> — requires a static front page</span>
</td>
</tr>
</tbody>
</table>
<table class="form-table" role="presentation">
<tr>
<th scope="row">Status</th>
<td>
<label>
<input type="checkbox" name="lr_llms_md_enabled" value="1" <?php checked( $md_enabled ); ?>>
Enable <code>.md</code> endpoint
</label>
<p class="description">Uncheck to disable entirely. Cache is flushed on save.</p>
</td>
</tr>
<tr>
<th scope="row">Lazy-load images</th>
<td>
<label>
<input type="checkbox" name="lr_llms_md_lazy_load" value="1" <?php checked( $md_lazy_load ); ?>>
Resolve real image URL from <code>data-src</code> / <code>data-lazy-src</code>
</label>
<p class="description">
Enable if your theme lazy-loads images (lazySizes, etc.).<br>
Checks <code>data-src</code>, <code>data-lazy-src</code>, <code>data-lazy</code>, <code>data-original</code> in that order.
Data URIs and <code>.svg</code> images (treated as decorative icons) are always skipped, as are images without alt text.
</p>
</td>
</tr>
<tr>
<th scope="row">Deduplicate links</th>
<td>
<label>
<input type="checkbox" name="lr_llms_md_dedup_links" value="1" <?php checked( $md_dedup_links ); ?>>
Link each unique URL only once per page
</label>
<p class="description">
Prevents repeated links when cards have multiple anchors pointing to the same URL
(image + title + CTA all linking to the same post). Later anchors keep their text but drop the URL.
</p>
</td>
</tr>
<tr>
<th scope="row" style="vertical-align:top; padding-top:14px;">Skip classes</th>
<td>
<textarea name="lr_llms_md_skip_classes" rows="10" class="large-text code"
style="font-size:12px; line-height:1.7; max-width:500px;"
><?php echo esc_textarea( $md_skip_classes ); ?></textarea>
<p class="description">
One CSS class per line. Elements with these classes — or nested inside them — are excluded from output.<br>
Tip: add <code>lr-no-extract-md</code> to any ACF layout block in the editor to exclude it from <code>.md</code> output only.
</p>
</td>
</tr>
<tr>
<th scope="row" style="vertical-align:top; padding-top:14px;">
Include classes
<br><span style="font-weight:400; font-size:11px; color:#757575;">whitelist</span>
</th>
<td>
<textarea name="lr_llms_md_include_classes" rows="5" class="large-text code"
style="font-size:12px; line-height:1.7; max-width:500px;"
placeholder="Leave empty to include all content inside &lt;main&gt;"
><?php echo esc_textarea( $md_inc_classes ); ?></textarea>
<p class="description">
One CSS class per line. <strong>When set</strong>, only content inside these elements is extracted.<br>
Default is <code>text-container</code> — the standard LR framework content wrapper.<br>
Clear to extract everything inside <code>&lt;main&gt;</code>.
</p>
</td>
</tr>
</table>
<?php /* ============================================================
SECTION 3 — Shared Exclusions
============================================================ */ ?>
<hr style="margin:2em 0; border:none; border-top:1px solid #dcdcde;">
<h2 class="title">Exclusions</h2>
<p class="description" style="max-width:740px;">
These settings apply to both <code>llms.txt</code> and the <code>.md</code> endpoint.
</p>
<table class="form-table" role="presentation">
<tr>
<th scope="row" style="vertical-align:top; padding-top:8px;">Post types</th>
<td>
<?php foreach ( $all_post_types as $pt ) : ?>
<label style="display:block; margin-bottom:4px;">
<input type="checkbox"
name="lr_llms_exclude_post_types[]"
value="<?php echo esc_attr( $pt->name ); ?>"
<?php checked( in_array( $pt->name, $excluded_types, true ) ); ?>>
<?php echo esc_html( sprintf( '%s (%s)', $pt->label, $pt->name ) ); ?>
</label>
<?php endforeach; ?>
<p class="description" style="margin-top:8px;">Checked post types return 404 for <code>.md</code> and are omitted from <code>llms.txt</code>.</p>
</td>
</tr>
<?php if ( ! empty( $all_templates ) ) : ?>
<tr>
<th scope="row" style="vertical-align:top; padding-top:8px;">Page templates</th>
<td>
<?php foreach ( $all_templates as $file => $name ) : ?>
<label style="display:block; margin-bottom:4px;">
<input type="checkbox"
name="lr_llms_exclude_templates[]"
value="<?php echo esc_attr( $file ); ?>"
<?php checked( in_array( $file, $excluded_templates, true ) ); ?>>
<?php echo esc_html( $name ); ?>
<span class="description">— <code><?php echo esc_html( $file ); ?></code></span>
</label>
<?php endforeach; ?>
<p class="description" style="margin-top:8px;">Pages using these templates are excluded from both <code>llms.txt</code> and <code>.md</code>.</p>
</td>
</tr>
<?php endif; ?>
<tr>
<th scope="row">Specific post IDs</th>
<td>
<input type="text"
name="lr_llms_exclude_ids"
value="<?php echo esc_attr( $excluded_ids ); ?>"
class="regular-text"
placeholder="e.g. 12, 45, 103">
<p class="description">Comma-separated post IDs. Excluded from both <code>llms.txt</code> and <code>.md</code>.</p>
</td>
</tr>
</table>
<p class="submit">
<input type="submit" name="lr_llms_save" class="button-primary" value="Save Settings">
</p>
</form>
<?php /* ============================================================
SECTION 4 — Cache
============================================================ */ ?>
<hr style="margin:1em 0; border:none; border-top:1px solid #dcdcde;">
<h2 class="title">Cache</h2>
<p class="description" style="max-width:740px; margin-bottom:1em;">
<code>llms.txt</code> cache expires in 1 hour and flushes automatically on post save.<br>
<code>.md</code> cache is per-post, keyed by <code>post_modified</code> — auto-invalidates on save.<br>
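Saving settings already flushes all caches. Use the button below to force a rebuild at any other time, e.g. after theme or template file changes, which do not invalidate the <code>.md</code> cache.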
</p>
<form method="post">
<?php wp_nonce_field( 'lr_llms_action', 'lr_llms_nonce' ); ?>
<input type="submit" name="lr_llms_flush" class="button-secondary" value="Flush All Caches">
</form>
</div>
<?php
}
// =============================================================================
// I. DEVELOPER REFERENCE — FILTERS & CONSTANTS
// =============================================================================
//
// ─── LLMS.TXT FILTERS ────────────────────────────────────────────────────────
//
// lr_llms_contact_details
// Inject a contact/about block at the top of llms.txt, after the header.
// Return a string (must include its own trailing newline).
// @param string $details Empty string by default.
// @return string
//
// add_filter( 'lr_llms_contact_details', function ( $details ) {
// return "Contact: hello@example.com\nTwitter: @example\n\n";
// } );
//
// ─────────────────────────────────────────────────────────────────────────────
//
// lr_llms_post_type_priority_order
// Override the order in which post types appear in llms.txt.
// Types not in the array are appended in registration order.
// @param array $order Default: ['page', 'post']
// @return array
//
// add_filter( 'lr_llms_post_type_priority_order', function ( $order ) {
// return [ 'post', 'page', 'product' ];
// } );
//
// ─────────────────────────────────────────────────────────────────────────────
//
// lr_llms_included_post_ids
// Final filter on the array of post IDs included in llms.txt for a given
// post type and language. Useful for adding, removing, or reordering IDs.
// @param array $ids Array of post IDs about to be output.
// @param array $args The WP_Query args used to fetch them.
// @param string|null $lang Active language code, or null for monolingual sites.
// @return array
//
// add_filter( 'lr_llms_included_post_ids', function ( $ids, $args, $lang ) {
// // Remove a specific ID regardless of other settings
// return array_diff( $ids, [ 99 ] );
// }, 10, 3 );
//
// ─────────────────────────────────────────────────────────────────────────────
//
// lr_llms_post_type_label
// Customise the section heading label for a post type when headings are on.
// @param string $label Default: post type plural label.
// @param string $type Post type slug.
// @param string $lang Active language code.
// @return string
//
// add_filter( 'lr_llms_post_type_label', function ( $label, $type, $lang ) {
// if ( $type === 'product' ) return 'Our Products';
// return $label;
// }, 10, 3 );
//
// ─────────────────────────────────────────────────────────────────────────────
//
// lr_llms_post_title
// Override the title string for a specific post in llms.txt.
// @param string $title Current title.
// @param int $post_id
// @param string $lang
// @return string
//
// add_filter( 'lr_llms_post_title', function ( $title, $post_id, $lang ) {
// if ( $post_id === 42 ) return 'Custom Title for Post 42';
// return $title;
// }, 10, 3 );
//
// ─────────────────────────────────────────────────────────────────────────────
//
// lr_llms_post_url
// Override the URL for a specific post in llms.txt.
// @param string $url
// @param int $post_id
// @param string $lang
// @return string
//
// add_filter( 'lr_llms_post_url', function ( $url, $post_id, $lang ) {
// return $url; // e.g. swap to a CDN or canonical override
// }, 10, 3 );
//
// ─────────────────────────────────────────────────────────────────────────────
//
// lr_llms_post_description
// Override the description string for a post when descriptions are enabled.
// @param string $description Default: post excerpt.
// @param int $post_id
// @return string
//
// add_filter( 'lr_llms_post_description', function ( $description, $post_id ) {
// return get_field( 'seo_summary', $post_id ) ?: $description;
// }, 10, 2 );
//
//
// ─── MD ENDPOINT FILTERS ─────────────────────────────────────────────────────
//
// lr_llms_md_html_source
// Filter the raw HTML before it is passed to the Markdown converter.
// Runs after wp_remote_get / the_content fallback, before DOMDocument parsing.
// Use this to strip site-specific elements not covered by skip-classes, or to
// fix malformed markup before the DOM parser sees it.
// @param string $html Full rendered HTML of the page.
// @param int $post_id
// @return string
//
// add_filter( 'lr_llms_md_html_source', function ( $html, $post_id ) {
// // Strip a site-specific hero section by marker comment
// $html = preg_replace( '/<!-- hero-start -->.*?<!-- hero-end -->/s', '', $html );
// return $html;
// }, 10, 2 );
//
// ─────────────────────────────────────────────────────────────────────────────
//
// lr_llms_md_output
// Filter the final Markdown string before it is sent to the browser.
// Runs after caching — result is NOT stored; fires on every request.
// @param string $md Full Markdown output.
// @param int $post_id
// @return string
//
// add_filter( 'lr_llms_md_output', function ( $md, $post_id ) {
// return $md . "\n\n---\nGenerated by Example Site\n";
// }, 10, 2 );
//
// ─────────────────────────────────────────────────────────────────────────────
//
// lr_llms_md_skip_classes (array filter)
// Programmatically extend the skip class list without touching the UI.
// Intended to merge with whatever is saved in settings, but it has no effect
// until the getter applies it (see the note below).
// @param array $classes
// @return array
//
// add_filter( 'lr_llms_md_skip_classes', function ( $classes ) {
// $classes[] = 'my-custom-nav';
// $classes[] = 'promo-banner';
// return $classes;
// } );
//
// Note: this filter must be wired to the getter. Current getter is:
// function lr_llms_md_get_skip_classes(): array {
// return lr_llms_parse_lines( get_option( LR_LLMS_MD_OPT_SKIP_CLASSES, LR_LLMS_MD_DEFAULT_SKIP ) );
// }
// Add apply_filters( 'lr_llms_md_skip_classes', $result ) there if needed.
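//
// A minimal sketch of the wired getter. It assumes lr_llms_parse_lines() returns an
// array of class-name strings, as the current getter implies. The trim/unique
// clean-up is an illustrative addition, not existing plugin code:
//
// function lr_llms_md_get_skip_classes(): array {
// $classes = lr_llms_parse_lines( get_option( LR_LLMS_MD_OPT_SKIP_CLASSES, LR_LLMS_MD_DEFAULT_SKIP ) );
// $classes = (array) apply_filters( 'lr_llms_md_skip_classes', $classes );
// // Drop empty entries and duplicates a callback may have added.
// return array_values( array_unique( array_filter( array_map( 'trim', $classes ) ) ) );
// }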
//
//
// lr_llms_md_skip_tags (array filter)
// Override the hardcoded HTML tag skip list for the Markdown converter.
// Default: script, style, noscript, nav, header, footer, iframe, svg
// Note: 'form' is intentionally NOT in the default list — form containers
// often wrap meaningful text content (headings, instructions). Form control
// elements (input, textarea, select, button etc.) are always skipped
// separately and are not affected by this filter.
// @param array $tags Lowercase tag names.
// @return array
//
// add_filter( 'lr_llms_md_skip_tags', function ( $tags ) {
// $tags[] = 'aside'; // skip sidebar asides
// return $tags;
// } );
//
//
// ─── WP-CONFIG CONSTANTS ─────────────────────────────────────────────────────
//
// LR_LLMS_DEV_PURGE (bool)
// Unschedules the daily cron purge event. Useful during local development
// to prevent unexpected transient cleanup.
// define( 'LR_LLMS_DEV_PURGE', true );
//
// LR_LLMS_DISABLE_RATE_LIMIT (bool)
// Disables IP-based rate limiting on /llms.txt. Use in dev or CI only.
// define( 'LR_LLMS_DISABLE_RATE_LIMIT', true );
//
// LR_LLMS_LOCAL_PATH_SUFFIX (string)
// Enables suffix-based matching for /llms.txt on localhost subdirectory
// installs where the path includes the project slug.
// define( 'LR_LLMS_LOCAL_PATH_SUFFIX', 'llms.txt' );
// Do NOT use in production.
//
// LR_DEFAULT_LANGUAGE (string)
// Fallback language code when Polylang/WPML is not active.
// define( 'LR_DEFAULT_LANGUAGE', 'en' );
//
// =============================================================================