@alexeden
Last active May 1, 2019 17:42
Building a Search Engine

Simple Search Engine

You provide:

  • Data (data) as an array of items (item)
  • Key generator function (keygen): A pure function that, given an item from data, returns a deterministic key as a string.
  • Tokenizer (tokenizer): A pure function that, given an item from data, returns a list of tokens (strings) describing that item.
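As a rough illustration of these three inputs, here is a minimal sketch. The `Book` shape, the sample data, and the particular key and token rules are assumptions made up for this example; any pure functions with the same signatures would do.

```typescript
type Keygen<T> = (item: T) => string;
type Tokenizer<T> = (item: T) => string[];

// Hypothetical item shape, for illustration only.
interface Book {
  id: number;
  title: string;
  author: string;
}

// Data: an array of items.
const data: Book[] = [
  { id: 1, title: "Dune", author: "Frank Herbert" },
  { id: 2, title: "Emma", author: "Jane Austen" },
];

// Keygen: the same item always yields the same string key.
const keygen: Keygen<Book> = (book) => `book-${book.id}`;

// Tokenizer: lowercase the title and author, then split on
// anything that is not a letter or a digit.
const tokenizer: Tokenizer<Book> = (book) =>
  `${book.title} ${book.author}`
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
```

Both functions depend only on their argument, so calling them twice on the same item gives the same key and the same tokens.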

What the search engine does when created:

  • For each item, invokes keygen and tokenizer to build a forward index
  • Iterates over the forward index and swaps keys with values, so that each token maps to the keys of the items that produced it. The result is the inverted index.
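The two construction steps above could be sketched roughly as follows. The helper names `buildForwardIndex` and `invertIndex`, the `Page` shape, and the sample data are hypothetical and not part of the original notes.

```typescript
type Keygen<T> = (item: T) => string;
type Tokenizer<T> = (item: T) => string[];
type ForwardIndex = { [key: string]: string[] };    // item key -> tokens
type InvertedIndex = { [token: string]: string[] }; // token -> item keys

// Hypothetical helper: run keygen and tokenizer once per item.
function buildForwardIndex<T>(
  data: T[],
  keygen: Keygen<T>,
  tokenizer: Tokenizer<T>
): ForwardIndex {
  // Null-prototype object so a key such as "constructor" cannot collide
  // with inherited properties.
  const forward: ForwardIndex = Object.create(null);
  for (const item of data) {
    forward[keygen(item)] = tokenizer(item);
  }
  return forward;
}

// Hypothetical helper: swap keys and values, one (key, token) pair at a time.
function invertIndex(forward: ForwardIndex): InvertedIndex {
  const inverted: InvertedIndex = Object.create(null);
  for (const key of Object.keys(forward)) {
    for (const token of forward[key]) {
      const keys = inverted[token] || (inverted[token] = []);
      // Skip duplicates in case a tokenizer emits the same token twice.
      if (keys.indexOf(key) === -1) {
        keys.push(key);
      }
    }
  }
  return inverted;
}

// Illustrative data only.
interface Page {
  url: string;
  title: string;
}

const pages: Page[] = [
  { url: "/a", title: "Inverted Index" },
  { url: "/b", title: "Forward Index" },
];

const forwardIndex = buildForwardIndex(
  pages,
  (page) => page.url,
  (page) => page.title.toLowerCase().split(/\s+/)
);
const invertedIndex = invertIndex(forwardIndex);
// forwardIndex:  { "/a": ["inverted", "index"], "/b": ["forward", "index"] }
// invertedIndex: { inverted: ["/a"], index: ["/a", "/b"], forward: ["/b"] }
```

The token `index` appears in both pages, so its entry in the inverted index lists both keys.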

How it's used:

Given a query (like the string you'd enter into Google), the keys of the inverted index, which are tokens, can be scanned for possible matches. The values of any matching tokens are item keys, which are then cross-referenced via the forward index to get the matching items. Those items are your search results.
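A minimal sketch of that lookup is shown below. The notes do not say how a query is matched against tokens, so the case-insensitive substring match here is an assumption. The `search` function and the `itemsByKey` lookup table (item key to item) are also hypothetical, added because the example needs some way to get from a key back to its item.

```typescript
type InvertedIndex = { [token: string]: string[] }; // token -> item keys

// Hypothetical search: a token matches when it contains any query term.
function search<T>(
  query: string,
  inverted: InvertedIndex,
  itemsByKey: { [key: string]: T } // assumed lookup from item key to item
): T[] {
  const terms = query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0);

  // Scan the tokens of the inverted index and collect each matching key once.
  const matchedKeys: string[] = [];
  for (const token of Object.keys(inverted)) {
    if (!terms.some((term) => token.indexOf(term) !== -1)) {
      continue;
    }
    for (const key of inverted[token]) {
      if (matchedKeys.indexOf(key) === -1) {
        matchedKeys.push(key);
      }
    }
  }

  // Resolve the collected keys back to their items.
  return matchedKeys
    .filter((key) => key in itemsByKey)
    .map((key) => itemsByKey[key]);
}

// Illustrative data only.
const exampleInverted: InvertedIndex = {
  dune: ["1", "2"],
  messiah: ["2"],
  emma: ["3"],
};
const exampleItems: { [key: string]: string } = {
  "1": "Dune",
  "2": "Dune Messiah",
  "3": "Emma",
};

const partialMatch = search("mess", exampleInverted, exampleItems);
// partialMatch: ["Dune Messiah"]
const sharedToken = search("dune", exampleInverted, exampleItems);
// sharedToken: ["Dune", "Dune Messiah"]
```

Scanning every token is linear in the size of the vocabulary, which is fine for a small data set. A larger one would likely want an exact or prefix lookup instead.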

Types:

type Keygen<T> = (item: T) => string;

type Tokenizer<T> = (item: T) => string[];

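// Maps each token to the keys of the items that produced it.
type InvertedIndex = {
  [token: string]: string[];
};

// Maps each item key to the tokens generated for that item.
type ForwardIndex = {
  [key: string]: string[];
};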