@vjcitn
Created March 12, 2026 10:31
Explore GenomicState+txdbmaker functions for working with Gencode V49
library(GenomicState)
library(RSQLite)
library(txdbmaker)

# Build a TxDb for Gencode V49 on hg38, restricted to chr1-22, X, Y, M
hgenc49 = gencode_txdb(
  version = "49",
  genome = "hg38",
  chrs = paste0("chr", c(seq_len(22), "X", "Y", "M"))
)
hgenc49$conn # 'path' is empty

# The metadata table lacks the 'Resource URL' record that makeTxDbPackage expects;
# reuse the value in row 3 (the source URL) and write the table back
md = dbGetQuery(hgenc49$conn, "select * from metadata")
md = rbind(md, data.frame(name="Resource URL", value = md[3,]$value))
dbWriteTable(hgenc49$conn, "metadata", md, overwrite=TRUE)

# The following is minimal and illustrative
dir.create("/tmp/hgenc49c")
makeTxDbPackage(txdb=hgenc49,
  maintainer=person(given="a", family="b", email="foo@abc.com"),
  provider="EBI",
  author=person(given="a", family="b", email="bar@abc.com"),
  pkgname="hgenc49", version="1.0.0", providerVersion="49",
  destDir="/tmp/hgenc49c")
vjcitn commented Mar 12, 2026

Comments

Warning that should be addressed

Note that the makeTxDbPackage call produces the following:

Warning messages:
1: In .get_cds_IDX(mcols0$type, mcols0$phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
2: In .makeTxDb_normarg_chrominfo(chrominfo) :
  genome version information is not available for this TxDb object

SQLite metadata

It is not clear to me why txdbmaker::makeTxDbPackage expects a "Resource URL" record in the
SQLite metadata table, but the omission is simple to work around by adding the record before packaging.
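
A slightly more defensive variant of the metadata patch can be sketched as follows. It looks the
source URL up by name instead of by row position, and does nothing if the record already exists.
This is only an illustration on a toy in-memory table: the helper `add_resource_url` and the
`example.org` URL are hypothetical, and it assumes the metadata table has `name` and `value`
columns with a "Data source" row, as the TxDb printout suggests.

```r
library(DBI)
library(RSQLite)

# Toy stand-in for the TxDb metadata table; the values are made up for illustration
con <- dbConnect(RSQLite::SQLite(), ":memory:")
toy_md <- data.frame(
  name = c("Db type", "Supporting package", "Data source"),
  value = c("TxDb", "GenomicFeatures", "ftp://example.org/annotation.gtf.gz"),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "metadata", toy_md)

# Hypothetical helper: add a 'Resource URL' row copied from 'Data source',
# but only when no such row is present yet
add_resource_url <- function(con) {
  md <- dbGetQuery(con, "SELECT name, value FROM metadata")
  if (!"Resource URL" %in% md$name) {
    src <- md$value[md$name == "Data source"]
    stopifnot(length(src) == 1L)
    dbExecute(con,
      "INSERT INTO metadata (name, value) VALUES (?, ?)",
      params = list("Resource URL", src))
  }
  dbGetQuery(con, "SELECT name, value FROM metadata")
}

md_fixed <- add_resource_url(con)
md_again <- add_resource_url(con) # second call leaves the table unchanged
dbDisconnect(con)
```

With a real TxDb, the same helper would presumably be called on `hgenc49$conn` before
`makeTxDbPackage`; that usage is untested here.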

Legacy elements deserving attention

The report of the TxDb object is familiar:

TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_49/gencode.v49.annotation.gtf.gz
# Organism: Homo sapiens
# Taxonomy ID: 9606
# Genome: hg38
# Nb of transcripts: 507365
# Db created by: txdbmaker package from Bioconductor
# Creation time: 2026-03-12 06:28:34 -0400 (Thu, 12 Mar 2026)
# txdbmaker version at creation time: 1.6.2
# RSQLite version at creation time: 2.4.6
# DBSCHEMAVERSION: 1.2
# Resource URL: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_49/gencode.v49.annotation.gtf.gz

The DBSCHEMAVERSION element relates to AnnotationDbi. Some of the very low-level design commitments
behind it are old, and were devised by a core member who has since left the project. A comment to follow
will discuss some aspects of migrating away from AnnotationDbi practices.

Note: the object created before packaging looks like this:

> str(hgenc49)
Reference class 'TxDb' [package "GenomicFeatures"] with 6 fields
 $ conn           :Formal class 'SQLiteConnection' [package "RSQLite"] with 8 slots
  .. ..@ ptr                :<externalptr> 
  .. ..@ dbname             : chr ""
  .. ..@ loadable.extensions: logi TRUE
  .. ..@ flags              : int 70
  .. ..@ vfs                : chr ""
  .. ..@ ref                :<environment: 0x164081d90> 
  .. ..@ bigint             : chr "integer64"
  .. ..@ extended_types     : logi FALSE
 $ packageName    : chr(0) 
 $ user2seqlevels0: int [1:25] 1 2 3 4 5 6 7 8 9 10 ...
 $ user_seqlevels : chr [1:25] "chr1" "chr2" "chr3" "chr4" ...
 $ user_genome    : chr [1:25] "hg38" "hg38" "hg38" "hg38" ...
 $ isActiveSeq    : logi [1:25] TRUE TRUE TRUE TRUE TRUE TRUE ...
 and 16 methods, of which 2 are  possibly relevant:
   finalize, initialize

Reference class maintenance is not widely practiced in the project at present. Additionally,
when packaged, the SQLite database consumes over 260 MB.

Briefly, I think it is possible that the AnnotationDbi legacy schema "gets in the way" of efficient
lookups and merges that could take advantage of all the information in the GTF, using
modern serializations. Stay tuned for experiments.
