Created
March 12, 2026 10:31
Explore GenomicState+txdbmaker functions for working with Gencode V49
library(GenomicState)
library(RSQLite)
library(txdbmaker)
hgenc49 = gencode_txdb(
  version = "49",
  genome = c("hg38"),
  chrs = paste0("chr", c(seq_len(22), "X", "Y", "M"))
)
hgenc49$conn # 'path' is empty
md = dbGetQuery(hgenc49$conn, "select * from metadata") # lacks 'Resource URL' record in metadata
md = rbind(md, data.frame(name="Resource URL", value = md[3,]$value))
dbWriteTable(hgenc49$conn, "metadata", md, overwrite=TRUE)
dir.create("/tmp/hgenc49c") # following is minimal, illustrative
makeTxDbPackage(txdb=hgenc49, maintainer=person(given="a", family="b", email="foo@abc.com"),
  provider="EBI", author=person(given="a", family="b", email="bar@abc.com"), pkgname="hgenc49",
  version="1.0.0", providerVersion="49", destDir="/tmp/hgenc49c")
Comments
Warning that should be addressed
Note that the makeTxDbPackage call produces a warning.
SQLite metadata
It is not clear to me why txdbmaker::makeTxDbPackage expects a "Resource URL" record in the
SQLite metadata table, but the omission seems simple to fix.
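The script above copies the value from row 3 of the metadata table, which depends on row order. A minimal sketch of a lookup by name follows, run against a toy in-memory metadata table rather than `hgenc49$conn`. The helper `add_resource_url`, the toy rows, the example URL, and the assumption that the source record is named "Data source" are all illustrative and should be checked against the real table.

```r
library(DBI)
library(RSQLite)

# Toy stand-in for the TxDb metadata table; names and values are assumptions.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "metadata", data.frame(
  name  = c("Db type", "Supporting package", "Data source"),
  value = c("TxDb", "GenomicFeatures", "https://example.org/annotation.gtf.gz"),
  stringsAsFactors = FALSE
))

# Hypothetical helper: add a "Resource URL" record copied from the record
# called `source_name`, unless a "Resource URL" record already exists.
add_resource_url <- function(con, source_name = "Data source") {
  md <- dbGetQuery(con, "SELECT name, value FROM metadata")
  if ("Resource URL" %in% md$name) return(invisible(FALSE))
  src <- md$value[md$name == source_name]
  stopifnot(length(src) == 1L)  # fail loudly if the source record is missing
  dbExecute(con, "INSERT INTO metadata (name, value) VALUES (?, ?)",
            params = list("Resource URL", src))
  invisible(TRUE)
}

first_call  <- add_resource_url(con)  # inserts the record
second_call <- add_resource_url(con)  # no-op: record already present
md_after <- dbGetQuery(con, "SELECT name, value FROM metadata")
dbDisconnect(con)
```

Inserting a single row avoids rewriting the whole table with `overwrite=TRUE`, and the existence check makes the step safe to rerun.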
Legacy elements deserving attention
The report of the TxDb object is familiar:
The DBSCHEMAVERSION element relates to AnnotationDbi. Some of its very low-level design commitments
are old and were devised by a core member who has since left the project. A comment to follow
will discuss some aspects of migrating away from AnnotationDbi practices.
Note: the object created before packaging looks like this
Reference class maintenance is not widely practiced in the project at present. Additionally,
when packaged, the SQLite database consumes over 260MB.
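One way to check the packaged database size is to search the destination directory for SQLite files. This is a small sketch using base R only; the helper name `sqlite_sizes_mb` is hypothetical, and the search is recursive because the exact location of the database inside the generated package is not assumed here.

```r
# Hypothetical helper: size in MB of every .sqlite file found under `dir`.
sqlite_sizes_mb <- function(dir) {
  files <- list.files(dir, pattern = "\\.sqlite$", recursive = TRUE,
                      full.names = TRUE)
  data.frame(file = files, size_mb = file.size(files) / 1024^2,
             stringsAsFactors = FALSE)
}

# destDir used in the makeTxDbPackage call; yields zero rows if nothing is found
sqlite_sizes_mb("/tmp/hgenc49c")
```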
Briefly, I think it is possible that the AnnotationDbi legacy schema "gets in the way" of efficient
lookups and merges that could take advantage of all the information in the GTF, using
modern serializations. Stay tuned for experiments.