# OpenADMET + ExpansionRx Blind Challenge
This is a short description of our latest submission to the OpenADMET + ExpansionRx Blind Challenge.
This submission was made on behalf of the UCT Prague cheminformatics group. The people who contributed to the UCT team's submissions are (alphabetically):
- Joanna Ceklarz
- Ivan Čmelo
- Wim Dehaen
- Valeriia Fil
- Jozef Fülöp
- Lukáš Kerti
- Martin Šícho
- Hunzallah Usmani
## Short description
The approach is based on an ensemble of TabPFN regression models combined with a stacked meta-learner (Ridge).
Each ADMET endpoint is modeled independently, with automatic per-endpoint selection of the best target transform (e.g., linear, log, Box-Cox, Yeo-Johnson, asinh, quantile).
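The exact selection criterion and model wrapper are not spelled out above, so the following is a minimal sketch under assumptions: the transform is chosen by cross-validated MAE on the back-transformed scale (only a subset of the listed transforms is shown), and a Ridge meta-learner is stacked on out-of-fold predictions from TabPFN base models trained on different feature views. Function names such as `select_transform` and `fit_stacked` are illustrative, not the submission code.

```python
# Minimal sketch (assumed setup): per-endpoint target-transform selection and
# Ridge stacking over TabPFN base models. Requires `tabpfn` and scikit-learn.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, cross_val_predict
from tabpfn import TabPFNRegressor

# a subset of the transforms mentioned in the text: (forward, inverse) pairs
TRANSFORMS = {
    "linear": (lambda y: y, lambda y: y),
    "log": (np.log1p, np.expm1),
    "asinh": (np.arcsinh, np.sinh),
}

def select_transform(X, y, n_splits=5):
    """Pick the transform whose back-transformed CV predictions give the lowest MAE."""
    best_name, best_mae = None, np.inf
    cv = KFold(n_splits, shuffle=True, random_state=0)
    for name, (fwd, inv) in TRANSFORMS.items():
        preds = cross_val_predict(TabPFNRegressor(), X, fwd(y), cv=cv)
        mae = mean_absolute_error(y, inv(preds))
        if mae < best_mae:
            best_name, best_mae = name, mae
    return best_name

def fit_stacked(X_views, y):
    """Fit one TabPFN per feature view, then a Ridge meta-learner on OOF predictions."""
    oof = np.column_stack([
        cross_val_predict(TabPFNRegressor(), X, y, cv=5) for X in X_views
    ])
    base_models = [TabPFNRegressor().fit(X, y) for X in X_views]
    meta = Ridge(alpha=1.0).fit(oof, y)
    return base_models, meta
```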
**Note:** for one endpoint, HLM CLint, an entirely different parallel approach was taken, and its column was spliced into the final submission. This endpoint was modeled with an ensemble of MLPs and XGBoost built on classic features such as Morgan fingerprints, MACCS keys, and RDKit descriptors.
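As a rough illustration of that parallel pipeline, the sketch below (illustrative hyperparameters and function names, not the submission code) trains an MLP and an XGBoost regressor on concatenated Morgan / MACCS / RDKit-descriptor features and averages their predictions.

```python
# Minimal sketch (assumed setup) of the HLM CLint ensemble.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors, MACCSkeys
from rdkit.Chem.rdFingerprintGenerator import GetMorganGenerator
from sklearn.neural_network import MLPRegressor
from xgboost import XGBRegressor

def featurize(smiles_list):
    """Concatenate Morgan bits, MACCS keys, and RDKit descriptors per molecule."""
    gen = GetMorganGenerator(radius=2, fpSize=2048)
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        morgan = gen.GetFingerprintAsNumPy(mol).astype(float)
        maccs = np.array(list(MACCSkeys.GenMACCSKeys(mol)), dtype=float)
        rdkit_desc = np.array([fn(mol) for _, fn in Descriptors.descList], dtype=float)
        rows.append(np.concatenate([morgan, maccs, rdkit_desc]))
    return np.nan_to_num(np.vstack(rows))

def fit_hlm_ensemble(X, y):
    """Train both base models; predictions are averaged at inference time."""
    models = [
        MLPRegressor(hidden_layer_sizes=(512, 128), max_iter=500, random_state=0),
        XGBRegressor(n_estimators=500, learning_rate=0.05, random_state=0),
    ]
    return [m.fit(X, y) for m in models]

def predict_hlm_ensemble(models, X):
    return np.mean([m.predict(X) for m in models], axis=0)
```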
We use a diverse feature set (a sketch of the feature computation follows the list):
- MOE descriptors
- CheMeleon features
- MORDRED descriptors
- RDKit descriptors
- RDKit Morgan fingerprints (chiral-aware)
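Two of these feature sets are straightforward to reproduce with open tools; the sketch below (assumed parameters, not the submission code) computes chiral-aware Morgan fingerprints with RDKit and descriptors with Mordred. MOE and CheMeleon features are produced by their own tools and are not shown here.

```python
# Minimal sketch (assumed setup) of two of the open-source feature sets.
import numpy as np
import pandas as pd
from rdkit import Chem
from rdkit.Chem.rdFingerprintGenerator import GetMorganGenerator
from mordred import Calculator, descriptors

def chiral_morgan(smiles_list, radius=2, n_bits=2048):
    # includeChirality=True makes the fingerprint sensitive to stereocenters
    gen = GetMorganGenerator(radius=radius, fpSize=n_bits, includeChirality=True)
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    return np.array([gen.GetFingerprintAsNumPy(m) for m in mols], dtype=float)

def mordred_features(smiles_list):
    calc = Calculator(descriptors, ignore_3D=True)
    mols = [Chem.MolFromSmiles(s) for s in smiles_list]
    df = calc.pandas(mols)
    # descriptor errors come back as non-numeric objects: coerce to NaN, zero-fill
    return df.apply(pd.to_numeric, errors="coerce").fillna(0.0).to_numpy()
```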
Cross-validation folds are derived from Butina clustering of the training set, so that structurally similar molecules stay within the same fold and scaffold leakage is reduced.
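A minimal sketch of how such splits can be produced with RDKit's Butina implementation follows; the clustering cutoff and fingerprint settings are assumptions, not the values used in the submission. Whole clusters are assigned to folds, so near-duplicates never straddle a fold boundary.

```python
# Minimal sketch (assumed parameters): Butina-cluster-based fold assignment.
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem.rdFingerprintGenerator import GetMorganGenerator
from rdkit.ML.Cluster import Butina

def butina_folds(smiles_list, n_folds=5, cutoff=0.4):
    gen = GetMorganGenerator(radius=2, fpSize=2048)
    fps = [gen.GetFingerprint(Chem.MolFromSmiles(s)) for s in smiles_list]
    # lower-triangle distance list (1 - Tanimoto), as expected by Butina.ClusterData
    dists = []
    for i in range(1, len(fps)):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
        dists.extend(1.0 - s for s in sims)
    clusters = Butina.ClusterData(dists, len(fps), cutoff, isDistData=True)
    # greedily assign the largest clusters to the currently smallest fold
    folds = np.zeros(len(fps), dtype=int)
    fold_sizes = np.zeros(n_folds, dtype=int)
    for cluster in sorted(clusters, key=len, reverse=True):
        f = int(np.argmin(fold_sizes))
        folds[list(cluster)] = f
        fold_sizes[f] += len(cluster)
    return folds
```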
After prediction, we apply the following post-processing steps (see the sketch after the list):
- Post-prediction calibration (linear or isotonic)
- Prediction clipping to training ranges
- Optional multitask residual correction across endpoints
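A minimal sketch of the calibration and clipping steps, assuming the calibrator is fitted on out-of-fold predictions; function names are illustrative and the residual-correction step is omitted.

```python
# Minimal sketch (assumed setup): post-prediction calibration and range clipping.
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LinearRegression

def fit_calibrator(oof_pred, y_true, kind="isotonic"):
    """Fit a monotone (isotonic) or linear map from OOF predictions to targets."""
    if kind == "isotonic":
        return IsotonicRegression(out_of_bounds="clip").fit(oof_pred, y_true)
    return LinearRegression().fit(oof_pred.reshape(-1, 1), y_true)

def calibrate_and_clip(calibrator, preds, y_train):
    if isinstance(calibrator, IsotonicRegression):
        calibrated = calibrator.predict(preds)
    else:
        calibrated = calibrator.predict(preds.reshape(-1, 1))
    # clip to the range observed in training to avoid extreme extrapolation
    return np.clip(calibrated, y_train.min(), y_train.max())
```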
## Performance notes
- Extensive feature ensembling and transform search improved robustness across endpoints.
- TabPFN itself was not fine-tuned; improvements come from target transforms, calibration, ensembling, and residual correction.
- Internal out-of-fold (OOF) performance is consistently predictive of leaderboard ranking, though the absolute metric values are overly optimistic.

When the challenge ends, the code and a more detailed description will be made public at:
https://github.com/lich-uct/openADMET-challenge