- Use DSPy https://dspy.ai/ for automatic prompt optimisation.
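
  A minimal sketch of how DSPy prompt optimisation looks, assuming an OpenAI-compatible key is configured in the environment; the model name, metric, and tiny trainset below are illustrative only:

  ```python
  import dspy

  # Any LiteLLM-style model string works here; credentials come from the environment.
  dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

  qa = dspy.Predict("question -> answer")  # the program to be optimised

  def exact_match(example, pred, trace=None):
      return example.answer.lower() == pred.answer.lower()

  trainset = [
      dspy.Example(question="2 + 2?", answer="4").with_inputs("question"),
      dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
  ]

  # BootstrapFewShot searches for few-shot demos that maximise the metric.
  optimizer = dspy.BootstrapFewShot(metric=exact_match)
  optimized_qa = optimizer.compile(qa, trainset=trainset)
  print(optimized_qa(question="Capital of Italy?").answer)
  ```
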
- Microsoft's attempt to fix prompt management: Prompty https://github.com/microsoft/prompty. Supported by LangChain.
- Use DLPack https://dmlc.github.io/dlpack/latest/ for zero-data-copy exchange of tensor data between different libs (PyTorch, NumPy, etc.).
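
  A tiny sketch of the zero-copy hand-off, assuming reasonably recent NumPy and PyTorch releases:

  ```python
  import numpy as np
  import torch

  a = np.arange(6, dtype=np.float32)
  t = torch.from_dlpack(a)   # wraps the same buffer via DLPack, no copy
  t[0] = 42.0
  print(a[0])                # 42.0 -> both objects share the memory
  b = np.from_dlpack(t)      # and back to NumPy, still zero-copy
  ```
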
- Use Apache Arrow https://arrow.apache.org/ for zero-data-copy of tabular data between different libs (Pandas, Polars, DuckDB, etc.).
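
  A small sketch of sharing one Arrow table across libraries without copying; the toy data is illustrative:

  ```python
  import duckdb
  import polars as pl
  import pyarrow as pa

  tbl = pa.table({"x": [1, 2, 3], "y": ["a", "b", "c"]})

  pl_df = pl.from_arrow(tbl)  # Polars reuses the Arrow buffers where it can
  # DuckDB's replacement scan picks up the local `tbl` variable and queries it in place.
  out = duckdb.sql("SELECT y, x * 2 AS x2 FROM tbl WHERE x > 1").arrow()
  print(pl_df, out, sep="\n")
  ```
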
- If you're using NVIDIA GPUs, use cuDF (the cudf-cu12 package) https://docs.rapids.ai/api/cudf/stable/ for GPU-accelerated DataFrames.
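
  A minimal sketch, assuming an NVIDIA GPU, a CUDA 12 driver, and a hypothetical Parquet file:

  ```python
  # pip install cudf-cu12 --extra-index-url=https://pypi.nvidia.com
  import cudf

  gdf = cudf.read_parquet("events.parquet")        # hypothetical input
  top = (
      gdf.groupby("user_id")["amount"].sum()
         .sort_values(ascending=False)
         .head(10)
  )
  print(top.to_pandas())                           # move the small result back to the CPU
  ```

  There is also a `cudf.pandas` accelerator mode that can speed up existing pandas scripts without code changes.
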
- Qdrant https://github.com/qdrant/qdrant is generally loved by many as a vector DB.
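
  A minimal sketch with the official qdrant-client (recent versions), using the in-process mode so no server is needed; the vectors and payload are toy data:

  ```python
  from qdrant_client import QdrantClient
  from qdrant_client.models import Distance, PointStruct, VectorParams

  client = QdrantClient(":memory:")  # or QdrantClient(url="http://localhost:6333") for a real server

  client.create_collection(
      collection_name="demo",
      vectors_config=VectorParams(size=4, distance=Distance.COSINE),
  )
  client.upsert(
      collection_name="demo",
      points=[PointStruct(id=1, vector=[0.1, 0.2, 0.3, 0.4], payload={"doc": "hello"})],
  )
  hits = client.query_points(collection_name="demo", query=[0.1, 0.2, 0.3, 0.4], limit=1)
  print(hits.points)
  ```
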
- ArcFace https://github.com/deepinsight/insightface and SigLIP 2 https://huggingface.co/blog/siglip2 are your best friends here: ArcFace for face embeddings, SigLIP 2 for general image/text embeddings.
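
  A hedged sketch of getting an image embedding out of SigLIP 2 via transformers; the checkpoint name is an assumed example and a recent transformers release is required:

  ```python
  import torch
  from PIL import Image
  from transformers import AutoModel, AutoProcessor

  model_id = "google/siglip2-base-patch16-224"   # any SigLIP 2 variant should work similarly
  model = AutoModel.from_pretrained(model_id)
  processor = AutoProcessor.from_pretrained(model_id)

  image = Image.open("photo.jpg")                # hypothetical input image
  inputs = processor(images=image, return_tensors="pt")
  with torch.no_grad():
      emb = model.get_image_features(**inputs)
  emb = torch.nn.functional.normalize(emb, dim=-1)  # unit-length vector, ready for cosine search
  ```
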
- Marimo https://marimo.io/ is emerging as a Jupyter notebook replacement.
- Some suggest using universal LLM proxies for corporate use; LiteLLM https://docs.litellm.ai/ is one such proxy. The same folks suggest Open WebUI as a generally available (self-hostable) chat UI for corporate use.
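
  One common pattern is to point the standard OpenAI SDK at the LiteLLM proxy, which holds the real provider keys; the port and model alias below are assumptions that depend on the proxy config:

  ```python
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:4000", api_key="sk-local-placeholder")
  resp = client.chat.completions.create(
      model="gpt-4o-mini",  # whatever alias the proxy config exposes
      messages=[{"role": "user", "content": "Hello through the proxy"}],
  )
  print(resp.choices[0].message.content)
  ```
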
- MCP (Model Context Protocol) is getting a tremendous amount of attention.
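
  For orientation, a minimal MCP server using the official Python SDK's FastMCP helper (pip install "mcp[cli]"); the tool itself is just an illustration:

  ```python
  from mcp.server.fastmcp import FastMCP

  mcp = FastMCP("demo-tools")

  @mcp.tool()
  def add(a: int, b: int) -> int:
      """Add two numbers."""
      return a + b

  if __name__ == "__main__":
      mcp.run()  # speaks MCP over stdio so clients (e.g. Claude Desktop) can attach
  ```
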
- Proper (optimal) implementation of RAG gets a lot of attention too. Make sure you're familiar with pre-, mid-, and post-retrieval optimisations.
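
  A rough sketch of where the three kinds of optimisation slot in; the retriever, reranker, and LLM calls are hypothetical placeholders, not any specific library's API:

  ```python
  from typing import List

  def rewrite_query(q: str) -> str:                 # pre-retrieval: expand/decompose the query
      return q                                      # e.g. LLM-based rewriting or HyDE

  def retrieve(q: str, k: int = 20) -> List[str]:   # mid-retrieval: hybrid dense + keyword search
      return [f"chunk mentioning {q}"] * k          # placeholder for a vector-DB / BM25 call

  def rerank(q: str, chunks: List[str], top_n: int = 5) -> List[str]:  # post-retrieval
      return chunks[:top_n]                         # e.g. cross-encoder reranking + compression

  def answer(question: str) -> str:
      q = rewrite_query(question)                   # pre
      chunks = retrieve(q)                          # mid
      context = "\n".join(rerank(q, chunks))        # post
      return f"<LLM answer grounded in {len(context)} chars of context>"  # placeholder generation

  print(answer("How do I rotate API keys?"))
  ```
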
- Agents get a tremendous amount of attention, and new frameworks like Dapr https://github.com/dapr/dapr keep emerging.
- Dutch local governments are actively integrating LLMs into their processes.
- Some suggest using Outlines https://github.com/dottxt-ai/outlines for structured (constrained) output.
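
  A hedged sketch in the Outlines 0.x style (newer releases expose a slightly different interface); the checkpoint is just an example of a small local model:

  ```python
  from pydantic import BaseModel
  import outlines

  class Ticket(BaseModel):
      title: str
      priority: int

  model = outlines.models.transformers("HuggingFaceTB/SmolLM2-360M-Instruct")
  generator = outlines.generate.json(model, Ticket)  # constrains generation to valid Ticket JSON
  ticket = generator("Return a JSON ticket for: 'Login page crashes on submit, fix ASAP'")
  print(ticket)  # a validated Ticket instance
  ```
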
- Some suggest SmolLM https://github.com/huggingface/smollm (or https://ollama.com/library/smollm) for development/testing, as it's extremely small and fast. There's also SmolVLM if you need a multimodal version.
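
  A quick local smoke test via the Ollama Python client, assuming a running Ollama server and `ollama pull smollm` done beforehand:

  ```python
  import ollama

  resp = ollama.chat(
      model="smollm",
      messages=[{"role": "user", "content": "Reply with exactly five words."}],
  )
  print(resp["message"]["content"])
  ```
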
- Some suggest using DuckDB https://duckdb.org/why_duckdb.html for querying extremely large datasets.
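
  A small sketch of querying Parquet files in place; DuckDB only reads the columns and row groups it needs, and the path here is hypothetical:

  ```python
  import duckdb

  top_users = duckdb.sql("""
      SELECT user_id, count(*) AS n
      FROM read_parquet('events/*.parquet')
      GROUP BY user_id
      ORDER BY n DESC
      LIMIT 10
  """).df()
  print(top_users)
  ```
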
- Docling https://github.com/docling-project/docling is generally loved by many for document format conversion (PDF, DOCX, etc. to Markdown or JSON). Check docling-langchain https://github.com/docling-project/docling-langchain/ for LangChain integration.
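
  A minimal conversion sketch with Docling itself (the LangChain package wraps the same pipeline behind a document loader); the input file is hypothetical:

  ```python
  from docling.document_converter import DocumentConverter

  converter = DocumentConverter()
  result = converter.convert("paper.pdf")            # local paths and URLs both work
  print(result.document.export_to_markdown()[:500])  # Markdown view of the parsed document
  ```
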