
## GSOC_progress_report.md

      
GSoC 2024 Progress Report

Relational Framework for Prompt-based Automatic Layout Generation with Glayout


Name :- Chetanya Goyal
Slack :- Chetanya Goyal
Email :- chinnidude@gmail.com
GitHub :- chetanyagoyal

Project Overview

This project aims to develop a system that generates DRC- and LVS-clean analog layouts from natural language prompts, streamlining layout creation and enabling high-level circuit specification. It uses the pre-existing Glayout API in conjunction with a purpose-built syntax (termed strict-syntax) and a Large Language Model.
This allows the conversion of natural language prompts such as:
create a two-stage miller compensated operational amplifier with an nmos output stage.

to a DRC- and LVS-clean layout object that can be written out in the GDS-II stream format. This lowers the barrier to entry for code-assisted analog layout, since it abstracts the underlying Python API away from the end user and produces a layout in far less time than writing the Python code from scratch or, worse yet, creating the layout with conventional tools such as Cadence or magic.
Full Proposal

You can find a detailed description of the project here
Deliverables


Implement DRC checks for Glayout using magic DRC in OpenFASoC’s CI system
Refine relational framework with bug fixes and detailed error logging
Implement more complex Pcells and placement methods (enumerated here)
Improve the Relational Database:

Abstract out internal port names to simplify routing
Allow for multiple commands (separated by the word “and”) in a single prompt
Create lookup .json files for generic circuit classes


Implement automatic netlist generation to enable LVS checks for user created circuits
Deploy the framework and crowdsource circuit designs to accelerate the generation of a training dataset
Develop the LLM to simplify prompts into sentences that the relational model can parse and dump code for
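
The multi-command deliverable above (commands separated by the word "and") can be illustrated with a minimal sketch. This is not the relational framework's actual parser: the function name `split_commands` is hypothetical, and the naive split assumes that "and" never appears inside a single command.

```python
import re


def split_commands(prompt: str) -> list[str]:
    """Split a prompt into individual commands on the standalone word 'and'."""
    # \b ensures that only the whole word "and" acts as a separator,
    # so words such as "operand" are left untouched.
    parts = re.split(r"\s+\band\b\s+", prompt.strip(), flags=re.IGNORECASE)
    return [part.strip() for part in parts if part.strip()]


commands = split_commands(
    "place a pmos called vin1 and place a pmos called vin2 "
    "and move vin1 to the left of vin2"
)
for command in commands:
    print(command)
```

A production parser would need extra care for prompts where "and" is part of a command's own wording.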

Work Completed

CI Checks for the Glayout API

The first task after the start of the GSoC coding period was to implement DRC and LVS checks in GitHub Actions and, as decided after discussion with my mentor, spice testbench simulations.
The MOSFET, Differential Pair, Current Mirror and Operational Amplifier Pcells were chosen for testing because, together, they exercise the cumulative functionality of Glayout and include all possible pcells and macros.
DRC is run using magic, similar to how it was previously done for the Digital Generators already present in the OpenFASoC directories, with modifications so that DRC can be run with a single Python command, as a callable function in the MappedPDK class. LVS was implemented in a similar way: netgen is invoked to compare the extracted netlist (post-PEX or pre-PEX) with the standard netlist.
To test the compatibility of the NGSpice and numpy versions with the code, a spice testbench was written to calculate the OpAmp's (with default Python parameters) best possible gain, unity-gain bandwidth, -3 dB bandwidth and a few other golden metrics. 100 NGSpice runs of the OpAmp were executed in parallel, and the mean results of these runs were stored in the .github directory to be used as a reference for CI runs.
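
The golden-metric comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the metric names (`gain_db`, `ugb_hz`), the numbers, the 5% relative tolerance and both function names are made up for the example and are not taken from the OpenFASoC CI scripts.

```python
from statistics import mean


def golden_means(runs: list[dict[str, float]]) -> dict[str, float]:
    """Average every metric over all simulation runs."""
    return {name: mean(run[name] for run in runs) for name in runs[0]}


def failing_metrics(
    result: dict[str, float], golden: dict[str, float], rel_tol: float = 0.05
) -> list[str]:
    """Return the metrics that deviate from the golden value by more than rel_tol."""
    return [
        name
        for name, reference in golden.items()
        if abs(result[name] - reference) > rel_tol * abs(reference)
    ]


# Made-up results standing in for the stored NGSpice runs.
runs = [
    {"gain_db": 60.0, "ugb_hz": 1.0e6},
    {"gain_db": 62.0, "ugb_hz": 1.2e6},
]
golden = golden_means(runs)

# A made-up CI result: the gain is close to the mean, the bandwidth is not.
failures = failing_metrics({"gain_db": 61.5, "ugb_hz": 0.9e6}, golden)
print(failures)
```

A CI job could then fail whenever the returned list is non-empty.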
YAML workflow files were written for these tests and the code was committed to the OpenFASoC directory.
Improving the Large Language Model

The large language models of choice were two Mistral models (7B and 22B) and Microsoft's Phi-3b.
The code was written so that the user can pick whichever model they want (based on their compute constraints) with a simple command-line argument. The code enables LoRA and QLoRA with a default of 8-bit quantization.
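
A minimal sketch of such a command-line selection is shown below. The flag names (`--model`, `--no-lora`, `--bits`), the model keys, the default model and the `FinetuneConfig` class are all assumptions made for illustration; only the 8-bit default comes from the description above. The project's real training script may look quite different.

```python
import argparse
from dataclasses import dataclass

# Hypothetical keys for the three models mentioned in the text.
MODEL_CHOICES = ("mistral-7b", "mistral-22b", "phi-3")


@dataclass
class FinetuneConfig:
    model_key: str
    use_lora: bool
    quant_bits: int


def parse_args(argv=None) -> FinetuneConfig:
    """Turn command-line flags into a fine-tuning configuration."""
    parser = argparse.ArgumentParser(description="Pick a model to fine-tune")
    parser.add_argument("--model", choices=MODEL_CHOICES, default="mistral-7b")
    parser.add_argument("--no-lora", action="store_true", help="disable LoRA adapters")
    parser.add_argument("--bits", type=int, choices=(4, 8), default=8)
    args = parser.parse_args(argv)
    return FinetuneConfig(args.model, not args.no_lora, args.bits)


default_cfg = parse_args([])
small_cfg = parse_args(["--model", "phi-3", "--bits", "4"])
print(default_cfg)
print(small_cfg)
```

Keeping the choice in one small config object makes it easy to pass the same settings to model loading and to the trainer.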
Training data takes the form of strict syntax files and a corresponding natural language prompt. For example:
{
    "data": [
        {
            "NLPfilename": "PTypeDiffPair.convo",
            "LLMprompt": "Make a p-type differential pair. Parametrize everything."
        }
        ...
    ]
}
where PTypeDiffPair.convo is the strict syntax file which contains the following code:
PTypeDiffPair 
// no imports
// create parameters: vin1_width, vin2_width, vin1_length, vin2_length, vin1_multiplier, vin2_multiplier, vin1_fingers, vin2_fingers
create a float parameter called vin1_width
create a float parameter called vin2_width
create a float parameter called vin1_length
create a float parameter called vin2_length
create a int parameter called vin1_multiplier
create a int parameter called vin2_multiplier
create a int parameter called vin1_fingers
create a int parameter called vin2_fingers
// place
place a pmos called vin1 with width=vin1_width, length=vin1_length, fingers=vin1_fingers, rmult=1, multipliers=vin1_multiplier, with_substrate_tap=False, with_tie=True, with_dummy=True
place a pmos called vin2 with width=vin2_width, length=vin2_length, fingers=vin2_fingers, rmult=1, multipliers=vin2_multiplier, with_substrate_tap=False, with_tie=True, with_dummy=True
// more than one component has been placed, so move
move vin1 to the left of vin2
// differential pair only has one route which is source to source
route between vin1_source_E and vin2_source_W using smart_route
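
To show how regular the convo format above is, here is a small sketch that parses two of its line types. It is not the Glayout relational parser: the regular expressions, the `parse_line` name and the returned dictionary layout are assumptions made for this example, and only the `create ... parameter` and `place ...` lines are handled.

```python
import re

PARAM_RE = re.compile(r"^create an? (\w+) parameter called (\w+)$")
PLACE_RE = re.compile(r"^place an? (\w+) called (\w+) with (.+)$")


def parse_line(line: str):
    """Parse one strict-syntax line; return None for blanks and // comments."""
    line = line.strip()
    if not line or line.startswith("//"):
        return None
    match = PARAM_RE.match(line)
    if match:
        return {"op": "param", "type": match.group(1), "name": match.group(2)}
    match = PLACE_RE.match(line)
    if match:
        # Arguments are written as "key=value" pairs separated by ", ".
        kwargs = dict(item.split("=", 1) for item in match.group(3).split(", "))
        return {
            "op": "place",
            "component": match.group(1),
            "name": match.group(2),
            "kwargs": kwargs,
        }
    # move/route and any other commands are passed through unparsed here.
    return {"op": "other", "text": line}


param = parse_line("create a float parameter called vin1_width")
placed = parse_line(
    "place a pmos called vin1 with width=vin1_width, length=vin1_length, with_tie=True"
)
print(param)
print(placed)
```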

Important
Convos are structured so as to distill the Python code into natural language commands, which are far easier for an LLM to interpret and generate. A few basic functions have been created as a part of strict-syntax, for example the create, place, move and route commands used in the convo above.
Details on the Strict-Syntax API can be found in this document. It also contains a detailed change-log and information about bug-fixes (all done in the coding period).
Such convo-prompt pairs are stored in .json files (split into eval pairs and training pairs).
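
Reading such a file and pairing each prompt with its convo text could look like the sketch below. The JSON layout follows the example shown earlier; the output keys `prompt` and `completion`, the `load_pairs` name and the file name `train.json` are assumptions for illustration and not necessarily what the project uses.

```python
import json
import tempfile
from pathlib import Path


def load_pairs(json_path: Path, convo_dir: Path) -> list[dict[str, str]]:
    """Pair each natural language prompt with the text of its convo file."""
    entries = json.loads(json_path.read_text())["data"]
    pairs = []
    for entry in entries:
        convo_text = (convo_dir / entry["NLPfilename"]).read_text()
        # "prompt"/"completion" are illustrative key names.
        pairs.append({"prompt": entry["LLMprompt"], "completion": convo_text})
    return pairs


# Small demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "PTypeDiffPair.convo").write_text("PTypeDiffPair\n// no imports\n")
    (tmp_dir / "train.json").write_text(json.dumps({
        "data": [{
            "NLPfilename": "PTypeDiffPair.convo",
            "LLMprompt": "Make a p-type differential pair. Parametrize everything.",
        }]
    }))
    pairs = load_pairs(tmp_dir / "train.json", tmp_dir)

print(pairs[0]["prompt"])
```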

Training uses a batch size of 1 and a maximum of 2 epochs, since the model was observed to start overfitting beyond that. The SFT trainer was used in the following configuration:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=data["train"],
    eval_dataset=data["evaluation"],
    max_seq_length=4096,
    data_collator=data_collator,
)
The code can be found here.
As future work, the LLM can be further improved by incorporating this RL optimizer in the training flow. This would allow the LLM to minimize parasitics while still creating a DRC- and LVS-clean design.

The RL optimizer works as follows: for a given circuit (the PEX netlist extracted from the layout), it sweeps over a set of parameters, calculates a pre-decided set of metrics, and reports the best configuration of the design.
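
The sweep-and-report loop described above can be sketched as a plain exhaustive search. This is a simplified stand-in, not the linked optimizer: it does no reinforcement learning, and `toy_evaluate` replaces the real simulation of the PEX netlist with a made-up formula. The parameter names, metric names and scoring weights are assumptions for the example.

```python
from itertools import product


def sweep(param_grid, evaluate, score):
    """Evaluate every parameter combination and return the best-scoring one."""
    names = list(param_grid)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        cfg = dict(zip(names, values))
        current = score(evaluate(cfg))
        if current > best_score:
            best_cfg, best_score = cfg, current
    return best_cfg, best_score


def toy_evaluate(cfg):
    """Stand-in for simulating the extracted netlist; returns made-up metrics."""
    return {
        "gain": cfg["width"] * cfg["fingers"],
        "area": cfg["width"] * cfg["fingers"] ** 2,
    }


best_cfg, best_score = sweep(
    {"width": [1.0, 2.0], "fingers": [1, 2, 4]},
    toy_evaluate,
    score=lambda metrics: metrics["gain"] - 0.3 * metrics["area"],
)
print(best_cfg, best_score)
```

The scoring function is where a trade-off such as gain versus parasitics or area would be encoded.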

The linked paper elaborates on the APIs used and the results in detail. It was written during the coding period, and a significant amount of time and effort went towards writing it and generating its results.
Glayout Refactor

Glayout uses GDSfactory as an underlying API, which recently switched backends from gdstk to klayout. Glayout has been using GDSfactory version 7.7.0, and it was decided that a refactor was necessary because of the significant changes to the GDSfactory codebase. The project's track was therefore realigned to account for this, with the goal of making Glayout compatible with at least GDSfactory version 8.0.0.
This work is still ongoing, as it requires sweeping changes and, in some places, a complete rewrite of pcells. The task was undertaken to keep up with the times and to update the now slightly outdated codebase.
This work is projected to be completed within two weeks after the report submission, and the progress can be tracked through this repository's gdsfactory8 branch.
Code Contributions


CI Checks for Glayout
fix: pip install with git+ fails
feat: Glayout test script
feat: mini refactor

Fixing LVS for failing cells
Adding Current Mirror to CI checks
Switch workflows to node20
OpAmp Spice Testbench


fix: README fixes for Glayout
feat: Folder Restructure, other contribs

Add glayout CI test python script
Add PCell checking workflows (DRC and LVS for fets, diffpair, opamp, current mirror)
Add testbench simulation in CI for OpAmp


feat: Update glayout test script to allow argument passing
fix: Code import issues in relational framework
feat: Update python version to 3.10 in docker build and push
feat: Add Notebook links, and other info. to Glayout README
feat: Testbenches

Add Differential Pair Testbench
Add Current Mirror Testbench
Update OpAmp Testbench


feat: Another Folder Restructure, separate larger pcells into different heirarchies
fix: update nltk dependencies for LLM
fix: GLayout CI fixes
fix: pin numpy version to 1.23.5 (>2 not compatible with gdsfactory 7.7.0)