Implement a -fdiagnostics=sarif flag for the clang frontend, which
instructs clang binary to emit diagnostics formatted using the SARIF
specification.
There are several subtasks for this:
- - [X]
FLAGExtend-fdiagnostics=to acceptsarifas a value, ensure that the effect of flag is a no-op. - - [ ]
BRIDGECreate aDiagnosticConsumerthat is capable of accepting diagnostics from clang and storing them for usage later. There should be a method provided to flush stored diagnostics out into some kind of writer - - [X]
EMITTERCreate a SARIF emitter based on code present inlibStaticAnalyzer - - [ ]
END_TO_ENDLink the SARIF Consumer to the frontend and write tests for the outpu
clangStaticAnalyzer already implements a SARIF consumer, but the implementation there appears to be targeted towards “path-sensitive diagnostics”. What are these?
Command line options are generated via LLVM-Tablegen
Of interest are clang’s options; which come from:
clang/include/Options.td.
This file is used to generate an Options.inc file in-situ that is used by
clang/include/Options.h
to define accepted CLI opts.
This results in an -Wswitch-enum warning due to the unused TextDiagnostic::SARIF enum, which will be resolved at a later stage.
The objective here is to create an internal client, that can be hooked to the diagnostics engine. But there are two possibilities for what this should derive from:
DiagnosticRenderer: This seems to be a utility class focused purely on formatting diagnostics, what is confusing is that when theclangdriver is initialized it comes with a renderer, and not a consumer?DiagnosticConsumer: This, according to the Internals Manual is what is responsible for handling diagnostics passed in by the current compiler instance
The main reason with going for creating a DiagnosticConsumer, is the following:
- The Diagnostic Internals Manual says so (rather unlucky to have found this towards the end my search)
-
CompilerInstance::createDiagnosticsaccepts a(DiagnosticConsumer *)parameter.
A source of confusion while figuring the above out, was the Text{DiagnosticPrinter,DiagnosticBuffer,Diagnostic} family of classes.
After some digging it appears they work as follows:
TextDiagnosticPrinteris aDiagnosticConsumer:TextDiagnosticPrinterinternally uses aTextDiagnosticobject to perform all the special formatting for each of the options (-fdiagnostics-format=[clang|vi|msvc])TextDiagnosticBufferis a class used only to buffer the diagnostic text without any special pretty printing. This is useful for-verifyvia theVerifyDiagnosticConsumer, also useful for tests. More context is provided in (libflangFrontend) https://reviews.llvm.org/D87774
- [X] Create
clang/include/SarifDiagnosticConsumer.handclang/include/SarifDiagnosticConsumer.cpp - [ ] Create unit tests (similar to
VerifyDiagnosticConsumer) that check if diagnostics passed to this consumer are stored properly - [X] Modify
CompilerInstance::createDiagnosticsso that it sets the consumer when-fdiagnostics-format=sarifis specified - [X] Ensure
TextDiagnosticPrintercannot use-fdiagnostics-format=sarif
The code in clangStaticAnalyzer is useful for SarifDiagnosticBuffer and needs to be refactored into it’s own templated class. The templating is necessary since the prior implementation’s createRule, createRules functions load rule names from a .inc file. A class and a template specification for PathDiagnostic is necessary to cover this, while keeping a common interface between StaticAnalyzer and the compiler internals.
- [X] Refactor all the
staticfunctions for creating SARIF fragments into a templated class / method, and create a specialization forPathDiagnostic
One of the hangups of ITERATION-0 turned out to be there were too many moving parts that were coupled together through the notion of “diagnostic”. The original approach was to make template specializations for each of the diagnostic types. It is not straightfoward to have a clean, common implementation this way. What turns out to be a better approach is to separate the concerns by action:
- Writing SARIF documents through a common interface `SarifDocumentWriter`
- Consuming diagnostics and translating them into SARIF
Split up the SarifDocumentWriter -> SarifLog, SarifLogSerializer
Create: SarifLog::validate() to recursively validate the SARIF to be seriailzed.
Need a good testing tool for this. Possibly use pydantic + datamodel_codegen to create an object model from the schema, then use this to validate SARIF produced by test tools? This still doesn’t solve some of the other problems, such as Microsoft’s SARIF validator (https://sarifweb.azurewebsites.net/Validation) has some extra rules such as:
versionvalues must be numeric (The spec says it can be anything the tool natively outputs, so it deviates here, other cases will come as we build larger tests.)
- [ ] Improve documentation for
TextDiagnostic,TextDiagnosticPrinter, andTextDiagnosticBuffer - [ ] Investigate why
-fdiagnostics-format=<value>doesn’t show up inclang --help