lionel-/nse-annotations.md

# `declare()` Syntax Proposals

This proposal introduces `declare()` annotations that help static analysis tools understand NSE functions. There are two main categories:

- Declarations for callers of NSE functions. These are mainly meant as escape hatches that let users silence spurious diagnostics when NSE functions are missing annotations.
- Declarations for authors of NSE functions. These annotations declare the evaluation behaviour of one or more parameters.


## Rationale

The main challenge for static analysis of R code (e.g. diagnostics for unknown and unused variables) is that any function can potentially use NSE. Optional annotations written by the authors of NSE functions distributed in packages would let development and checking tools understand how NSE arguments are evaluated. When this fails (e.g. because annotations are missing), optional annotations written by users of NSE functions could supply the missing information to static analysis tools.

## Call-site annotations: Declaring local variables

This syntax is used in a given context (e.g. a namespace or a function body) to inform analysis tools that certain variables, which are not visibly defined, should be considered to exist within that scope.

Intent: Suppresses unknown variable warnings for `foo` and `bar` across the whole package.

```r
# At top-level
declare(variables(foo, bar))

# Equivalent to
globalVariables(c("foo", "bar"))
```

Intent: Suppresses unknown variable warnings for `cyl` and `mpg` only inside that function.

```r
fn <- function() {
  declare(variables(cyl, mpg))

  # No warning
  with(mtcars, cyl + mpg)
}

# Warning
with(mtcars, cyl + mpg)
```
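
A minimal sketch of how an analysis tool might pick up these declarations: the hypothetical `collect_declared_variables()` below (not part of any package) walks an unevaluated expression using base R's language-object tools and returns the names listed in `declare(variables(...))`. Since the code is only quoted, `declare()` itself never needs to exist at runtime.

```r
# Hypothetical helper (not part of any package): walk an unevaluated
# expression and collect names declared via `declare(variables(...))`.
# Sketch only: calls with empty arguments such as `x[, 1]` are not handled.
collect_declared_variables <- function(expr) {
  if (!is.call(expr)) {
    return(character())
  }

  # Match `declare(variables(a, b, ...))`
  if (identical(expr[[1]], as.name("declare")) && length(expr) >= 2) {
    inner <- expr[[2]]
    if (is.call(inner) && identical(inner[[1]], as.name("variables"))) {
      return(vapply(as.list(inner)[-1], deparse, character(1)))
    }
  }

  # Otherwise recurse into the arguments of the call
  found <- lapply(as.list(expr)[-1], collect_declared_variables)
  as.character(unlist(found, use.names = FALSE))
}

# Top-level declaration
collect_declared_variables(quote(declare(variables(foo, bar))))  # c("foo", "bar")

# Declaration inside a function body (the code is quoted, never run)
fn_code <- quote(function() {
  declare(variables(cyl, mpg))
  with(mtcars, cyl + mpg)
})
collect_declared_variables(fn_code)  # c("cyl", "mpg")
```

A real tool would also need to scope the collected names to the enclosing function, which this sketch does not attempt.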

## Definition-site annotations: Declaring function parameters

These annotations are placed on the first line of an NSE function's body to describe the evaluation semantics of its arguments. The declarations are grouped in a `params()` list.
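
To make the placement rule concrete, here is a hedged sketch of how a tool might read these declarations from a function object. The helper `read_param_annotations()` is hypothetical; it relies only on base R's `body()` and on treating calls as lists. Because the function bodies are inspected and never called, `declare()` does not need to exist for this to run.

```r
# Hypothetical helper: return the annotations found in a
# `declare(params(...))` call on the first line of `fn`'s body,
# one deparsed string per annotated parameter.
read_param_annotations <- function(fn) {
  is_call_to <- function(x, name) {
    is.call(x) && identical(x[[1]], as.name(name))
  }

  b <- body(fn)
  # The declaration must be the first expression inside `{`
  if (!is_call_to(b, "{") || length(b) < 2) {
    return(list())
  }
  decl <- b[[2]]
  if (!is_call_to(decl, "declare") || length(decl) < 2) {
    return(list())
  }
  params <- decl[[2]]
  if (!is_call_to(params, "params")) {
    return(list())
  }
  lapply(as.list(params)[-1], deparse)
}

quote_fn <- function(x) {
  declare(params(
    x = quoted
  ))
}

pipe_fn <- function(lhs, rhs) {
  declare(params(
    rhs = with(variables(.))
  ))
}

read_param_annotations(quote_fn)  # list(x = "quoted")
read_param_annotations(pipe_fn)   # list(rhs = "with(variables(.))")
```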
### Pure quoting function

Intent: Suppresses all analysis in the quoted argument:

- Turns off diagnostics
- Turns off emission of effects (e.g. `<-` assignment)

```r
quote <- function(x) {
  declare(params(
    x = quoted
  ))
}
```
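
The runtime behaviour that `quoted` describes can be observed with base R's own `quote()`: the assignment in the quoted argument is not performed, which is why an analyser should not emit it as an effect. The variable name `quoted_var` is only an illustration and is assumed not to exist beforehand.

```r
# Quoting builds an expression object without evaluating it
expr <- quote(quoted_var <- 1)
defined_before <- exists("quoted_var")  # FALSE: no assignment took place

# The effect only happens if the expression is later evaluated
eval(expr)
defined_after <- exists("quoted_var")   # TRUE
```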

### Declaring special variables

We could reuse the `variables()` syntax to declare special variables that are injected into an evaluation environment, such as tidyverse's anaphoric pronouns:

```r
`%>%` <- function(lhs, rhs) {
  declare(params(
    rhs = with(variables(.))
  ))
}

# Doesn't warn about undefined `.`
mtcars %>% head(.)
```

```r
mutate <- function(.data, ...) {
  declare(params(
    ... = with(variables(.data, .env))
  ))
}

# Doesn't warn about undefined `.data` or `.env`
factor <- 100
data |>
  dplyr::mutate(
    var = .data$var * .env$factor
  )
```

The `with()` operator means the argument is evaluated in a child of the calling environment, with the supplied variables injected.
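
The runtime semantics that `with(variables(.))` is meant to describe can be sketched in a few lines. `toy_pipe()` below is a hypothetical simplification, not magrittr's implementation: it creates a child of the caller's environment, injects `.` into it, and evaluates the unevaluated `rhs` there.

```r
toy_pipe <- function(lhs, rhs) {
  # Child of the calling environment, so the caller's variables stay visible
  env <- new.env(parent = parent.frame())
  # Inject the special variable `.`
  assign(".", lhs, envir = env)
  eval(substitute(rhs), env)
}

n <- 3
# `.` comes from the injected binding, `n` from the caller
toy_pipe(mtcars, head(., n))
```

Because `.` is bound only in the temporary child environment, it does not leak into the caller's scope.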

### Evaluation in data environments

If we introduce the `.()` operator to mean "whatever can be statically inferred from a piece of R code", we can annotate functions that evaluate in a data frame:

```r
subset <- function(x, ...) {
  # Evaluate `...` in caller scope with `x` attached
  declare(params(
    ... = with(.(x))
  ))
}
```

To do anything useful, this requires the analysis tool to have knowledge about the input data, either via type annotations or type guards (such as `stopifnot(all(nms %in% names(data)))`).
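
At runtime, evaluating in a data frame typically comes down to `eval()` with the data as `envir` and the caller's environment as `enclos`, as in this hypothetical `toy_subset()` (a simplified stand-in for `subset()`, not its actual source). The `.(x)` annotation asks the analyser to reason about this kind of data mask, which is why it needs to know the columns of `x`.

```r
toy_subset <- function(x, cond) {
  # Columns of `x` are looked up first, then the caller's environment
  rows <- eval(substitute(cond), x, parent.frame())
  x[rows, , drop = FALSE]
}

threshold <- 30
# `mpg` is a column of `mtcars`, `threshold` comes from the caller
toy_subset(mtcars, mpg > threshold)
```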

### Evaluation in local environments

`with()` is a convenient annotation for a large subset of NSE functions whose semantics match those of the base `with()` function (evaluation in a data frame whose environment inherits from the current lexical scope). However, more specialised cases need a more general operator. For this purpose we introduce a new operator, `eval()`. Unlike `with()`, which indicates that evaluation takes place in a child of the calling environment, `eval()` signals evaluation directly in the environment passed as argument.
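
The difference between the two operators is visible with base R itself. In this sketch, `with()` evaluates in a temporary scope built from the data, so an assignment made inside it does not reach the caller, while `local()` given an explicit `envir` evaluates directly in that environment, so the assignment lands there. The variable names are illustrative.

```r
caller_value <- 10

# with(): the caller's variables are visible, but the assignment stays
# in the temporary scope built from the data
with(list(a = 1), from_with <- a + caller_value)
with_leaked <- exists("from_with")  # FALSE

# local() with an explicit environment: evaluation happens directly in `e`
e <- new.env()
local(in_e <- 2, envir = e)
in_e_assigned <- exists("in_e", envir = e, inherits = FALSE)  # TRUE
```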

An `eval()` annotation is useful with functions like `local()`.

```r
local <- function(expr, envir = new.env(parent = parent.frame())) {
  declare(params(
    expr = eval(.(envir))
  ))
}
```

One tricky aspect of this case is that the evaluation environment depends on an argument:

- The dependence on the runtime value of the argument is expressed via `.()`.
- R functions like `new.env()` and `parent.frame()` are treated by the analysis tool as scope selectors.

With these two notions, the evaluation environment can be statically inferred from the default argument `new.env(parent = parent.frame())`.

If a non-static environment is passed as `envir`, a static analyser can no longer tell anything about the environment. The caller then needs to use call-site annotations to help reason about the code.
A function like `test_that()` has the same semantics as `local()` but doesn't have any argument to determine the scope. In that case, pass the scope selectors directly:

```r
test_that <- function(title, expr) {
  declare(params(
    expr = eval(.(new.env(parent = parent.frame())))
  ))
}
```
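
A hedged sketch of the runtime behaviour described by `eval(.(new.env(parent = parent.frame())))`: the hypothetical `run_block()` below is not testthat's implementation, only a minimal function with the same scoping. Caller variables are visible inside the block, while anything created there stays in the fresh environment.

```r
run_block <- function(title, expr) {
  # Fresh scope whose parent is the caller's environment
  env <- new.env(parent = parent.frame())
  eval(substitute(expr), env)
  # Report the names created while evaluating the block
  invisible(ls(env))
}

shared <- 5
created <- run_block("example", {
  block_tmp <- shared * 2  # `shared` is found in the caller's scope
})
created              # "block_tmp"
exists("block_tmp")  # FALSE: nothing leaked into the caller
```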

## Related annotations

### Declaring unused variables

For the purpose of diagnostics, it's likely fine for a static analyser to consider all functions to have strict and early evaluation of arguments, unless they are quoted.
Any occurrences of a variable in an argument would be treated as a "use". The following would be sufficient to avoid an "unused variable" warning:

```r
x <- 1
some_function(x)
```
What if the function doesn't actually evaluate the argument? This would be linted within that function:

```r
# WARNING: `x` is unused
ignore <- function(x) {
  NULL
}
```
If it is intentional that the variable is unused, the author can silence the diagnostic by explicitly declaring the parameter unused:

```r
ignore <- function(x) {
  declare(params(
    x = unused
  ))

  NULL
}
```

Then a warning would be emitted at call sites:

```r
# WARNING: `x` is unused
x <- 1
ignore(x)

# WARNING: `x` assignment not evaluated
ignore(x <- 1)
```
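
The reason for the call-site warning is R's lazy evaluation: an argument that is never used is never evaluated, so any side effect written in it silently disappears. This sketch redefines `ignore()` without the `declare()` line so that it runs without a `declare()` function being defined.

```r
ignore <- function(x) {
  NULL
}

# The promise for `x` is never forced, so the assignment never happens
ignore(never_assigned <- 1)
exists("never_assigned")  # FALSE

# For the same reason, an error in the argument is never raised
ignore(stop("never raised"))  # NULL
```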
Back to the function definition. What if a parameter is only used along one path?

```r
# WARNING: `x` may be unused
maybe_print <- function(x) {
  if (sample(c(0, 1), size = 1)) {
    print(x)
  }
}
```
Fix it with:

```r
maybe_print <- function(x) {
  force(x)
  if (sample(c(0, 1), size = 1)) {
    print(x)
  }
}
```
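
To check what `force()` changes, here is a deterministic variant of the example above, where a hypothetical `cond` argument replaces the random draw. Without `force()`, the argument's side effect only happens on the branch that uses it; with `force()`, it happens on every path.

```r
maybe_print_lazy <- function(x, cond) {
  if (cond) print(x)
  invisible(NULL)
}

maybe_print_forced <- function(x, cond) {
  force(x)
  if (cond) print(x)
  invisible(NULL)
}

maybe_print_lazy(lazy_effect <- 1, cond = FALSE)
exists("lazy_effect")    # FALSE: `x` was never evaluated

maybe_print_forced(forced_effect <- 1, cond = FALSE)
exists("forced_effect")  # TRUE: `force()` evaluated `x` up front
```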

### Basic evaluation semantics


A `forced` declaration might be a way to turn off lazy evaluation.

```r
identity <- function(x) {
  declare(params(
    x = forced
  ))
  x
}
```


If R ever provides a way to configure a file or package to turn off lazy evaluation by default (e.g. a top-level `declare()` would configure the parser to mark parsed functions with a forced flag), the `delayed` annotation could be useful to turn it back on for individual parameters, allowing for instance `try()` to catch errors raised while its argument is evaluated:

```r
try <- function(x) {
  declare(params(
    x = delayed
  ))
  tryCatch(x, error = identity)
}
```
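
The reason `try()` needs its argument delayed can be shown with `force()` standing in for the proposed `forced` annotation. In this sketch, the lazy version raises the error inside `tryCatch()`, where it is caught, while the forced version raises it before `tryCatch()` is entered.

```r
try_lazy <- function(x) {
  tryCatch(x, error = identity)
}

# Emulates a `forced` parameter with force()
try_forced <- function(x) {
  force(x)
  tryCatch(x, error = identity)
}

res <- try_lazy(stop("boom"))
inherits(res, "error")  # TRUE
conditionMessage(res)   # "boom"

# The error escapes try_forced() and reaches the outer handler
escaped <- tryCatch(try_forced(stop("boom")), error = function(e) "escaped")
escaped                 # "escaped"
```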


Note that both of these would be a departure from the effect-free annotations discussed in this proposal. The presence of a forced annotation would cause compiled and interpreted code to evaluate arguments early. Whether that is desirable is an interesting topic of discussion.