Nic Crane thisisnic

@thisisnic
thisisnic / ellmer_structured_output2.R
Created November 18, 2025 09:58
Structured output and enums
library(ellmer)
kendrick_wiki_text <- "Kendrick Lamar Duckworth (born June 17, 1987) is an American rapper, singer, songwriter, and record producer. Regarded as one of the greatest rappers of all time, he was awarded the 2018 Pulitzer Prize for Music, becoming the first musician outside of the classical and jazz genres to receive the award. Lamar's music, rooted in West Coast hip-hop, features conscious, introspective lyrics, with political criticism and social commentary concerning African-American culture."
hip_hop_theme <- type_enum(c("conscious_rap", "party_anthems", "storytelling", "braggadocio", "love_relationships"))
artist_type <- type_object(
name = type_string("Artist's full name"),
birth_date = type_string("Birth date in YYYY-MM-DD format"),
genre = type_string("Primary music genre"),
thisisnic / ellmer_structured_output.R
Last active November 11, 2025 18:24
Structured Output with ellmer
library(ellmer)
kendrick_wiki_text <- "Kendrick Lamar Duckworth (born June 17, 1987) is an American rapper, singer, songwriter, and record producer. Regarded as one of the greatest rappers of all time, he was awarded the 2018 Pulitzer Prize for Music, becoming the first musician outside of the classical and jazz genres to receive the award. Lamar's music, rooted in West Coast hip-hop, features conscious, introspective lyrics, with political criticism and social commentary concerning African-American culture."
artist_type <- type_object(
name = type_string("Artist's full name"),
birth_date = type_string("Birth date in YYYY-MM-DD format"),
genre = type_string("Primary music genre"),
themes = type_array(type_string("Musical themes"))
)
thisisnic / interrupt.R
Created October 30, 2024 14:20
Function for interrupting an R function's execution if it takes too long
foo <- function() {
time_limit <- 3
setTimeLimit(cpu = time_limit, elapsed = time_limit, transient = TRUE)
on.exit({
setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE)
})
tryCatch({
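
The preview above is cut off at the `tryCatch` call. As a rough, self-contained sketch of the same idea (not the gist's actual continuation), the helper below wraps an expression in a time limit and returns a fallback value on timeout. The name `with_time_limit` and its `default` argument are hypothetical, and matching on the English error message is an assumption that would break in a translated locale.

```r
# Hypothetical helper: evaluate `expr`, returning `default` if it runs
# longer than `seconds` of elapsed time.
with_time_limit <- function(expr, seconds, default = NULL) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  # Always lift the limit again, whether we finish, time out, or fail.
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE), add = TRUE)
  tryCatch(
    expr, # lazily evaluated here, so the limit applies to it
    error = function(e) {
      # Assumption: R reports timeouts with this (English) message.
      if (grepl("reached elapsed time limit", conditionMessage(e), fixed = TRUE)) {
        default
      } else {
        stop(e) # re-raise anything that is not a timeout
      }
    }
  )
}

slow_result <- with_time_limit({ Sys.sleep(2); "done" }, seconds = 0.2, default = "timed out")
fast_result <- with_time_limit("done", seconds = 2, default = "timed out")
```

Time limits are only checked at points where R could handle a user interrupt, so long-running compiled code may not stop promptly.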
thisisnic / print_tz.R
Created September 12, 2023 07:32
Detect legacy timezone symlinks in code (2023c-8 not 2023c-10)
# Detect invalid timezones
all_names <- tzdb::tzdb_names()
bad_names <- c(
all_names[startsWith(all_names, "US/")],
all_names[!stringr::str_detect(all_names, "/")]
)
# Keep only the names that are not flagged as legacy (base R, no purrr needed)
all_names[!all_names %in% bad_names]
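
To make the filtering rule easier to see, here is a small sketch that applies the same heuristic to a hand-written vector instead of `tzdb::tzdb_names()`. The example names are chosen for illustration only, and treating every name without a `/` as legacy is the gist's heuristic rather than an official definition.

```r
# Illustrative input; in the gist this comes from tzdb::tzdb_names().
tz_names <- c("US/Pacific", "UTC", "America/New_York", "GMT", "Europe/London")

# Same heuristic as above: "US/..." aliases and names with no "/" are flagged.
is_legacy <- startsWith(tz_names, "US/") | !grepl("/", tz_names, fixed = TRUE)

legacy_names <- tz_names[is_legacy]
current_names <- tz_names[!is_legacy]
```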
thisisnic / stream_to_feather_in_r.md
Created July 28, 2023 07:53
Stream to Arrow/Feather in R
library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
library(dplyr)
#> 
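
The preview stops before any data is written. As a minimal sketch of one way to stream batches into a Feather (Arrow IPC) file, and not necessarily the approach the gist takes, the code below uses arrow's `RecordBatchFileWriter`. The data frames and the names `chunks`, `feather_path`, and `streamed` are made up for illustration.

```r
library(arrow)

# Illustrative chunks; in practice these would arrive one at a time.
chunks <- list(
  data.frame(id = 1:3, value = c(0.1, 0.2, 0.3)),
  data.frame(id = 4:6, value = c(0.4, 0.5, 0.6))
)

feather_path <- tempfile(fileext = ".arrow")

# Open an output stream and a writer that expects the first chunk's schema.
sink <- FileOutputStream$create(feather_path)
writer <- RecordBatchFileWriter$create(sink, record_batch(chunks[[1]])$schema)

# Append each chunk as its own record batch; all must share the schema.
for (chunk in chunks) {
  writer$write(record_batch(chunk))
}

writer$close() # marks end-of-file in the IPC stream
sink$close()   # closes the underlying file handle

streamed <- read_feather(feather_path)
```

Only one chunk needs to be in memory at a time, which is the point of writing batch by batch.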
thisisnic / stream_to_parquet_in_r.md
Created July 27, 2023 21:55
Reprex showing how to stream data into a Parquet file in R
library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
library(dplyr)
#> 
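
The preview ends before the writing step. One commonly used way to get data into Parquet without loading it all into memory is to open the source as an Arrow dataset and pass it to `write_dataset()`, which processes it in batches. This is a sketch under that assumption and may differ from the gist's method. Note that it writes a directory of Parquet files rather than a single file, and the CSV inputs and the names `csv_dir` and `parquet_dir` are invented for the example.

```r
library(arrow)

# Illustrative input: a directory holding two small CSV files.
csv_dir <- tempfile("csv_in_")
dir.create(csv_dir)
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")),
          file.path(csv_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:6, y = c("d", "e", "f")),
          file.path(csv_dir, "part2.csv"), row.names = FALSE)

# Open the CSVs lazily as a dataset, then write them out as Parquet.
parquet_dir <- tempfile("parquet_out_")
csv_dataset <- open_dataset(csv_dir, format = "csv")
write_dataset(csv_dataset, parquet_dir, format = "parquet")

# Collect the written files and count the rows across all of them.
parquet_files <- list.files(parquet_dir, pattern = "\\.parquet$", full.names = TRUE)
total_rows <- sum(vapply(parquet_files, function(f) nrow(read_parquet(f)), integer(1)))
```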
thisisnic / blanks.R
Last active February 9, 2023 10:17
Example for a user showing how a combination of blank values may cause an error when reading a CSV
``` r
library(arrow)
library(dplyr)
library(stringr)
tf <- tempfile()
# values to save - note the space after the final new line
dodgy_vals <- "x,y\n0,1\n ,4"
cat(dodgy_vals)
thisisnic / gist:14fb9c1001261f2cf249f9317cda6466
Last active September 8, 2022 15:14
lazy_query from dbplyr
# query details copied from https://github.com/voltrondata-labs/arrowbench/blob/main/R/tpch-queries.R
query_results <- lineitem_db %>%
select(l_shipdate, l_returnflag, l_linestatus, l_quantity,
l_extendedprice, l_discount, l_tax) %>%
# kludge, should be: filter(l_shipdate <= "1998-12-01" - interval x day) %>%
# where x is between 60 and 120, 90 is the only one that will validate.
filter(l_shipdate <= as.Date("1998-09-02")) %>%
select(l_returnflag, l_linestatus, l_quantity, l_extendedprice, l_discount, l_tax) %>%
group_by(l_returnflag, l_linestatus) %>%
summarise(
---
title: "Apache Arrow R Questions on Stack Overflow"
format: html
---
```{r}
#| label: load-packages-and-code
#| include: false
library(httr)
library(dplyr)
thisisnic / pre-commit
Created August 26, 2021 17:03
pre-commit file which runs styler on everything
#!/bin/bash
set -e
SOURCE_DIR='<path_to_project_root_goes_here>'
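# Find all .R files which have been staged via git add.
# `|| true` keeps `set -e` from aborting the commit when grep finds no matches;
# the `$` anchor avoids matching files such as .Rmd or .Rproj.
FILES_TO_STYLE=$(git diff --name-only --staged | grep '\.R$' || true)
for FILE in $FILES_TO_STYLE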
do