Sometimes upgrading the Python Polars library breaks existing code. The following are some codebase migrations I've used.
- Find+Replace: `pl.Utf8` -> `pl.String` (sketch below).
- Find+Replace `how='outer_coalesce'` on joins (sketch below).
  - Find: `how\s*=\s*['"]outer_coalesce['"]\s*,`
  - Replace: `how='full', coalesce=True,`
- Locate `.replace(..., return_dtype=something)` (sketch below): `rg -U -t py '\.replace\([^)]+return_dtype'`
- Non-breaking change with `infer_schema_length` (from `0` to `False` to read all as String):
  - Find: `infer_schema_length\s*=\s*0`
  - Replace: `infer_schema_length=False`
- Find `pl.read_excel` and pin the `engine='xlsx2csv'` arg (or migrate manually).
- Find and Replace all `.frame_equal(` to `.equals(`.
- Note that `df.drop(['column_that_does_not_exist'])` now raises an exception (sketch below).
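Quick before/after sketch for the dtype rename above, on a throwaway frame with made-up column names; the `infer_schema_length` swap is noted in the comments since it's the same kind of mechanical change:

```python
import polars as pl

df = pl.DataFrame({"id": ["a", "b"], "n": [1, 2]})

# Old spelling: df.select(pl.col(pl.Utf8))
# New spelling (pl.Utf8 still works as an alias, but the rename keeps the
# codebase on one name):
strings = df.select(pl.col(pl.String))
print(strings.columns)  # ['id']

# The infer_schema_length item is the same kind of swap, and since
# 0 == False in Python it really is non-breaking:
#   old: pl.read_csv("data.csv", infer_schema_length=0)
#   new: pl.read_csv("data.csv", infer_schema_length=False)
```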
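Before/after sketch of the join change, with throwaway frames:

```python
import polars as pl

left = pl.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pl.DataFrame({"key": [2, 3, 4], "r": ["x", "y", "z"]})

# Old: how='outer_coalesce' produced a full join with a single, coalesced
# key column.
# joined = left.join(right, on="key", how="outer_coalesce")

# New: ask for a 'full' join and coalesce the key explicitly.
joined = left.join(right, on="key", how="full", coalesce=True)
print(joined.sort("key"))
```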
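The rg pattern above only locates the call sites. A sketch of the manual migration that usually follows, assuming the call relied on `default`/`return_dtype` (those moved to `replace_strict` in the 1.x line, as far as I can tell):

```python
import polars as pl

df = pl.DataFrame({"code": [1, 2, 3]})

# Old: .replace() accepted default= and return_dtype= directly.
# out = df.with_columns(
#     label=pl.col("code").replace({1: "a", 2: "b"}, default="?", return_dtype=pl.String)
# )

# New: .replace() only swaps values and keeps the dtype; reach for
# .replace_strict() when you want a default and an explicit return dtype.
out = df.with_columns(
    label=pl.col("code").replace_strict(
        {1: "a", 2: "b"}, default="?", return_dtype=pl.String
    )
)
print(out)
```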
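Sketch of the `.equals()` rename and the stricter `drop` behaviour; the `strict=False` escape hatch exists in the 1.x versions I've used, but treat that as an assumption for yours:

```python
import polars as pl

df = pl.DataFrame({"a": [1, 2]})

# .frame_equal() is gone; .equals() is the replacement.
assert df.equals(df.clone())

# Dropping a column that doesn't exist now raises.
# df.drop(["column_that_does_not_exist"])  # raises

# Either drop only columns you know exist, or opt out of the check:
df2 = df.drop(["column_that_does_not_exist"], strict=False)
print(df2.columns)  # ['a'] -- nothing was dropped
```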
Not sure which version each of these broke in or became required, but:
- `\w+\.fill_null\((pl\.lit\()?0` -> `df.with_columns(pl.selectors.numeric().fill_null(0))` (sketch below).
  - Note: requires regex rework.
- Find and replace `join_nulls=` to `nulls_equal=` (sketch below).
- Find and replace `.collect(streaming=True` to `.collect(engine="streaming"`.
- Selectively find and replace `.str.concat()` to `.str.join(delimiter="|")` (sketch below).
- Fix `lf.melt` (renamed to `lf.unpivot`).
- Find and replace `.str.concat(` to `.str.join(`.
- Find `\.is_in.+(unique|[a-zA-Z]+\[)`. Add `.implode()` to the argument of `.is_in(arg)` if `arg` is a `pl.Series` or `pl.Expr` (sketch below).
- Any time `.map_elements` is used inside a `group_by`'s `.agg(...)`, you must now implode (see the upstream issue; sketch below). Ripgrep command to very-generously find potential instances:
  `rg --multiline --multiline-dotall --glob '*.py' '\.group_by\s*\([^)]*\)[\s\S]*?\.agg\s*\([\s\S]{0,200}\.map_elements\s*\(' . -l`
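Sketch of what the `fill_null` rewrite above is aiming for, assuming the intent is to fill nulls only in the numeric columns (that's how I read the replacement pattern; the frame is made up):

```python
import polars as pl
import polars.selectors as cs  # same module as pl.selectors in the pattern above

df = pl.DataFrame({"n": [1, None, 3], "s": ["a", None, "c"]})

# Old habit: df.fill_null(0) across the whole frame.
# Targeted version: only numeric columns get the 0, string nulls are left alone.
out = df.with_columns(cs.numeric().fill_null(0))
print(out)
```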
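Before/after sketch for the `nulls_equal` and streaming-engine renames, assuming a recent 1.x release where both have landed:

```python
import polars as pl

left = pl.LazyFrame({"key": [1, None], "l": ["a", "b"]})
right = pl.LazyFrame({"key": [1, None], "r": ["x", "y"]})

# Old: .join(..., join_nulls=True) and .collect(streaming=True)
# New: the join parameter is nulls_equal, and streaming is picked via engine=.
out = left.join(right, on="key", how="inner", nulls_equal=True).collect(
    engine="streaming"
)
print(out)
```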
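Sketch of the string-join rename and the `melt` fix on a throwaway frame; as far as I can tell `melt` became `unpivot`, with `index`/`on` taking over from `id_vars`/`value_vars`:

```python
import polars as pl

df = pl.DataFrame({"g": ["a", "a", "b"], "x": [1, 2, 3], "y": [4, 5, 6]})

# Old: pl.col("g").str.concat()  and  df.melt(id_vars="g")
# New:
joined = df.select(pl.col("g").str.join(delimiter="|"))
long = df.unpivot(index="g", on=["x", "y"])
print(joined)
print(long)
```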
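Sketch of the `.implode()` fix for `is_in`, assuming a version where the Series/Expr-argument behaviour change has landed (data is made up):

```python
import polars as pl

df = pl.DataFrame({"x": [1, 2, 3, 4]})
allowed = pl.Series([2, 4])

# Old: pl.col("x").is_in(allowed) treated `allowed` as one big membership set.
# New: wrap the Series/Expr argument in .implode() to keep that set semantics.
out = df.with_columns(ok=pl.col("x").is_in(allowed.implode()))
print(out)
```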
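Sketch of the `group_by`/`agg` implode fix as I understand it; the lambda is a stand-in for a Python function that genuinely needs the whole group (prefer native expressions when they exist):

```python
import polars as pl

df = pl.DataFrame({"g": ["a", "a", "b"], "x": [1, 2, 3]})

# Old: inside .agg(), map_elements handed the whole group to the function.
# res = df.group_by("g").agg(pl.col("x").map_elements(lambda s: s.max() - s.min()))

# New: implode the group first so the Python function still receives the
# group's values as one Series.
res = df.group_by("g").agg(
    pl.col("x")
    .implode()
    .map_elements(lambda s: s.max() - s.min(), return_dtype=pl.Int64)
)
print(res.sort("g"))
```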