Suppose you need to apply the same function to multiple columns in one DataFrame. One straightforward way is:
val newDF = oldDF
  .withColumn("colA", func("colA"))
  .withColumn("colB", func("colB"))
  .withColumn("colC", func("colC"))
If you want to save some typing, you can try one of these approaches:
- Use select with varargs including *:
import spark.implicits._
df.select($"*" +: Seq("A", "B", "C").map(c => func(c)): _*)
Here:
- Maps column names to func with Seq("A", ...).map(...)
- Prepends all pre-existing columns with $"*" +: ...
- Unpacks combined sequences with ... : _*
This can be generalized as:
import org.apache.spark.sql.{Column, DataFrame}
/**
* @param cols a sequence of columns to transform
* @param df an input DataFrame
* @param f a function to be applied on each col in cols
*/
// Note: $"*" requires `import spark.implicits._` (as above) to be in scope.
def withColumns(cols: Seq[String], df: DataFrame, f: String => Column): DataFrame =
  df.select($"*" +: cols.map(c => f(c)): _*)
Note: To rename a result column, use Column.as(...) or Column.alias(...). Unlike withColumn, this approach generally cannot replace the original column; the transformed column is added alongside it.
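The `+:` and `: _*` mechanics used in the select approach can be tried in isolation with plain Scala collections, no Spark needed. This is only an illustrative sketch: `describe` is a hypothetical stand-in for a varargs method such as select, and the strings stand in for Column values.

```scala
// Hypothetical stand-in for a varargs method like DataFrame.select(cols: Column*).
def describe(items: String*): String = items.mkString(", ")

val names = Seq("A", "B", "C")

// Map each name through a function, as Seq("A", ...).map(c => func(c)) does.
val transformed: Seq[String] = names.map(n => s"f($n)")

// +: prepends a single element to the front of the sequence.
val combined: Seq[String] = "*" +: transformed

// : _* expands the sequence into individual varargs arguments.
val result = describe(combined: _*)
println(result) // prints: *, f(A), f(B), f(C)
```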
- With withColumn you can use foldLeft:
Seq("A", "B", "C").foldLeft(df)((df, c) => df.withColumn(c, func(c)))
which can be generalized to:
/**
* @param cols a sequence of columns to transform
* @param df an input DataFrame
* @param f a function to be applied on each col in cols
* @param name a function mapping from input to output name.
*/
def withColumns(cols: Seq[String], df: DataFrame,
    f: String => Column, name: String => String = identity): DataFrame =
  cols.foldLeft(df)((acc, c) => acc.withColumn(name(c), f(c)))
Note: here you can replace the original columns, because withColumn overwrites an existing column of the same name.
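To make the difference between the two approaches concrete, here is a minimal sketch that runs both on a tiny DataFrame. The sample data, the local SparkSession settings, the upper-casing `func`, and the `_upper` suffix are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, upper}

// Local session for experimentation only.
val spark = SparkSession.builder().master("local[*]").appName("with-columns-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data with two string columns.
val df: DataFrame = Seq(("a", "x"), ("b", "y")).toDF("A", "B")

// Hypothetical func: upper-case a string column.
val func: String => Column = c => upper(col(c))

// select variant: originals are kept, transformed columns are appended under new names.
val appended = df.select(col("*") +: Seq("A", "B").map(c => func(c).as(s"${c}_upper")): _*)

// foldLeft variant: withColumn reuses the existing name, so each column is replaced in place.
val replaced = Seq("A", "B").foldLeft(df)((acc, c) => acc.withColumn(c, func(c)))

println(appended.columns.mkString(", ")) // A, B, A_upper, B_upper
println(replaced.columns.mkString(", ")) // A, B
```

If the extra columns are unwanted, the foldLeft variant is usually the simpler choice; the select variant is handy when the originals should stay available next to the transformed values.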
One example of func:
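import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, from_unixtime}
// Presumably the column holds epoch milliseconds; dividing by 1000 converts to seconds.
def datefmt(c: String): Column = from_unixtime(col(c) / 1000, "yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
Another example: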
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.DataType
// Cast every column of sourceType to targetType by folding over the schema
def castAllTypedColumnsTo(df: DataFrame, sourceType: DataType, targetType: DataType): DataFrame = {
  df.schema.filter(_.dataType == sourceType).foldLeft(df) {
    case (acc, col) => acc.withColumn(col.name, df(col.name).cast(targetType))
  }
}
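As a usage sketch for castAllTypedColumnsTo, the snippet below casts every integer column of a small DataFrame to string. The sample data, column names, and local SparkSession settings are assumptions; the helper is repeated (with the struct field renamed to `field`) only so the snippet runs on its own.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DataType, IntegerType, StringType}

// Local session for experimentation only.
val spark = SparkSession.builder().master("local[*]").appName("cast-sketch").getOrCreate()
import spark.implicits._

// Same helper as above, repeated so the snippet is self-contained.
def castAllTypedColumnsTo(df: DataFrame, sourceType: DataType, targetType: DataType): DataFrame =
  df.schema.filter(_.dataType == sourceType).foldLeft(df) {
    case (acc, field) => acc.withColumn(field.name, acc(field.name).cast(targetType))
  }

// Hypothetical sample data: two Int columns and one String column.
val df = Seq((1, 2, "x"), (3, 4, "y")).toDF("id", "count", "label")

// Cast every IntegerType column to StringType; "label" is left untouched.
val casted = castAllTypedColumnsTo(df, IntegerType, StringType)
casted.printSchema()
```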
Thanks a lot! That's just what I needed.