All Superinterfaces:
AutoCloseable, Closeable, Operator, OperatorPipelineV3, Serializable
All Known Subinterfaces:
SupportsUDF1<T1,R>, SupportsUDF2<T1,T2,R>, SupportsUDF3<T1,T2,T3,R>, SupportsUDF4<T1,T2,T3,T4,R>, SupportsUDF5<T1,T2,T3,T4,T5,R>

@DeveloperApi public interface Formatter extends Operator
An operator responsible for formatting input datasets, for example converting a schema, adding or deleting columns, or encoding domain-specific objects. A Formatter is a scalar user-defined function (UDF) that acts on one DataFrame row, takes a variable number of columns as input parameters, and returns the formatted value of a new or existing column. Formatter objects are created by the Formatter operator factory (an implementation of FormatterSupport) when a pipeline task requests one, and are lazily initialized when they are ready to run. When processing completes, the close method is invoked to release resources. SeqsLab supports multiple data processing features to manage and optimize workloads. A Formatter can inform SeqsLab of the features it supports by implementing the corresponding mix-in interfaces.
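The lifecycle described above (creation, lazy initialization, use, close) can be sketched as follows. This is a minimal illustration written only against the JDK, not the SeqsLab implementation: `FormatterSketch`, `UpperCaseSketch`, `LifecycleSketch`, and the `call` method are hypothetical names, and a plain type-name string stands in for `org.apache.spark.sql.types.DataType` so the example compiles without Spark or the plugin API on the classpath.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in mirroring the three documented methods.
// The real withColumn() returns Map.Entry<String, DataType>; a String
// type name is an assumption made to keep this sketch JDK-only.
interface FormatterSketch extends AutoCloseable {
    FormatterSketch init();
    String[] select();
    Map.Entry<String, String> withColumn();
    @Override
    void close();
}

// Hypothetical formatter that records its lifecycle events.
class UpperCaseSketch implements FormatterSketch {
    final List<String> events = new ArrayList<>();

    UpperCaseSketch() {
        events.add("created"); // construction stays cheap: nothing acquired yet
    }

    @Override
    public FormatterSketch init() {
        events.add("init");    // acquire resources here, just before running
        return this;           // init() returns the object itself
    }

    @Override
    public String[] select() {
        return new String[] {"sample_id"};
    }

    @Override
    public Map.Entry<String, String> withColumn() {
        // Existing column name, so the column would be updated in place.
        return new AbstractMap.SimpleImmutableEntry<>("sample_id", "string");
    }

    // The scalar function itself (method name is an assumption of this sketch).
    String call(String sampleId) {
        return sampleId.toUpperCase(Locale.ROOT);
    }

    @Override
    public void close() {
        events.add("closed");  // release whatever init() acquired
    }
}

class LifecycleSketch {
    // Plays the role of the engine: create, init, use, then close.
    static List<String> run() {
        UpperCaseSketch formatter = new UpperCaseSketch();
        try (FormatterSketch ready = formatter.init()) {
            formatter.call("na12878");
        }
        return formatter.events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [created, init, closed]
    }
}
```

Using try-with-resources here is a design choice of the sketch: it guarantees `close` runs even if the function throws, which matches the documented contract that resources are released on completion.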
  • Method Summary

    Modifier and Type
    Method
    Description

    Formatter
    init()
    Initializes this formatter operator.

    String[]
    select()
    Returns the selected column names as an array.

    Map.Entry<String,org.apache.spark.sql.types.DataType>
    withColumn()
    Returns the output column name paired with its data type; a new name creates a column and an existing name updates that column with the result of this Formatter's user-defined function.

    Methods inherited from interface java.io.Closeable

    close

    Methods inherited from interface com.atgenomix.seqslab.piper.plugin.api.Operator

    getName, getOperatorContext
  • Method Details

    • init

      Formatter init()
      Initializes this formatter operator.
      Returns:
      The object itself
    • select

      String[] select()
      Returns the selected column names as an array. When the workflow engine calls the user-defined formatter function, it passes the column values in exactly the same order as the selected column names.
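The ordering rule for `select()` can be illustrated with a short sketch. Everything here is hypothetical and JDK-only: `SelectOrderSketch`, `format`, and `argumentsFor` are illustrative names, the column names are made up, and a `Map` stands in for a DataFrame row. It only shows that arguments follow the order of the selected names, not the order of the columns in the row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SelectOrderSketch {
    // Hypothetical selection for a two-column formatter.
    static String[] select() {
        return new String[] {"chrom", "pos"};
    }

    // Hypothetical two-argument UDF body. Parameters are positional:
    // the first receives "chrom" and the second receives "pos".
    static String format(String chrom, Integer pos) {
        return chrom + ":" + pos;
    }

    // Stand-in for the engine: collect values in the order of the selected names.
    static Object[] argumentsFor(Map<String, Object> row, String[] selected) {
        Object[] args = new Object[selected.length];
        for (int i = 0; i < selected.length; i++) {
            args[i] = row.get(selected[i]);
        }
        return args;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("pos", 12345);    // row order differs from select() order
        row.put("ref", "A");      // not selected, so never passed to the UDF
        row.put("chrom", "chr1");

        Object[] udfArgs = argumentsFor(row, select());
        System.out.println(format((String) udfArgs[0], (Integer) udfArgs[1])); // prints chr1:12345
    }
}
```

Swapping the two names in `select()` would swap the arguments, so the parameter list of the function and the array returned by `select()` need to be kept in step.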
    • withColumn

      Map.Entry<String,org.apache.spark.sql.types.DataType> withColumn()
      Returns the output column name paired with its data type. If the name is new, the result of this Formatter's user-defined function is written to a newly created column; if the name already exists, that column is updated. The return type of the Formatter UDF must match the data type returned by this method.
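The create-versus-update behaviour and the type-matching requirement can be sketched as below. This is an assumption-laden illustration rather than the engine's actual logic: `WithColumnSketch`, `apply`, and `matches` are hypothetical names, `java.lang.Class` stands in for Spark's `DataType`, and an ordered map stands in for a DataFrame schema.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;

class WithColumnSketch {
    // Hypothetical output declaration: column "locus" holding a String.
    // Class<?> is a stand-in for org.apache.spark.sql.types.DataType.
    static Map.Entry<String, Class<?>> withColumn() {
        return new AbstractMap.SimpleImmutableEntry<String, Class<?>>("locus", String.class);
    }

    // Stand-in for the engine applying the declaration to a schema:
    // an unknown name is appended (create), a known name keeps its
    // position and has its type replaced (update).
    static Map<String, Class<?>> apply(Map<String, Class<?>> schema,
                                       Map.Entry<String, Class<?>> out) {
        Map<String, Class<?>> result = new LinkedHashMap<>(schema);
        result.put(out.getKey(), out.getValue());
        return result;
    }

    // The value returned by the UDF must be an instance of the declared type.
    static boolean matches(Object udfResult, Map.Entry<String, Class<?>> out) {
        return out.getValue().isInstance(udfResult);
    }

    public static void main(String[] args) {
        Map<String, Class<?>> schema = new LinkedHashMap<>();
        schema.put("chrom", String.class);
        schema.put("pos", Integer.class);

        // New name: a column is created.
        System.out.println(apply(schema, withColumn()).keySet()); // prints [chrom, pos, locus]

        // Existing name: the column is updated, no column is added.
        Map.Entry<String, Class<?>> update =
                new AbstractMap.SimpleImmutableEntry<String, Class<?>>("pos", Long.class);
        System.out.println(apply(schema, update).keySet());       // prints [chrom, pos]

        System.out.println(matches("chr1:12345", withColumn()));  // prints true
        System.out.println(matches(12345, withColumn()));         // prints false
    }
}
```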