Interface Operator

All Superinterfaces:
AutoCloseable, Closeable, OperatorPipelineV3, Serializable
All Known Subinterfaces:
Collector, Executor, Formatter, Loader, SupportsAggregation, SupportsCopyToLocal, SupportsFileLocalization, SupportsHadoopDFS, SupportsOrdering, SupportsReadPartitions, SupportsRepartitioning, SupportsSaveToBLOB, SupportsSaveToHTTP, SupportsSaveToJDBC, SupportsScanPartitions, SupportsTableLocalization, SupportsUDF1<T1,R>, SupportsUDF2<T1,T2,R>, SupportsUDF3<T1,T2,T3,R>, SupportsUDF4<T1,T2,T3,T4,R>, SupportsUDF5<T1,T2,T3,T4,T5,R>, Transformer, Writer

@DeveloperApi public interface Operator extends OperatorPipelineV3, Serializable, Closeable
An operator that processes a Spark DataFrame and produces a new DataFrame. Multiple operators chained together form an operator pipeline, which automates the in-memory data processing for an input or output file or tabular dataset in a workflow task. The operations available in an operator pipeline are divided into localization, computation, and delocalization.
Localization loads datasets from a source (such as blob storage) and optionally transforms them to meet the requirements of distributed task commands.
Computation passes DataFrame partitions to the task command as inputs and executes the task command (such as a shell script or SQL).
Delocalization collects a file or dataset output by the task command and saves it to a destination (such as blob storage).
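
The three phases described above can be sketched as a chain of stages. This is a minimal illustration only, not this library's API: `StageOperator`, `stage`, `run`, and `examplePipeline` are hypothetical names, and a `List<String>` of rows stands in for a Spark DataFrame so that the example runs without Spark. The filtering and upper-casing steps are placeholders for a real localization transform and a real task command.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

final class PipelineSketch {

    /** Hypothetical stand-in for Operator: takes rows and produces new rows. */
    interface StageOperator {
        String getName();
        List<String> process(List<String> rows);
    }

    /** Builds a named stage from a function (illustrative helper). */
    static StageOperator stage(String name, UnaryOperator<List<String>> body) {
        return new StageOperator() {
            @Override public String getName() { return name; }
            @Override public List<String> process(List<String> rows) { return body.apply(rows); }
        };
    }

    /** Runs the stages in order, feeding each stage's output to the next one. */
    static List<String> run(List<StageOperator> pipeline, List<String> input) {
        List<String> current = input;
        for (StageOperator op : pipeline) {
            current = op.process(current);
        }
        return current;
    }

    /** Localization, computation, delocalization; 'destination' stands in for blob storage. */
    static List<StageOperator> examplePipeline(List<String> destination) {
        List<StageOperator> pipeline = new ArrayList<>();
        // Localization: take the source rows and drop blank ones.
        pipeline.add(stage("localize", rows -> rows.stream()
                .filter(row -> !row.trim().isEmpty())
                .collect(Collectors.toList())));
        // Computation: placeholder for executing a task command on the rows.
        pipeline.add(stage("compute", rows -> rows.stream()
                .map(row -> row.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList())));
        // Delocalization: save the output to the destination.
        pipeline.add(stage("delocalize", rows -> {
            destination.addAll(rows);
            return rows;
        }));
        return pipeline;
    }

    public static void main(String[] args) {
        List<String> destination = new ArrayList<>();
        List<String> result = run(examplePipeline(destination), List.of("alpha", " ", "beta"));
        System.out.println(result);
    }
}
```

Keeping each phase behind the same small interface is what lets the stages be chained in any configured order; the real interface additionally extends Closeable and Serializable, which this sketch omits.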
  • Method Summary

    Modifier and Type
    Method
    Description
    default String
    getName()
    Get the operator name that is used to uniquely specify the operator configuration of task operator pipelines.
    OperatorContext
    getOperatorContext()
    Get the OperatorContext containing a list of properties in the form of NamedValue objects associated with this operator object.

    Methods inherited from interface java.io.Closeable

    close
  • Method Details

    • getName

      default String getName()
      Get the operator name that is used to uniquely specify the operator configuration of task operator pipelines.
      Returns:
      operator name
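
As a rough illustration of how a name returned by a default method could select an operator's configuration, consider the sketch below. Everything in it is hypothetical: `NamedOperator`, `CsvLoader`, `BlobWriter`, and `configFor` are invented for the example, and deriving the default name from the class name is an assumption, since this page does not document what the real default implementation returns.

```java
import java.util.Map;

final class OperatorNameSketch {

    /** Hypothetical stand-in for Operator, reduced to the naming method. */
    interface NamedOperator {
        // Assumption: the default name is the simple class name.
        default String getName() {
            return getClass().getSimpleName();
        }
    }

    /** Relies on the default name. */
    static final class CsvLoader implements NamedOperator { }

    /** Overrides the default with an explicit name. */
    static final class BlobWriter implements NamedOperator {
        @Override public String getName() { return "blob-writer"; }
    }

    /** Looks up the configuration entry keyed by the operator's name. */
    static Map<String, String> configFor(NamedOperator op,
                                         Map<String, Map<String, String>> configs) {
        Map<String, String> config = configs.get(op.getName());
        if (config == null) {
            throw new IllegalArgumentException("No configuration for operator: " + op.getName());
        }
        return config;
    }
}
```

Because the name is the lookup key, two operators in the same pipeline configuration would need distinct names in this sketch.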
    • getOperatorContext

      OperatorContext getOperatorContext()
      Get the OperatorContext containing a list of properties in the form of NamedValue objects associated with this operator object. The operator context can store state properties that are passed to downstream operators.
      Returns:
      operator context
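
The idea of passing state to downstream operators through the context can be sketched as follows. This is not the real `OperatorContext` or `NamedValue` API, whose members are not shown on this page; the `NamedValue`, `Context`, `CountingLoader`, and `describe` types below are hypothetical stand-ins, and the `rowCount` property name is made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class OperatorContextSketch {

    /** Hypothetical name/value pair (the real NamedValue is not documented here). */
    static final class NamedValue {
        final String name;
        final Object value;
        NamedValue(String name, Object value) { this.name = name; this.value = value; }
    }

    /** Hypothetical context holding a list of named properties. */
    static final class Context {
        private final List<NamedValue> properties = new ArrayList<>();

        /** Stores a property, replacing any earlier value with the same name. */
        void put(String name, Object value) {
            properties.removeIf(p -> p.name.equals(name));
            properties.add(new NamedValue(name, value));
        }

        /** Returns the value stored under the given name, if any. */
        Optional<Object> get(String name) {
            return properties.stream()
                    .filter(p -> p.name.equals(name))
                    .map(p -> p.value)
                    .findFirst();
        }

        List<NamedValue> properties() {
            return Collections.unmodifiableList(properties);
        }
    }

    /** Upstream operator that records state in its context while loading. */
    static final class CountingLoader {
        private final Context context = new Context();
        Context getOperatorContext() { return context; }
        void load(List<String> rows) { context.put("rowCount", rows.size()); }
    }

    /** Downstream step that reads the state left by the upstream operator. */
    static String describe(Context upstream) {
        return upstream.get("rowCount")
                .map(count -> "upstream produced " + count + " rows")
                .orElse("row count unknown");
    }
}
```

A downstream step would call `describe(loader.getOperatorContext())` after the loader has run; before that, the property is absent and the fallback text is returned.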