Interface Transformer
All Superinterfaces:
AutoCloseable, Closeable, Operator, OperatorPipelineV3, Serializable, org.apache.spark.sql.api.java.UDF1<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
All Known Subinterfaces:
SupportsGroupWithinPartitions, SupportsOrdering
@DeveloperApi
public interface Transformer
extends Operator, org.apache.spark.sql.api.java.UDF1<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>
The operator responsible for repartitioning, and optionally sorting, DataFrames loaded by a Loader to optimize downstream data processing.
For instance, in genome sequencing analysis, a transformer can repartition BAM or VCF datasets based on
non-overlapping target regions.
A Transformer is a user-defined function (UDF) that takes a DataFrame as an input parameter
and returns a partitioned and optionally sorted DataFrame.
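As a rough illustration of the region-based repartitioning described above, the following plain-Java sketch assigns positions to non-overlapping target regions and sorts each group. It uses ordinary collections rather than a Spark DataFrame, and the class name, region boundaries, and sample positions are all made up for the example; it is not the SeqsLab implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration only; a real Transformer would operate on a Spark Dataset<Row>.
class RegionPartitionSketch {

    // Hypothetical target regions on one chromosome, as half-open [start, end) intervals.
    // They are assumed to be non-overlapping.
    static final long[][] REGIONS = { {0, 1000}, {1000, 5000}, {5000, 9000} };

    // Returns the index of the region containing pos, or -1 if no region contains it.
    static int regionIndex(long pos) {
        for (int i = 0; i < REGIONS.length; i++) {
            if (pos >= REGIONS[i][0] && pos < REGIONS[i][1]) {
                return i;
            }
        }
        return -1;
    }

    // Builds one bucket per region, drops positions outside every region,
    // and sorts each bucket in ascending order.
    static List<List<Long>> partitionAndSort(List<Long> positions) {
        List<List<Long>> partitions = new ArrayList<>();
        for (int i = 0; i < REGIONS.length; i++) {
            partitions.add(new ArrayList<>());
        }
        for (long pos : positions) {
            int idx = regionIndex(pos);
            if (idx >= 0) {
                partitions.get(idx).add(pos);
            }
        }
        for (List<Long> bucket : partitions) {
            Collections.sort(bucket);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Long> positions = Arrays.asList(7200L, 150L, 4999L, 1000L, 42L);
        // prints [[42, 150], [1000, 4999], [7200]]
        System.out.println(partitionAndSort(positions));
    }
}
```

Because the regions do not overlap, each position lands in at most one bucket, which is what lets downstream steps process the partitions independently.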
A Transformer object is first created by the Transformer operator factory (an implementation of TransformerSupport) when a pipeline task requests it, and is lazily initialized when it is ready to run.
On completion, the close method is invoked to release resources.
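The create, initialize, run, close sequence can be sketched with a small stand-in type. The interface below is not the real Transformer: it models only init, numPartitions, and close, the return type of init is an assumption based on "the object itself", and the one-partition-per-core rule is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class LifecycleSketch {

    // Stand-in for the real interface; only the lifecycle methods are modeled.
    interface SketchTransformer extends AutoCloseable {
        SketchTransformer init(int cpuCores, int memPerCore);
        int numPartitions();
        @Override
        void close();
    }

    // Hypothetical implementation that records each lifecycle step in order.
    static class LoggingTransformer implements SketchTransformer {
        private final List<String> events;
        private int partitions;

        LoggingTransformer(List<String> events) {
            this.events = events;
            events.add("created");
        }

        @Override
        public SketchTransformer init(int cpuCores, int memPerCore) {
            // Assumption for illustration: one partition per CPU core.
            partitions = cpuCores;
            events.add("init");
            return this;
        }

        @Override
        public int numPartitions() {
            return partitions;
        }

        @Override
        public void close() {
            events.add("closed");
        }
    }

    // Plays the role of the pipeline: create, initialize just before use, run, then close.
    static List<String> runOnce(int cpuCores, int memPerCore) {
        List<String> events = new ArrayList<>();
        try (SketchTransformer t = new LoggingTransformer(events)) {
            t.init(cpuCores, memPerCore);
            events.add("run with " + t.numPartitions() + " partitions");
        }
        return events;
    }

    public static void main(String[] args) {
        // prints [created, init, run with 8 partitions, closed]
        System.out.println(runOnce(8, 4));
    }
}
```

The try-with-resources block guarantees that close runs even if the work in between throws, which matches the resource-release step described above.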
SeqsLab supports multiple data processing features to manage and optimize workloads.
A Transformer can declare which of these features it supports by implementing the corresponding mix-in interfaces.
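One way to picture the mix-in mechanism is a runtime check on which interfaces an object implements. In the sketch below, the interfaces are empty stand-ins that only borrow the names SupportsOrdering and SupportsGroupWithinPartitions; the real interfaces may declare methods, and whether SeqsLab detects them with instanceof is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class MixinSketch {

    // Empty stand-ins; the real interfaces live in the SeqsLab plugin API.
    interface SketchTransformer { }
    interface SupportsOrdering extends SketchTransformer { }
    interface SupportsGroupWithinPartitions extends SketchTransformer { }

    // Hypothetical transformers: one with no extra features, one that opts in to ordering.
    static class PlainTransformer implements SketchTransformer { }
    static class OrderedTransformer implements SupportsOrdering { }

    // Lists the features a transformer declares through its mix-in interfaces.
    static List<String> features(SketchTransformer t) {
        List<String> found = new ArrayList<>();
        if (t instanceof SupportsOrdering) {
            found.add("ordering");
        }
        if (t instanceof SupportsGroupWithinPartitions) {
            found.add("group-within-partitions");
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(features(new PlainTransformer()));   // prints []
        System.out.println(features(new OrderedTransformer())); // prints [ordering]
    }
}
```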
Method Summary
Modifier and Type    Method                                Description
                     init(int cpuCores, int memPerCore)    Initializes this operator.
int                  numPartitions()                       Get the number of partitions after repartition.
Methods inherited from interface com.atgenomix.seqslab.piper.plugin.api.Operator
getName, getOperatorContext
Methods inherited from interface org.apache.spark.sql.api.java.UDF1
call
Method Details
init
Initializes this operator.
Parameters:
cpuCores - Total number of CPU cores in the current computing cluster
memPerCore - Allocated memory per CPU core, in GB
Returns:
The object itself
numPartitions
int numPartitions()
Get the number of partitions after repartition.
Returns:
Number of partitions