All Superinterfaces: AutoCloseable, Closeable, Operator, OperatorPipelineV3, Serializable, Transformer, org.apache.spark.sql.api.java.UDF1<org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>,org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>>

@DeveloperApi @FeatureAfterCall public interface SupportsGroupWithinPartitions extends Transformer

A mix-in interface for Transformer. A Dataset transformer can implement this interface to support additional grouping within each partition, which enables pairing-aware processing of multiple datasets. This is useful when DataFrame hash partitioning cannot produce sufficiently granular partitions: records that belong to different groups may be placed in the same partition because their partitioning keys hash to the same partition index. When processing multiple datasets together (e.g. tumor-normal somatic analysis), SeqsLab first unions the corresponding partitions across the datasets, then uses the grouping expressions to regroup the records within each resulting partition, so that matching records are properly paired for localization.
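The grouping problem described above can be illustrated with a minimal plain-Java model that does not use Spark. It assumes a hypothetical record type `Rec` with a `groupId` key and models hash partitioning as `Math.floorMod(key.hashCode(), numPartitions)`. With two partitions, the groups `chr1` and `chr3` from the tumor and normal inputs land in the same partition and are then separated by grouping within that partition. This is a sketch of the idea only, not the SeqsLab implementation; Spark SQL's hash partitioning uses a different hash function, so actual partition assignments would differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Plain-Java model (no Spark) of grouping within partitions.
 * All names here are hypothetical; this is not the SeqsLab implementation.
 */
public class GroupWithinPartitionsDemo {

    /** Hypothetical record: source dataset, grouping key, and payload. */
    record Rec(String dataset, String groupId, String payload) {}

    /** Model of hash partitioning: non-negative modulo of the key's hash code. */
    static int partitionOf(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /**
     * Take the unioned records of all datasets, keep those assigned to
     * partition p, and group them by groupId so each group holds its pair.
     */
    static Map<String, List<Rec>> groupWithinPartition(List<Rec> union, int p, int numPartitions) {
        Map<String, List<Rec>> groups = new TreeMap<>();
        for (Rec r : union) {
            if (partitionOf(r.groupId(), numPartitions) == p) {
                groups.computeIfAbsent(r.groupId(), k -> new ArrayList<>()).add(r);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // With 2 partitions, "chr1" and "chr3" both map to partition 0
        // (their String hash codes are both even), so they collide.
        int numPartitions = 2;
        List<Rec> union = new ArrayList<>(List.of(
                new Rec("tumor", "chr1", "t1"),
                new Rec("tumor", "chr2", "t2"),
                new Rec("tumor", "chr3", "t3")));
        union.addAll(List.of(
                new Rec("normal", "chr1", "n1"),
                new Rec("normal", "chr2", "n2"),
                new Rec("normal", "chr3", "n3")));
        for (int p = 0; p < numPartitions; p++) {
            System.out.println("partition " + p + ":");
            groupWithinPartition(union, p, numPartitions)
                    .forEach((k, v) -> System.out.println("  " + k + " -> " + v));
        }
    }
}
```

Running `main` prints the groups held by each partition: partition 0 contains both `chr1` and `chr3`, and grouping by `groupId` keeps each tumor record paired with the matching normal record instead of mixing the two groups.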

Method Summary

Modifier and Type: org.apache.spark.sql.Column[]
Method: getGroupExprs(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df)
Description: Get the grouping expressions as an array of Column, e.g. df.col("group_id").

Methods inherited from interface java.io.Closeable
close

Methods inherited from interface com.atgenomix.seqslab.piper.plugin.api.Operator
getName, getOperatorContext

Methods inherited from interface com.atgenomix.seqslab.piper.plugin.api.transformer.Transformer
init, numPartitions

Methods inherited from interface org.apache.spark.sql.api.java.UDF1
call

Method Details
- getGroupExprs
  
  org.apache.spark.sql.Column[] getGroupExprs(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df)
  
  Get the grouping expressions as an array of Column, e.g. df.col("group_id").
  
  Returns:
  
  an array of Column expressions used to group the records within each partition
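Because `getGroupExprs` returns an array, an implementation may supply more than one grouping expression; with Spark's Java API, a single-key implementation could simply return `new Column[] { df.col("group_id") }`. The plain-Java analogy below does not use Spark. It models each grouping expression as a hypothetical `Function` from a row (a `Map<String, String>`) to a key component and, assuming rows are grouped by the combination of all expression values as with Spark's `groupBy(Column...)`, groups one partition's rows by that composite key. It is a sketch under those assumptions, not how SeqsLab evaluates the columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Plain-Java analogy (no Spark) for the Column[] returned by getGroupExprs.
 * Rows are modeled as Map<String, String>; each hypothetical "expression"
 * maps a row to one component of the grouping key.
 */
public class GroupExprsModel {

    /** Evaluate every grouping expression on a row to build its composite key. */
    static List<String> keyOf(Map<String, String> row, List<Function<Map<String, String>, String>> exprs) {
        List<String> key = new ArrayList<>();
        for (Function<Map<String, String>, String> e : exprs) {
            key.add(e.apply(row));
        }
        return key;
    }

    /** Group one partition's rows by composite key, keeping first-seen key order. */
    static Map<List<String>, List<Map<String, String>>> group(
            List<Map<String, String>> partition, List<Function<Map<String, String>, String>> exprs) {
        Map<List<String>, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> row : partition) {
            groups.computeIfAbsent(keyOf(row, exprs), k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Analogue of returning new Column[] { df.col("group_id") }: one expression.
        List<Function<Map<String, String>, String>> exprs = List.of(row -> row.get("group_id"));
        List<Map<String, String>> partition = List.of(
                Map.of("group_id", "g1", "sample", "tumor"),
                Map.of("group_id", "g2", "sample", "tumor"),
                Map.of("group_id", "g1", "sample", "normal"));
        group(partition, exprs).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Here the two `g1` rows form one group and `g2` forms another. Adding a second expression, such as a hypothetical `row -> row.get("contig")`, would split each group further by that column's value.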
