Book title: Mastering Hadoop 3
Authors: Chanchal Singh, Manish Kumar
Last updated: 2025-04-04 14:54:50

Composite join

A composite join is a map side join performed on very large datasets. Its advantage is the same as the one we discussed for the map side join: because there is no reducer, the shuffle and sort phase is skipped. The catch is that the input data must be prepared in a specific way before it is processed.

One condition is that each dataset must be sorted by the key that is used for the join. Each dataset must also be partitioned by that key, and both datasets must have the same number of partitions. Hadoop provides a special InputFormat, CompositeInputFormat, to read datasets prepared this way.
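
To see why these conditions matter, the following is a minimal plain-Java sketch (no Hadoop dependency) of the kind of single-pass merge that becomes possible once both inputs are sorted by the join key. It is an illustration under stated assumptions, not Hadoop's actual implementation: the class MergeJoinSketch, the Row type, and the methods partitionFor and mergeJoin are hypothetical names introduced here. The partition formula has the same shape as the one in Hadoop's HashPartitioner, applied here to a plain String key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the merge step that sorted, co-partitioned inputs allow. */
final class MergeJoinSketch {

    /** Hypothetical record type: a join key plus an opaque value. */
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Maps a key to a partition index. If both datasets use the same function and
     * the same partition count, matching keys always land in the same partition number.
     */
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Inner-joins two lists that are ALREADY sorted by key, using one forward pass over each. */
    static List<String> mergeJoin(List<Row> left, List<Row> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).key.compareTo(right.get(j).key);
            if (cmp < 0) {
                i++; // left key is smaller, so it has no match on the right
            } else if (cmp > 0) {
                j++; // right key is smaller, so it has no match on the left
            } else {
                String key = left.get(i).key;
                // Find the end of the run of equal keys on the right side.
                int runEnd = j;
                while (runEnd < right.size() && right.get(runEnd).key.equals(key)) {
                    runEnd++;
                }
                // Pair every left row in the run with every right row in the run.
                while (i < left.size() && left.get(i).key.equals(key)) {
                    for (int k = j; k < runEnd; k++) {
                        out.add(key + "\t" + left.get(i).value + "\t" + right.get(k).value);
                    }
                    i++;
                }
                j = runEnd;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two small inputs standing in for one matching partition of each dataset.
        List<Row> users = Arrays.asList(new Row("u1", "alice"), new Row("u2", "bob"));
        List<Row> orders = Arrays.asList(new Row("u2", "order-7"), new Row("u3", "order-9"));
        for (String joined : mergeJoin(users, orders)) {
            System.out.println(joined);
        }
    }
}
```

Because neither input is ever rewound or buffered in full, the join needs no shuffle. The sketch also shows why the partition counts must match: partition n of one dataset can only be merged against partition n of the other if both were produced with the same function and the same number of partitions.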

Before using the following template, you must preprocess your input data, sorting and partitioning it by the join key so that it is in the format required for a composite join. Let's look at the mapper and reducer that sort and partition the input data.