The parallel implementation relies on a strategy for directed virtual reduction, called half combustion, which we introduce in this paper. The implementation incorporates two techniques: a message aggregation technique, which reduces communication overhead, and a fair policy for distributing dynamically generated load among processors. Message aggregation is essential because the computation is fine-grained. With this technique we obtain a linear speedup close to 80% of the ideal on a shared-memory multiprocessor. This result indicates that parallel implementations of optimal reduction are viable.