The key is to collect / reduce

Streams in Java enable functional-style operations on (collections of) elements, which makes developers forget the good ol’ for construct very quickly.

Sometimes the lazy developer, seduced by the Zen-like laziness of streams, focuses heavily on intermediate operations like map or filter and forgets the key: collectors!

Standard Collectors

The Java standard library provides many useful default collectors and a simple API to build them. They are available through the java.util.stream.Collectors class:

// Compute the sum of the salaries of all employees
int total = employees.stream()
        .collect(Collectors.summingInt(Employee::getSalary));

// Group employees by department
Map<Department, List<Employee>> byDept
    = employees.stream()
        .collect(Collectors.groupingBy(Employee::getDepartment));

Specialized Streams

Sometimes forgotten things hide the best treasures

Unless you have been living under a rock, you know that Java has some specialized stream types: IntStream, LongStream, and DoubleStream, which are streams over the primitive int, long and double types.

Specialized streams can be used to obtain useful data from the stream, so if you need to calculate the maximum value of a stream of ints you can use the OptionalInt max() method of IntStream! Someone may object that the same result can be obtained with a reduce operation like reduce(Integer::max), but what if you want to calculate an average? You could do it using a custom and cumbersome collector, or go straight to the OptionalDouble average() method!
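
As a minimal sketch of the comparison above (the class name PrimitiveStreamDemo and the sample values are made up for illustration), the following puts max(), reduce(Integer::max) and average() side by side:

```java
import java.util.List;
import java.util.Optional;
import java.util.OptionalDouble;
import java.util.OptionalInt;

class PrimitiveStreamDemo {
    // Hypothetical sample data, not taken from the article
    static final List<Integer> VALUES = List.of(3, 9, 6);

    // Maximum through the specialized IntStream
    static OptionalInt maxWithIntStream() {
        return VALUES.stream().mapToInt(Integer::intValue).max();
    }

    // Same result through a generic reduce on boxed Integers
    static Optional<Integer> maxWithReduce() {
        return VALUES.stream().reduce(Integer::max);
    }

    // The average is a single call on IntStream, no custom collector needed
    static OptionalDouble average() {
        return VALUES.stream().mapToInt(Integer::intValue).average();
    }

    public static void main(String[] args) {
        System.out.println(maxWithIntStream()); // OptionalInt[9]
        System.out.println(maxWithReduce());    // Optional[9]
        System.out.println(average());          // OptionalDouble[6.0]
    }
}
```

All three methods return an optional value because the stream may be empty.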

But wait, reading the IntStream JavaDoc, lurking around a corner, a precious gem appears!

summaryStatistics()

This method returns an IntSummaryStatistics that collects statistics such as count, min, max, sum, and average!
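
A small sketch of what that looks like in practice, assuming three arbitrary sample values (the class name SummaryStatisticsDemo is hypothetical):

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

class SummaryStatisticsDemo {
    // One pass over the sample values yields every statistic at once
    static IntSummaryStatistics summarize() {
        return IntStream.of(67, 86, 73).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = summarize();
        System.out.println(stats.getCount());   // 3
        System.out.println(stats.getSum());     // 226
        System.out.println(stats.getMin());     // 67
        System.out.println(stats.getMax());     // 86
        System.out.println(stats.getAverage()); // roughly 75.33
    }
}
```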

Obtaining custom statistics from a Stream

Let’s have a POJO representing a person and build a list of them:

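class Person {
    private final int weight;
    private final int height;
    Person(int weight, int height) {
        this.weight = weight;
        this.height = height;
    }
    int getHeight() { return height; }
    int getWeight() { return weight; }
}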

// Some random people
var list = List.of(new Person(67, 178),
                   new Person(86, 186),
                   new Person(73, 173));

In the good old days (pre Java 8) to find the group’s maximum height and average weight we would have written something like this:

int count = 0;
int maxHeight = Integer.MIN_VALUE;
int sumWeight = 0;
for (Person p : list) {
  count++;
  maxHeight = Math.max(maxHeight, p.getHeight());
  sumWeight += p.getWeight();
}
double avg = count > 0 ? (double) sumWeight / count : 0.0d;

But we live in a bright present and we have streams!

var averageWeight = list.stream()
    .mapToInt(Person::getWeight).average();
var averageHeight = list.stream()
    .mapToInt(Person::getHeight).average();
averageWeight.ifPresent(aWeigh ->
    System.out.printf("Average weight %.2f%n", aWeigh));
averageHeight.ifPresent(aHeigh ->
    System.out.printf("Average height %.2f%n", aHeigh));

// Average weight 75,33
// Average height 179,00

This snippet is better (and more readable), but we are still consuming two streams to obtain two results! This is unacceptable.

A first improvement may be to use the aforementioned IntSummaryStatistics to obtain a bunch of statistics for each of a person’s measurements.

IntSummaryStatistics statsWeight =
    list.stream()
        .mapToInt(Person::getWeight)
        .summaryStatistics();
System.out.printf("Weight stats %s%n", statsWeight);
IntSummaryStatistics statsHeight =
    list.stream()
        .mapToInt(Person::getHeight)
        .summaryStatistics();
System.out.printf("Height stats %s%n", statsHeight);

// Weight stats IntSummaryStatistics{count=3, sum=226, min=67, average=75,333333, max=86}
// Height stats IntSummaryStatistics{count=3, sum=537, min=173, average=179,000000, max=186}

Simple and straightforward, but still not very efficient: the stream is built and consumed twice.

Learn from the JDK

Let’s look at the IntSummaryStatistics JavaDoc:

IntSummaryStatistics stats =
    intStream.collect(IntSummaryStatistics::new,
                      IntSummaryStatistics::accept,
                      IntSummaryStatistics::combine);

In the docs, the author is in effect building a custom collector by hand, passing a supplier, an accumulator and a combiner to collect. Shall we do the same? Yes!

Anatomy of a Collector

A Collector is specified by four functions that work together to accumulate entries into a mutable result container, and optionally perform a final transform on the result.

They are:

  • creation of a new result container: supplier()
  • incorporating a new data element into a result container: accumulator()
  • combining two result containers into one: combiner()
  • performing an optional final transform on the container: finisher()
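
To make the four roles concrete before tackling the real use case, here is a toy sketch that concatenates strings. The class CollectorAnatomyDemo and the CONCAT collector are hypothetical names chosen for this illustration; in real code Collectors.joining() already does this job.

```java
import java.util.List;
import java.util.stream.Collector;

class CollectorAnatomyDemo {
    // Hypothetical collector: each argument maps to one of the four functions
    static final Collector<String, StringBuilder, String> CONCAT =
        Collector.of(
            StringBuilder::new,                   // supplier: new result container
            (sb, s) -> sb.append(s),              // accumulator: add one element
            (left, right) -> left.append(right),  // combiner: merge two containers
            StringBuilder::toString);             // finisher: final transform

    static String concat(List<String> parts, boolean parallel) {
        return (parallel ? parts.parallelStream() : parts.stream())
            .collect(CONCAT);
    }

    public static void main(String[] args) {
        System.out.println(concat(List.of("a", "b", "c"), false)); // abc
        System.out.println(concat(List.of("a", "b", "c"), true));  // abc
    }
}
```

The combiner only comes into play when partial results have to be merged, which is typically the case for parallel streams.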

Let’s build:

class PersonStats {
  // Mutable content
  int count = 0;
  int maxHeight = Integer.MIN_VALUE;
  int sumWeight = 0; // Used to calculate average
  // Accumulator
  public void accept(Person p) {
    count++;
    maxHeight = Math.max(maxHeight, p.getHeight());
    sumWeight += p.getWeight();
  }
  // Combiner
  public PersonStats combine(PersonStats other) {
    count += other.count;
    maxHeight = Math.max(maxHeight, other.maxHeight);
    sumWeight += other.sumWeight;
    return this;
  }
  // Results
  public int getCount() {
    return count;
  }
  public int getMaxHeight() {
    return maxHeight;
  }
  public double getAvgWeight() {
    return getCount() > 0 ? (double) sumWeight / count : 0.0d;
  }
}

And now the final collector:

  • the supplier is the PersonStats implicit constructor
  • the accumulator is the accept method
  • the combiner is (you guessed it) the combine method
  • the optional finisher is absent because PersonStats itself is the final result of the reduction, so we set the IDENTITY_FINISH characteristic (this Collector.of overload adds it anyway, so passing it explicitly is harmless)

static Collector<Person, ?, PersonStats> PERSON_STATS =
  Collector.of(PersonStats::new,
               PersonStats::accept,
               PersonStats::combine,
               Collector.Characteristics.IDENTITY_FINISH);

Time to run it! As simple as:

PersonStats stats = list.parallelStream()
                        .collect(PERSON_STATS);
System.out.printf("Person stats %s%n", stats);

// Person stats {count=3, max height=186, avg weight=75,33}

Not only have we consumed the stream just once, we have even leveraged the parallelism provided by the API!

Adding the finisher, finally

Some smart people out there have already spotted a severe flaw in the previous beauty: we are returning a PersonStats that is

  • mutable
  • an intermediate container

An immutable object is far more suitable for functional-style code. Also, the intermediate container can be hidden from the user’s eyes because it contains only collector-specific logic!

To recap with another example, here is the intermediate container:

static class PeopleFinder {
  private Person heavy;
  private Person tall;
  // Accumulator
  void accept(Person p) {
    heavy = heaviest(heavy, p);
    tall = tallest(tall, p);
  }
  // Logic
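  Person heaviest(Person a, Person b) {
    // Either side may be null, e.g. when combining an empty partial result
    if (a == null || b == null) return a == null ? b : a;
    return a.getWeight() < b.getWeight() ? b : a;
  }
  Person tallest(Person a, Person b) {
    if (a == null || b == null) return a == null ? b : a;
    return a.getHeight() < b.getHeight() ? b : a;
  }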
  // Combiner
  PeopleFinder combine(PeopleFinder other) {
    heavy = heaviest(heavy, other.heavy);
    tall = tallest(tall, other.tall);
    return this;
  }
  // Finisher
  BigPeople result() {
    return new BigPeople(heavy, tall);
  }
}

The immutable result object:

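static class BigPeople {
  private final Person heavy;
  private final Person tall;
  BigPeople(Person heavy, Person tall) {
    this.heavy = heavy;
    this.tall = tall;
  }
  // Empty when the stream had no elements
  Optional<Person> heavyPerson() { return Optional.ofNullable(heavy); }
  Optional<Person> tallPerson() { return Optional.ofNullable(tall); }
}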

Finally, the new shiny collector:

static Collector<Person, ?, BigPeople> PEOPLE_FINDER =
  Collector.of(PeopleFinder::new,
               PeopleFinder::accept,
               PeopleFinder::combine,
               PeopleFinder::result);

Have you noticed? Thanks to the ? wildcard, we can even leave the PeopleFinder intermediate container out of the collector’s type parameters!

BigPeople bigPeople = list.parallelStream()
                          .collect(PEOPLE_FINDER);
System.out.printf("Big people %s%n", bigPeople);
// Big people {heaviest Optional[{weight=86, height=186}], tallest Optional[{weight=86, height=186}]}
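
The wildcard trick used for PEOPLE_FINDER can be taken one step further by making the container a private nested class, so callers cannot even name it. The following is only a sketch under that assumption; RangeCollectors, MinMax and range() are hypothetical names, not part of the example above.

```java
import java.util.List;
import java.util.stream.Collector;

class RangeCollectors {
    // Mutable container, invisible outside this class
    private static final class MinMax {
        private int min = Integer.MAX_VALUE;
        private int max = Integer.MIN_VALUE;

        // Accumulator
        void accept(int value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
        }

        // Combiner
        MinMax combine(MinMax other) {
            min = Math.min(min, other.min);
            max = Math.max(max, other.max);
            return this;
        }

        // Finisher: turn the mutable container into an immutable String
        String result() {
            return min > max ? "empty" : min + ".." + max;
        }
    }

    // Callers only ever see Collector<Integer, ?, String>
    static Collector<Integer, ?, String> range() {
        return Collector.of(MinMax::new, MinMax::accept,
                            MinMax::combine, MinMax::result);
    }

    public static void main(String[] args) {
        String range = List.of(5, 1, 9).parallelStream().collect(range());
        System.out.println(range); // 1..9
    }
}
```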

Final thoughts

What we learned from this post is that a Collector may look like obscure black wizardry, but it turns out to be simpler, and at the same time more powerful, than expected!

Happy collecting!
