From filtering and grouping to custom collectors—write cleaner, faster Java code with Streams.
Why Streams? (60-second refresher)
Java Streams let you describe data transformations (filter, map, group, reduce) in a fluent, declarative way. They encourage non-interference (don’t mutate the source) and stateless lambdas for correctness and parallel safety.
- Why Streams? (60-second refresher)
- Sample domain
- 1) Filter → Map (the daily driver)
- 2) flatMap (one-to-many)
- 3) mapMulti (Java 16)
- 4) Collecting lists: toList() vs Collectors.toList()
- 5) Sorting with Comparator
- 6) Distinct & de-dupe by key
- 7) Grouping with groupingBy (+ mapping)
- 8) Two buckets? partitioningBy
- 9) Stats with summarizingInt/Double
- 10) Build CSV with joining
- 11) Slice streams: takeWhile / dropWhile (Java 9)
- 12) Combine two results at once: Collectors.teeing (Java 12)
- Bonus: Parallel Streams – use with care
- Common pitfalls & how to avoid them
- Quick checklist
Sample domain
// Snippets below assume: import java.util.*;
record Order(long id, String customer, double amount, List<String> tags) {}
List<Order> orders = List.of(
    new Order(1, "Alice", 120.0, List.of("new", "priority")),
    new Order(2, "Bob", 40.0, List.of("gift")),
    new Order(3, "Alice", 75.0, List.of("repeat", "promo")),
    new Order(4, "Cara", 210.0, List.of("priority"))
);
1) Filter → Map (the daily driver)
Keep what you need, transform what you keep.
List<String> vipCustomers = orders.stream()
    .filter(o -> o.amount() >= 100)
    .map(Order::customer)
    .distinct()
    .toList(); // Java 16+
The lambdas here are stateless and non-interfering, as the Stream contract requires.
2) flatMap (one-to-many)
Flatten nested collections (e.g., tags across orders).
Set<String> allTags = orders.stream()
    .flatMap(o -> o.tags().stream())
    .collect(java.util.stream.Collectors.toSet());
flatMap is for “one-to-many then flatten.”
3) mapMulti (Java 16)
Like flatMap, but you push zero-to-many outputs into a consumer—great when you want to avoid intermediate streams.
List<String> importantTags = orders.stream()
    .<String>mapMulti((o, out) -> {
        if (o.amount() >= 100) o.tags().forEach(out);
    })
    .toList();
mapMulti was added in Java 16 and can be more efficient than flatMap in some cases.
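To make the comparison with flatMap concrete, here is a small sketch that produces the same tags both ways. The class and method names (MapMultiVsFlatMapDemo, withMapMulti, withFlatMap) are made up for illustration, and the data mirrors the sample domain above.

```java
import java.util.List;

class MapMultiVsFlatMapDemo {
    record Order(long id, String customer, double amount, List<String> tags) {}

    static final List<Order> ORDERS = List.of(
        new Order(1, "Alice", 120.0, List.of("new", "priority")),
        new Order(2, "Bob", 40.0, List.of("gift")),
        new Order(3, "Alice", 75.0, List.of("repeat", "promo")),
        new Order(4, "Cara", 210.0, List.of("priority")));

    // mapMulti: push each tag of a big order straight into the downstream consumer.
    static List<String> withMapMulti() {
        return ORDERS.stream()
            .<String>mapMulti((o, out) -> {
                if (o.amount() >= 100) o.tags().forEach(out);
            })
            .toList();
    }

    // Equivalent filter + flatMap: creates one small stream per matching order.
    static List<String> withFlatMap() {
        return ORDERS.stream()
            .filter(o -> o.amount() >= 100)
            .flatMap(o -> o.tags().stream())
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(withMapMulti()); // [new, priority, priority]
        System.out.println(withFlatMap());  // [new, priority, priority]
    }
}
```

Whether mapMulti is actually faster depends on the workload, so treat this as a readability comparison rather than a benchmark.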
4) Collecting lists: toList() vs Collectors.toList()
Stream.toList() (Java 16+) returns an unmodifiable list. Collectors.toList() returns a list with no guarantee about its type or mutability; in practice the result is usually a mutable ArrayList, but that is unspecified.
List<String> a = orders.stream().map(Order::customer).toList(); // unmodifiable
List<String> b = orders.stream().map(Order::customer)
    .collect(java.util.stream.Collectors.toList()); // unspecified
Prefer toList() when you want an unmodifiable result. Use Collectors.toCollection(ArrayList::new) when you truly need a mutable list, since Collectors.toList() does not promise one.
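The difference is easiest to see by trying to add an element. The sketch below is illustrative only: the class ToListMutabilityDemo and its helper rejectsAdd are hypothetical names, not part of any API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToListMutabilityDemo {

    // Stream.toList() (Java 16+) returns an unmodifiable list.
    static List<String> unmodifiableNames() {
        return Stream.of("Alice", "Bob", "Cara").toList();
    }

    // toCollection(ArrayList::new) explicitly asks for a mutable ArrayList.
    static List<String> mutableNames() {
        return Stream.of("Alice", "Bob", "Cara")
            .collect(Collectors.toCollection(ArrayList::new));
    }

    // Returns true if the list refuses to accept a new element.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("Dave");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsAdd(unmodifiableNames())); // true
        System.out.println(rejectsAdd(mutableNames()));      // false
    }
}
```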
5) Sorting with Comparator
Sort by amount descending, then customer.
List<Order> sorted = orders.stream()
    .sorted(java.util.Comparator
        .comparingDouble(Order::amount).reversed()
        .thenComparing(Order::customer))
    .toList();
sorted is a stateful intermediate operation; on an ordered stream the sort is stable, so equal elements keep their relative order.
6) Distinct & de-dupe by key
distinct() uses equals/hashCode. For key-based de-dupe, collect into a Map and supply a merge function.
// Keep the largest order per customer
Map<String, Order> byCustomerMax = orders.stream()
    .collect(java.util.stream.Collectors.toMap(
        Order::customer,
        o -> o,
        (o1, o2) -> o1.amount() >= o2.amount() ? o1 : o2
    ));
Without the merge function, duplicate keys throw IllegalStateException.
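A quick way to convince yourself of the duplicate-key behavior is to run both variants side by side. This is a sketch with hypothetical names (ToMapDuplicateKeyDemo, failsWithoutMerge, largestPerCustomer) and a trimmed-down Order record without tags.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapDuplicateKeyDemo {
    record Order(long id, String customer, double amount) {}

    static final List<Order> ORDERS = List.of(
        new Order(1, "Alice", 120.0),
        new Order(2, "Bob", 40.0),
        new Order(3, "Alice", 75.0));

    // Two-argument toMap: the second "Alice" triggers IllegalStateException.
    static boolean failsWithoutMerge() {
        try {
            ORDERS.stream().collect(Collectors.toMap(Order::customer, o -> o));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Three-argument toMap: the merge function decides which order survives.
    static Map<String, Order> largestPerCustomer() {
        return ORDERS.stream().collect(Collectors.toMap(
            Order::customer,
            o -> o,
            (a, b) -> a.amount() >= b.amount() ? a : b));
    }

    public static void main(String[] args) {
        System.out.println(failsWithoutMerge());                      // true
        System.out.println(largestPerCustomer().get("Alice").id());   // 1
    }
}
```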
7) Grouping with groupingBy (+ mapping)
Compute total spent per customer:
Map<String, Double> totalByCustomer = orders.stream()
    .collect(java.util.stream.Collectors.groupingBy(
        Order::customer,
        java.util.stream.Collectors.summingDouble(Order::amount)
    ));
Or group and transform values with mapping:
Map<String, List<Double>> amountsByCustomer = orders.stream()
    .collect(java.util.stream.Collectors.groupingBy(
        Order::customer,
        java.util.stream.Collectors.mapping(Order::amount, java.util.stream.Collectors.toList())
    ));
groupingBy supports downstream collectors and custom map types.
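As one possible illustration of the custom map type, the three-argument groupingBy takes a map factory between the classifier and the downstream collector. The sketch below counts orders per customer into a TreeMap so the keys come out sorted; the class name GroupingIntoTreeMapDemo and the simplified Order record are assumptions for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupingIntoTreeMapDemo {
    record Order(long id, String customer, double amount) {}

    static final List<Order> ORDERS = List.of(
        new Order(1, "Cara", 210.0),
        new Order(2, "Bob", 40.0),
        new Order(3, "Alice", 75.0),
        new Order(4, "Alice", 120.0));

    // classifier, map factory, downstream collector
    static Map<String, Long> orderCountByCustomer() {
        return ORDERS.stream().collect(Collectors.groupingBy(
            Order::customer,
            TreeMap::new,
            Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(orderCountByCustomer()); // {Alice=2, Bob=1, Cara=1}
    }
}
```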
8) Two buckets? partitioningBy
Split into “big” vs “small” orders:
Map<Boolean, List<Order>> partitions = orders.stream()
    .collect(java.util.stream.Collectors.partitioningBy(o -> o.amount() >= 100));
You can add a downstream collector to aggregate each bucket.
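For instance, a downstream summingDouble turns each bucket into a total instead of a list. This is a minimal sketch; PartitionTotalsDemo and totalsByBucket are illustrative names, and the amounts match the sample orders.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionTotalsDemo {
    record Order(long id, String customer, double amount) {}

    static final List<Order> ORDERS = List.of(
        new Order(1, "Alice", 120.0),
        new Order(2, "Bob", 40.0),
        new Order(3, "Alice", 75.0),
        new Order(4, "Cara", 210.0));

    // true  -> total of orders with amount >= 100
    // false -> total of the remaining orders
    static Map<Boolean, Double> totalsByBucket() {
        return ORDERS.stream().collect(Collectors.partitioningBy(
            o -> o.amount() >= 100,
            Collectors.summingDouble(Order::amount)));
    }

    public static void main(String[] args) {
        System.out.println(totalsByBucket()); // {false=115.0, true=330.0}
    }
}
```

Unlike groupingBy, partitioningBy always returns both keys, even when one bucket is empty.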
9) Stats with summarizingInt/Double
Get count, sum, min, max, average in one pass.
java.util.DoubleSummaryStatistics stats = orders.stream()
    .collect(java.util.stream.Collectors.summarizingDouble(Order::amount));
// stats.getCount(), getSum(), getMin(), getAverage(), getMax()
These are predefined collectors in Collectors.
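The int variant works the same way. As a sketch, the example below summarizes the number of tags per order; the class name TagCountStatsDemo is hypothetical and the data mirrors the sample domain.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class TagCountStatsDemo {
    record Order(long id, String customer, double amount, List<String> tags) {}

    static final List<Order> ORDERS = List.of(
        new Order(1, "Alice", 120.0, List.of("new", "priority")),
        new Order(2, "Bob", 40.0, List.of("gift")),
        new Order(3, "Alice", 75.0, List.of("repeat", "promo")),
        new Order(4, "Cara", 210.0, List.of("priority")));

    // One pass over the orders yields count, sum, min, max, and average of tag counts.
    static IntSummaryStatistics tagStats() {
        return ORDERS.stream()
            .collect(Collectors.summarizingInt(o -> o.tags().size()));
    }

    public static void main(String[] args) {
        IntSummaryStatistics s = tagStats();
        System.out.println(s.getCount() + " orders, " + s.getSum() + " tags, avg " + s.getAverage());
        // 4 orders, 6 tags, avg 1.5
    }
}
```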
10) Build CSV with joining
Join customer names with commas (unique, ordered):
String csv = orders.stream()
    .map(Order::customer)
    .distinct()
    .sorted()
    .collect(java.util.stream.Collectors.joining(", "));
joining concatenates CharSequences with an optional delimiter, prefix, suffix.
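The prefix and suffix come from the three-argument overload. A small sketch, using a hypothetical JoiningWithBracketsDemo class and plain strings in place of orders:

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

class JoiningWithBracketsDemo {

    // delimiter, prefix, suffix
    static String bracketed(Stream<String> names) {
        return names
            .distinct()
            .sorted()
            .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(bracketed(Stream.of("Cara", "Alice", "Bob", "Alice"))); // [Alice, Bob, Cara]
        System.out.println(bracketed(Stream.empty()));                             // []
    }
}
```

Note that the prefix and suffix are emitted even for an empty stream.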
11) Slice streams: takeWhile / dropWhile (Java 9)
Great for already-sorted streams.
List<Order> firstExpensiveRun = orders.stream()
    .sorted(java.util.Comparator.comparingDouble(Order::amount).reversed())
    .takeWhile(o -> o.amount() >= 100)
    .toList();
These methods arrived in Java 9.
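dropWhile is the mirror image: it skips the leading run that matches and keeps everything after it. The sketch below applies both to the same descending sort; SliceDemo and its method names are made up for the example, and it returns order ids to keep the output short.

```java
import java.util.Comparator;
import java.util.List;

class SliceDemo {
    record Order(long id, String customer, double amount) {}

    static final List<Order> ORDERS = List.of(
        new Order(1, "Alice", 120.0),
        new Order(2, "Bob", 40.0),
        new Order(3, "Alice", 75.0),
        new Order(4, "Cara", 210.0));

    // Ids of the leading run of orders with amount >= 100 (after sorting descending).
    static List<Long> expensiveIds() {
        return ORDERS.stream()
            .sorted(Comparator.comparingDouble(Order::amount).reversed())
            .takeWhile(o -> o.amount() >= 100)
            .map(Order::id)
            .toList();
    }

    // Ids of everything after that leading run.
    static List<Long> remainingIds() {
        return ORDERS.stream()
            .sorted(Comparator.comparingDouble(Order::amount).reversed())
            .dropWhile(o -> o.amount() >= 100)
            .map(Order::id)
            .toList();
    }

    public static void main(String[] args) {
        System.out.println(expensiveIds()); // [4, 1]
        System.out.println(remainingIds()); // [3, 2]
    }
}
```

Both methods stop testing the predicate at the first non-matching element, which is why they pair well with sorted input and differ from filter on unsorted input.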
12) Combine two results at once: Collectors.teeing (Java 12)
Compute min and max in one pass:
record Range(double min, double max) {}
Range range = orders.stream()
    .collect(java.util.stream.Collectors.teeing(
        java.util.stream.Collectors.minBy(java.util.Comparator.comparingDouble(Order::amount)),
        java.util.stream.Collectors.maxBy(java.util.Comparator.comparingDouble(Order::amount)),
        (minOpt, maxOpt) -> new Range(minOpt.orElseThrow().amount(),
                                      maxOpt.orElseThrow().amount())
    ));
teeing feeds the same stream to two collectors and merges the results.
Bonus: Parallel Streams – use with care
Parallel streams can speed up CPU-bound, heavy pipelines on large data if operations are stateless, associative, and don’t rely on ordering. But they add overhead and can surprise you on ordering/side-effects. Always measure.
double total = orders.parallelStream()
    .mapToDouble(Order::amount)
    .sum(); // OK: no shared state (a parallel floating-point sum may differ in the last digits)
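To illustrate the side-effect concern, the sketch below collects results through a collector instead of mutating a shared list, and checks that the parallel and sequential runs agree. ParallelCollectDemo and squares are hypothetical names; the anti-pattern is shown only as a comment because its outcome is unpredictable.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelCollectDemo {

    // Anti-pattern (don't do this): mutating a shared, non-thread-safe list
    // from a parallel forEach can lose elements or throw.
    //   List<Integer> shared = new ArrayList<>();
    //   IntStream.rangeClosed(1, n).parallel().forEach(i -> shared.add(i * i));

    // Safe alternative: let the collector build the result.
    // collect() preserves encounter order even when the stream is parallel.
    static List<Integer> squares(int n, boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel();
        }
        return range
            .map(i -> i * i)
            .boxed()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> sequential = squares(10_000, false);
        List<Integer> parallel = squares(10_000, true);
        System.out.println(sequential.equals(parallel)); // true
    }
}
```

This only shows correctness; whether the parallel version is faster for a given pipeline still has to be measured.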
Common pitfalls & how to avoid them
- Modifying source or shared state in a pipeline → breaks non-interference; can cause race conditions in parallel. Prefer pure functions and collect into new containers. Oracle Documentation
- Assuming Collectors.toList() is mutable → the spec doesn’t guarantee it; if you need a mutable list, use toCollection(ArrayList::new). Prefer Stream.toList() for immutable results. Oracle Documentation
- Using parallel for I/O-bound workloads → likely slower; keep it sequential or use async/concurrency primitives instead. Oracle Blogs
Quick checklist
- Prefer toList() (Java 16+) for immutable results; pick toCollection when mutability is required. Oracle Documentation
- Use groupingBy with downstream collectors for rollups. Oracle Documentation
- Reach for mapMulti when flatMap would create many short-lived streams. Oracle Documentation
- Reach for teeing to compute two aggregates in one pass. nipafx // You. Me. Java.
- Consider takeWhile/dropWhile for slicing sorted input.
Further reading (official & high-quality)
- Stream API (package & interface docs): semantics, non-interference, ordering. Oracle Documentation
- Collectors (JDK 21 docs): all the built-ins, including grouping, partitioning, joining, summarizing, toMap. Oracle Documentation
- Stream.toList() & mapMulti (JDK 16 docs): the Java 16 additions. Oracle Documentation
- Java 9 additions (takeWhile, dropWhile). Oracle Documentation
- Collectors.teeing (Java 12): concept & examples. nipafx // You. Me. J

