The Bag data structure from Eclipse Collections

In computer science, a bag is an abstract data structure that stores elements, duplicates included, in no particular order. It resembles a physical bag, into which you can put any items and take them out in any order. Bags therefore differ from lists (which track the position of each element) and from sets (which do not allow duplicates). A bag is a good choice when you just need to collect items and process them by iteration. Java does not offer a “vanilla” implementation of the bag, but you can find one in popular collections libraries. In this post we will review the Bag from Eclipse Collections, which comes in both mutable and immutable versions.
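To make the idea concrete before turning to the library, here is a minimal plain-JDK sketch of a bag that keeps a count per distinct element. The class CountingBag and its method names are hypothetical and exist only for illustration; this is not how Eclipse Collections implements its bags.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class, NOT the Eclipse Collections implementation.
final class CountingBag<T> {
    // Each distinct element maps to the number of times it was added.
    private final Map<T, Integer> counts = new HashMap<>();
    private int size;

    void add(T item) {
        counts.merge(item, 1, Integer::sum);
        size++;
    }

    // Removes one occurrence; returns false if the item is absent.
    boolean remove(T item) {
        Integer current = counts.get(item);
        if (current == null) {
            return false;
        }
        if (current == 1) {
            counts.remove(item);
        } else {
            counts.put(item, current - 1);
        }
        size--;
        return true;
    }

    int occurrencesOf(T item) {
        return counts.getOrDefault(item, 0);
    }

    // Total number of elements, duplicates included.
    int size() {
        return size;
    }

    // Number of distinct elements.
    int sizeDistinct() {
        return counts.size();
    }
}

final class CountingBagDemo {
    public static void main(String[] args) {
        CountingBag<String> bag = new CountingBag<>();
        bag.add("apple");
        bag.add("pear");
        bag.add("apple");
        System.out.println(bag.size());                 // 3
        System.out.println(bag.sizeDistinct());         // 2
        System.out.println(bag.occurrencesOf("apple")); // 2
    }
}
```

Unlike a list, the sketch stores no positions; unlike a set, it remembers that "apple" was added twice.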

Create bags

Before we proceed with the various Bag methods, let us see how to initialize a new Bag instance in Eclipse Collections. As with other collection types, both mutable (modifiable) and immutable (non-modifiable) versions are available. In general, we can use the Bags class, whose static factories produce bags:

  • Bags.immutable.* calls ImmutableBagFactory to create immutable bags
  • Bags.mutable.* calls MutableBagFactory to create mutable bags

Both factories offer the same methods, which fall into the following three categories:

  • *.empty() = creates an empty bag (a bag that does not contain any elements)
  • *.of()/*.with() = create a bag that contains a single element or many elements (using varargs in the latter case)
  • *.ofAll()/*.withAll() = create a bag from the elements of another Iterable (basically any collection)

It is important to note that most other Eclipse Collections data structures are created in a similar way. Now that we know how to obtain a new instance, let us look at a practical example. In the code snippet below we initialize a new ImmutableBag that holds 4 elements, defined at creation time:

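// Imports assumed for the snippets in this post (JUnit 5, AssertJ, Eclipse Collections);
// Employee and Department are the post's own example classes.
import org.assertj.core.api.Assertions;
import org.eclipse.collections.api.bag.ImmutableBag;
import org.eclipse.collections.impl.factory.Bags;
import org.junit.jupiter.api.Test;

@Test
void createBagTest(){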
    ImmutableBag<Employee> employees = Bags.immutable.of(
            new Employee(1, "Andrea Novakova", Department.MANAGEMENT),
            new Employee(2, "Borislav Vojtek", Department.IT),
            new Employee(3, "Denisa Zizkova", Department.DELIVERY),
            new Employee(4, "Marek Ruzicka", Department.HR)
    );

    Assertions.assertThat(employees.size()).isEqualTo(4);
}

That gives us a new Bag. As I mentioned already, mutable versions are created in the same manner, but they allow you to add and remove items. Immutable bags, on the other hand, cannot be modified in place: inserting or removing an element produces a new instance, so the initial data is not affected. In the next sections, we will look at common bag operations in Eclipse Collections.
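The "new instance on every change" behavior of immutable collections can be sketched with plain JDK classes. FrozenBag, plus() and minus() below are hypothetical names chosen for this illustration; in Eclipse Collections the corresponding operations on immutable collections are newWith() and newWithout(), and the library's internals are certainly different from this copy-everything approach.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration only; not the library's implementation.
final class FrozenBag<T> {
    private final Map<T, Integer> counts;

    private FrozenBag(Map<T, Integer> counts) {
        this.counts = counts;
    }

    static <T> FrozenBag<T> empty() {
        return new FrozenBag<>(new HashMap<>());
    }

    // "Adding" copies the counts and returns a new bag; this instance stays untouched.
    FrozenBag<T> plus(T item) {
        Map<T, Integer> copy = new HashMap<>(counts);
        copy.merge(item, 1, Integer::sum);
        return new FrozenBag<>(copy);
    }

    // "Removing" one occurrence also works on a copy.
    FrozenBag<T> minus(T item) {
        Map<T, Integer> copy = new HashMap<>(counts);
        // Returning null from the remapping function drops the entry.
        copy.computeIfPresent(item, (key, count) -> count > 1 ? count - 1 : null);
        return new FrozenBag<>(copy);
    }

    int occurrencesOf(T item) {
        return counts.getOrDefault(item, 0);
    }

    int size() {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }
}

final class FrozenBagDemo {
    public static void main(String[] args) {
        FrozenBag<String> original = FrozenBag.<String>empty().plus("IT");
        FrozenBag<String> extended = original.plus("IT").plus("HR");
        System.out.println(original.size()); // 1, the original is unchanged
        System.out.println(extended.size()); // 3
    }
}
```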

Detect elements

In theory, bags do not track their elements the way lists do, for instance, with an index that gives access to a particular item. Still, we may need to search for an element in the bag and verify that it is present. For that task we can rely on detection methods. Detection is a short-circuit operation: execution stops as soon as the logical condition is satisfied. The condition is defined as a predicate (a lambda expression representing a single-argument function that returns a boolean value). Several detect methods are available for bags. Let us start with the general detect():

ImmutableBag<Employee> employees = Bags.immutable.of(
        new Employee(1, "Andrea Novakova", Department.MANAGEMENT),
        new Employee(2, "Borislav Vojtek", Department.IT),
        new Employee(3, "Denisa Zizkova", Department.DELIVERY),
        new Employee(4, "Marek Ruzicka", Department.HR)
);

// detect()
Employee andrea = employees.detect(e -> e.getName().equalsIgnoreCase("Andrea Novakova"));
Assertions.assertThat(andrea).isNotNull();
Assertions.assertThat(andrea.getEmployeeId()).isEqualTo(1);
Assertions.assertThat(andrea.getDepartment()).isEqualByComparingTo(Department.MANAGEMENT);

In this code snippet we look for an employee with a specific name. We provide a predicate that checks whether an element has the name “Andrea Novakova”; once such an element is found, execution stops and the element is returned. If no element matches, the method simply returns null. That forces us to do “null checking”, which is easy to forget and error-prone. It is better to get an Optional result, which the detectOptional() method provides. Take a look at the code snippet below:

Optional<Employee> petr = employees.detectOptional(e -> e.getName().equalsIgnoreCase("Petr Vodicka"));
Assertions.assertThat(petr).isEmpty();

Lastly, you may want to return a default value if the element is not found. That can be done with the detectIfNone() method, which takes a function that supplies the default:

Employee jana = employees.detectIfNone(e -> e.getName().equalsIgnoreCase("Jana Dvorakova"), 
                () -> new Employee(7, "Jana Novakova", Department.IT));

Assertions.assertThat(jana).isNotNull();
Assertions.assertThat(jana.getEmployeeId()).isEqualTo(7);
Assertions.assertThat(jana.getDepartment()).isEqualByComparingTo(Department.IT);

To sum up this section, detection in Eclipse Collections is what other libraries usually call find methods. These functions look for a single element in the bag that matches a predicate and can optionally return a default value. If you need to find several elements, look at selections and rejections.
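The short-circuit behavior described in this section can be made visible with a small plain-JDK sketch that counts how many times the predicate is evaluated. DetectSketch and its three static methods are hypothetical helpers that only mimic the semantics of detect(), detectOptional() and detectIfNone(); they are not the library's source code.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not the Eclipse Collections source.
final class DetectSketch {
    // Returns the first match or null, stopping at the first hit.
    static <T> T detect(Iterable<T> items, Predicate<? super T> predicate) {
        for (T item : items) {
            if (predicate.test(item)) {
                return item; // short circuit: remaining elements are never tested
            }
        }
        return null;
    }

    // Same search, but the "not found" case is an empty Optional instead of null.
    static <T> Optional<T> detectOptional(Iterable<T> items, Predicate<? super T> predicate) {
        return Optional.ofNullable(detect(items, predicate));
    }

    // Same search, but a supplier provides the value for the "not found" case.
    static <T> T detectIfNone(Iterable<T> items, Predicate<? super T> predicate,
                              Supplier<? extends T> fallback) {
        T found = detect(items, predicate);
        return found != null ? found : fallback.get();
    }

    public static void main(String[] args) {
        List<String> names = List.of("Andrea", "Borislav", "Denisa", "Marek");
        int[] tested = {0}; // counts predicate evaluations
        String hit = detect(names, n -> { tested[0]++; return n.startsWith("B"); });
        System.out.println(hit + " found after " + tested[0] + " tests"); // Borislav found after 2 tests
        System.out.println(detectOptional(names, n -> n.startsWith("X")).isPresent()); // false
        System.out.println(detectIfNone(names, n -> n.startsWith("X"), () -> "nobody")); // nobody
    }
}
```

The sketch runs over a List so that the number of tested elements is deterministic; with a real bag the iteration order, and therefore that number, is not guaranteed.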

Select and reject elements

These two groups of methods share a section because they work in a similar way. Both evaluate the elements of an existing collection against a logical condition. The difference is that select returns a collection containing the elements that satisfy the predicate, while reject collects the elements that do not satisfy it.

The basic versions of both operations take a predicate that is used to evaluate elements. Take a look at the usage of the select() method:

ImmutableBag<Employee> employees = Bags.immutable.of(
        new Employee(1, "Andrea Novakova", Department.MANAGEMENT),
        new Employee(2, "Borislav Vojtek", Department.IT),
        new Employee(3, "Denisa Zizkova", Department.IT),
        new Employee(4, "Marek Ruzicka", Department.HR)
);

// select with predicate
ImmutableBag<Employee> itEmployees = employees.select(e -> e.getDepartment() == Department.IT);
Assertions.assertThat(itEmployees.size()).isEqualTo(2);

Rejection works like selection, but it returns a new collection that leaves out the elements matching the predicate:

@Test
void rejectTest(){
    ImmutableBag<Integer> numbers = Bags.immutable.of(1,2,3,4,5,6,7,8,9,10);
    ImmutableBag<Integer> odd = numbers.reject(e -> e % 2 == 0);
    Assertions.assertThat(odd.toList()).contains(1,3,5,7,9);
}
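Since select and reject are complements of each other, applying both with the same predicate splits a collection into two parts that together contain every element exactly once. The plain-JDK sketch below shows this relationship with Collectors.partitioningBy; SelectRejectSketch and splitByEven are hypothetical names used only for this illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration, using only JDK streams.
final class SelectRejectSketch {
    // Key true  -> elements the predicate accepts (what select would keep).
    // Key false -> elements the predicate refuses (what reject would keep).
    static Map<Boolean, List<Integer>> splitByEven(List<Integer> numbers) {
        return numbers.stream()
                .collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    public static void main(String[] args) {
        Map<Boolean, List<Integer>> parts = splitByEven(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));
        System.out.println(parts.get(true));  // [2, 4, 6, 8, 10]
        System.out.println(parts.get(false)); // [1, 3, 5, 7, 9]
    }
}
```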

Besides, there are built-in methods that select the elements of the original bag that have no duplicates or that do have duplicates:

  • selectUnique() = returns a set (a collection of unique elements) containing all elements of the bag that occur exactly once
  • selectDuplicates() = returns a bag containing all elements of the bag that occur more than once

Take a look at the following code snippet:

ImmutableBag<Integer> numbers = Bags.immutable.of(1,2,3,3,5,1,10,6,9,15);

// select unique (only that do not have duplicates!)
ImmutableSet<Integer> unique = numbers.selectUnique();
Assertions.assertThat(unique.castToSet())
        .containsExactlyInAnyOrder(2,5,6,9,10,15)
        .doesNotContain(1,3); // 1 and 3 have duplicates!

// select duplicates
ImmutableBag<Integer> duplicates = numbers.selectDuplicates();
Assertions.assertThat(duplicates.toList()).contains(1,3);
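Both methods boil down to a question about occurrence counts: "exactly once" versus "more than once". The plain-JDK sketch below reproduces that logic on the same numbers; OccurrenceSketch and its methods are hypothetical names for illustration and say nothing about how the library computes the result internally.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; plain JDK only.
final class OccurrenceSketch {
    // Maps every distinct value to the number of times it occurs.
    static Map<Integer, Long> countOccurrences(List<Integer> items) {
        return items.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // Values that occur exactly once.
    static Set<Integer> unique(List<Integer> items) {
        Set<Integer> result = new TreeSet<>();
        countOccurrences(items).forEach((value, count) -> {
            if (count == 1) {
                result.add(value);
            }
        });
        return result;
    }

    // Values that occur more than once, together with their counts.
    static Map<Integer, Long> duplicates(List<Integer> items) {
        Map<Integer, Long> result = new TreeMap<>();
        countOccurrences(items).forEach((value, count) -> {
            if (count > 1) {
                result.put(value, count);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        List<Integer> numbers = List.of(1, 2, 3, 3, 5, 1, 10, 6, 9, 15);
        System.out.println(unique(numbers));     // [2, 5, 6, 9, 10, 15]
        System.out.println(duplicates(numbers)); // {1=2, 3=2}
    }
}
```

With an Eclipse Collections bag you would not need to count by hand, because the bag can report the count of an element directly through occurrencesOf().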

Use iterators

The iterator pattern is one of the approaches to accessing the elements of a collection, alongside streams. Technically, an iterator traverses the elements sequentially, one at a time; keep in mind that a bag does not promise any particular order. In Java, the behavior of iterators is defined by the java.util.Iterator interface, which is a member of the Java Collections Framework.

Iterators are similar to enumerators (java.util.Enumeration in Java), but the concepts differ. An enumerator provides indirect, step-by-step access to each element of a data structure exactly once. An iterator does the same job and, in addition, can delete values from the collection during the iteration, provided the collection is modifiable. With this abstraction a caller can work with the elements of a collection without direct access to its internals.

Eclipse Collections gives access to iterators through the iterator() method, which is inherited from the Iterable interface. Take a look at the example code below:

@Test
void iteratorTest(){
    ImmutableBag<Integer> numbers = Bags.immutable.of(1,2,3,4,5);
    Iterator<Integer> iterator = numbers.iterator();

    int sum = 0;
    while (iterator.hasNext()){
        int val = iterator.next();
        sum += val;
    }

    Assertions.assertThat(sum).isEqualTo(15);
}
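Deleting values during iteration, mentioned earlier in this section, can be sketched with a plain JDK ArrayList. IteratorRemoveSketch and dropEven are hypothetical names for this illustration. The sketch deliberately uses a mutable JDK list: an immutable bag is read-only, so its iterator should be expected to refuse remove().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper for illustration; plain JDK only.
final class IteratorRemoveSketch {
    // Returns a copy of the source without even numbers, removing them while iterating.
    static List<Integer> dropEven(List<Integer> source) {
        List<Integer> numbers = new ArrayList<>(source); // mutable copy
        Iterator<Integer> iterator = numbers.iterator();
        while (iterator.hasNext()) {
            if (iterator.next() % 2 == 0) {
                // Removal goes through the iterator, so no ConcurrentModificationException is thrown.
                iterator.remove();
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(dropEven(List.of(1, 2, 3, 4, 5))); // [1, 3, 5]
    }
}
```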

Source code

You can access the example code snippets for this post here. Feel free to explore them!

Summary

Bags are data structures that hold elements, duplicates included, in no particular order. The standard Java Collections Framework does not provide an implementation, but you can use the one from Eclipse Collections. In this post we reviewed the essential functionality of this class. If you have questions or suggestions regarding this post, please feel free to contact me.