tomato/docs/md/graph.md
Green Sky 5c7231b7a3 Squashed 'external/entt/entt/' content from commit fef92113
git-subtree-dir: external/entt/entt
git-subtree-split: fef921132cae7588213d0f9bcd2fb9c8ffd8b7fc
2023-07-25 11:29:51 +02:00

10 KiB

Crash Course: graph

Table of Contents

Introduction

EnTT doesn't aim to offer everything one needs to work with graphs. Therefore, anyone looking for this in the graph submodule will be disappointed.
Quite the opposite is true. This submodule is minimal and contains only the data structures and algorithms strictly necessary for the development of some tools such as the flow builder.

Data structures

As anticipated in the introduction, the aim isn't to offer all possible data structures suitable for representing and working with graphs. Many will likely be added or refined over time, however I want to discourage anyone expecting tight scheduling on the subject.
The data structures presented in this section are mainly useful for the development and support of some tools which are also part of the same submodule.

Adjacency matrix

The adjacency matrix is designed to represent either a directed or an undirected graph:

entt::adjacency_matrix<entt::directed_tag> adjacency_matrix{};

The directed_tag type creates the graph as directed. There is also an undirected_tag counterpart which creates it as undirected.
The interface deviates slightly from the typical double indexing of C and offers an API that is perhaps more familiar to a C++ programmer. Therefore, the access and modification of an element will take place via the contains, insert and erase functions rather than a double call to an operator[]:

if(adjacency_matrix.contains(0u, 1u)) {
    adjacency_matrix.erase(0u, 1u);
} else {
    adjacency_matrix.insert(0u, 1u);
}

Both insert and erase are idempotent functions which have no effect if the element already exists or has already been deleted.
The first one returns an std::pair containing the iterator to the element and a boolean value indicating whether the element has been inserted or was already present. The second one instead returns the number of deleted elements (0 or 1).

An adjacency matrix must be initialized with the number of elements (vertices) when constructing it but can also be resized later using the resize function:

entt::adjacency_matrix<entt::directed_tag> adjacency_matrix{3u};

To visit all vertices, the class offers a function named vertices that returns an iterable object suitable for the purpose:

for(auto &&vertex: adjacency_matrix.vertices()) {
    // ...
}

Note that the same result can be obtained with the following snippet, since the vertices are unsigned integral values:

for(auto last = adjacency_matrix.size(), pos = {}; pos < last; ++pos) {
    // ...
}

As for visiting the edges, a few functions are available.
When the purpose is to visit all the edges of a given adjacency matrix, the edges function returns an iterable object that can be used to get them as pairs of vertices:

for(auto [lhs, rhs]: adjacency_matrix.edges()) {
    // ...
}

On the other hand, if the goal is to visit all the in- or out-edges of a given vertex, the in_edges and out_edges functions are meant for that:

for(auto [lhs, rhs]: adjacency_matrix.out_edges(3u)) {
    // ...
}

As might be expected, these functions expect the vertex to visit (that is, to return the in- or out-edges for) as an argument.
Finally, the adjacency matrix is an allocator-aware container and offers most of the functionality one would expect from this type of containers, such as clear or 'get_allocator` and so on.

Graphviz dot language

As it's one of the most popular formats, the library offers minimal support for converting a graph to a Graphviz dot snippet.
The simplest way is to pass both an output stream and a graph to the dot function:

std::ostringstream output{};
entt::dot(output, adjacency_matrix);

However, there is also the option of providing a callback to which the vertices are passed and which can be used to add (dot) properties to the output from time to time:

std::ostringstream output{};
entt::dot(output, adjacency_matrix, [](auto &output, auto vertex) {
    out << "label=\"v\"" << vertex << ",shape=\"box\"";
});

This second mode is particularly convenient when the user wants to associate data managed externally to the graph being converted.

Flow builder

A flow builder is used to create execution graphs from tasks and resources.
The implementation is as generic as possible and doesn't bind to any other part of the library.

This class is designed as a sort of state machine to which a specific task is attached for which the resources accessed in read-only or read-write mode are specified.
Most of the functions in the API also return the flow builder itself, according to what is the common sense API when it comes to builder classes.

Once all tasks have been registered and resources assigned to them, an execution graph in the form of an adjacency matrix is returned to the user.
This graph contains all the tasks assigned to the flow builder in the form of vertices. The vertex itself can be used as an index to get the identifier passed during registration.

Tasks and resources

Although these terms are used extensively in the documentation, the flow builder has no real concept of tasks and resources.
This class works mainly with identifiers, that is, values of type id_type. That is, both tasks and resources are identified by integral values.
This allows not to couple the class itself to the rest of the library or to any particular data structure. On the other hand, it requires the user to keep track of the association between identifiers and operations or actual data.

Once a flow builder has been created (which requires no constructor arguments), the first thing to do is to bind a task. This will indicate to the builder who intends to consume the resources that will be specified immediately after:

entt::flow builder{};
builder.bind("task_1"_hs);

Note that the example uses the EnTT hashed string to generate an identifier for the task.
Indeed, the use of id_type as an identifier type is not by accident. In fact, it matches well with the internal hashed string class. Moreover, it's also the same type returned by the hash function of the internal RTTI system, in case the user wants to rely on that.
However, being an integral value, it leaves the user full freedom to rely on his own tools if he deems it necessary.

Once a task has been associated with the flow builder, it can be assigned read-only or read-write resources, as appropriate:

builder
    .bind("task_1"_hs)
        .ro("resource_1"_hs)
        .ro("resource_2"_hs)
    .bind("task_2"_hs)
        .rw("resource_2"_hs)

As mentioned, many functions return the builder itself and it's therefore easy to concatenate the different calls.
Also in the case of resources, these are identified by numeric values of type id_type. As above, the choice is not entirely random. This goes well with the tools offered by the library while leaving room for maximum flexibility.

Finally, both the ro and rw functions also offer an overload that accepts a pair of iterators, so that one can pass a range of resources in one go.

Fake resources and order of execution

The flow builder doesn't offer the ability to specify when a task should execute before or after another task.
In fact, the order of registration on the resources also determines the order in which the tasks are processed during the generation of the execution graph.

However, there is a way to force the execution order of two processes.
Briefly, since accessing a resource in opposite modes requires sequential rather than parallel scheduling, it's possible to make use of fake resources to force the order execution:

builder
    .bind("task_1"_hs)
        .ro("resource_1"_hs)
        .rw("fake"_hs)
    .bind("task_2"_hs)
        .ro("resource_2"_hs)
        .ro("fake"_hs)
    .bind("task_3"_hs)
        .ro("resource_2"_hs)
        .ro("fake"_hs)

This snippet forces the execution of task_2 and task_3 after task_1. This is due to the fact that the latter sets a read-write requirement on a fake resource that the other tasks also want to access in read-only mode.
Similarly, it's possible to force a task to run after a certain group:

builder
    .bind("task_1"_hs)
        .ro("resource_1"_hs)
        .ro("fake"_hs)
    .bind("task_2"_hs)
        .ro("resource_1"_hs)
        .ro("fake"_hs)
    .bind("task_3"_hs)
        .ro("resource_2"_hs)
        .rw("fake"_hs)

In this case, since there are a number of processes that want to read a specific resource, they will do so in parallel by forcing task_3 to run after all the others tasks.

Sync points

Sometimes it's useful to assign the role of sync point to a node.
Whether it accesses new resources or is simply a watershed, the procedure for assigning this role to a vertex is always the same: first it's tied to the flow builder, then the sync function is invoked:

builder.bind("sync_point"_hs).sync();

The choice to assign an identity to this type of nodes lies in the fact that, more often than not, they also perform operations on resources.
If this isn't the case, it will still be possible to create no-op vertices to which empty tasks are assigned.

Execution graph

Once both the resources and their consumers have been properly registered, the purpose of this tool is to generate an execution graph that takes into account all specified constraints to return the best scheduling for the vertices:

entt::adjacency_matrix<entt::directed_tag> graph = builder.graph();

The search for the main vertices, that is those without in-edges, is usually the first thing required:

for(auto &&vertex: graph) {
    if(auto in_edges = graph.in_edges(vertex); in_edges.begin() == in_edges.end()) {
        // ...
    }
}

Starting from them, using the other functions appropriately (such as out_edges to retrieve the children of a given task or edges to access their identifiers) it will be possible to instantiate an execution graph.