You are on page 1of 19

1

What is a Workflow?

A Workflow is a series of decisions made by different people that determines what happens to a particular request that
one of those people made, according to a defined and repeatable process. An example of this process is shown in this
flowchart:

Example Steps in a Workflow


2

Let's illustrate that flow chart another way, by enumerating the steps:

John Doe submits a request that says "I need a site that would allow me to have my employees log their time off so I
can quickly review it."

John submits that request to his supervisor Victoria, who agrees with him and approves the request.

Victoria's approval sends the request to Nate, a lead developer, who looks over the initial request and decides that
there is a workable idea here that he and his team can implement as a website. He approves the request.

Nate's approval sends the request to Jenna, the coordinator, who starts conducting research (probably involving John,
Victoria, and Nate) to determine how much effort this request would require.

If Jenna approves the request, it is assigned back to Nate with orders to begin development.

Nate and his team do the development work. This step could take a variable amount of time.

Once development work is done, the request is sent to the QA team for approval, lead by Charles.

If Charles and his team find no testing issues, he approves the project and sends it to John for final approval
(essentially all of teams are together asking John, "this is what you wanted, right?").

If John approves the project, it is marked complete, and no more action can be taken against it.

Lots of Similarity

We had many different kinds of workflows in our organization, and the management group wanted us programmers to
determine if we could build a generic, database-driven service that could represent many (if not all) of these processes
in one central database.

After reviewing several of the currently-existing workflows, my team and I discovered that they all had similar
components:

A Request object that is reviewed, approved, or implemented by various people.

A set of highly-variable data that was associated to each Request.

A series of decisions (called a Process) that determine who was next going to review the Request. These decisions
must always be submitted to the engine by an individual person, but that individual could be a person related to the
Request (e.g. the Requester or a stakeholder) or one of a group of people.

A set of notifications that could go to various groups of people.

A small number of people that were in charge of the Process itself.

Designing a Workflow Engine Database

In this series, we're going to walk through the database design of our Workflow app and show each part of the solution
was implemented, and finally how they were all wired together. We're going to do this in eight parts (this post is Part
1):

Part 2: The Process Table and Users

Part 3: Request Details and Data

Part 4: States and Transitions

Part 5: Actions and Activities


3

Part 6: Groups and Targets

Part 7: Request Actions

Part 8: Complete Schema and Shortcomings

This isn't the only design that can implement Workflows, and it has been simplified from our actual implemented
design. But I believe that putting my design ideas down on paper (as it were) will help both me understand my design
better and, hopefully, any other budding DBAs out there when they are trying to design a database for their own non-
trivial problems.

IMPORTANT NOTE: I am not a DBA, and this design has not been thoroughly reviewed by someone who is. DBAs
of the world, let me know how I can improve this design in the comments! All suggestions are welcome!

One final note: in this series we're only concerned with showing how to design this engine, not actually implement it
in code. What this means is that we won't cover certain details, such as:

How the Users get into our database OR

How the Emails are actually sent to recipients.

With those caveats out of the way, come along with me as we explore how to design a Workflow Engine database,
starting with the Process Table and Users!

Before we can design anything else in this Workflow Engine Database, we first need to define what exactly makes a
Process, as well as who our Users are and which of them can modify the process itself.

The Process

A Process is the collection of all other data that is unique to a group of users and how they want their Requests
approved. In our design, a Process is infrastructure that is used to define and associate most other information in this
database.

Our Process table is very simple, as it is just an ID and a Name:

This table, though it is very simple, is the central point of reference for the rest of the design; most of the tables in this
database (e.g. States, Transitions, Requests, etc.) will need to be related back to this table, either directly or indirectly.

Users

We also need a table to list the Users that can log into this app; this will be a lookup table that has all of our users in it.
It'll look like this:
4

We don't really care how the users get added to this table; we only care that they will exist here and that this table will
act as a central access point for user data.

You might be wondering why Users are not specific to Processes. We used this design in order to allow for persons to
have jobs in several different Processes without needing to store User data in multiple places.

Process Admins

We also want to allow for a small set of Users that can change the process itself; these people are called Process
Admins. The table for Process Admins is just a many-to-many relationship table between Process and User:

What did we accomplish?

We created the foundation that the rest of the Workflow Engine can build upon by creating the Process and Users
tables, and we established Process Admins that will be able to modify the Process itself.

But this is only the beginning of our journey. In the next installment, we will discuss what actually makes a Request
when we talk about Request Details and Data.

Now that we've got the Process and Request tables created, we can start building tables for the actual process itself. In
this part, we'll design the tables that will hold the different States a Request can be in as part of a Process, and we'll
also design the tables that show how to get from one State to another (which are called Transitions). First, though, we
need a table that shows what different types of States can exist.

Let's get started!

State Types

A State Type is a categorization of the individual States. In our design, it is a unchangeable list, so in code we would
probably use an Enumeration to represent it. Since we're making this database design fully-normalized, we're going to
include this set of data as a table, with the following structure:
5

Because we don't want to have any user-defined State Types, there's no relationship to Process for this table. We're
treating this table as unchangeable, and will always have the following values:

Here's the reasoning for each of these types:

Start: Should only be one per process. This state is the state into which a new Request is placed when it is created.

Normal: A regular state with no special designation.

Complete: A state signifying that any Request in this state have completed normally.

Denied: A state signifying that any Request in this state has been denied (e.g. never got started and will not be worked
on).

Cancelled: A state signifying that any Request in this state has been cancelled (e.g. work was started but never
completed).

Every State must have exactly one of these State Types. But what makes a State?

States

A State is a position in the Process that a given Request can be in at any given moment. States are unique to Processes,
and each State has a name, a description, and a type. Our State table looks like this:

We should remember, though, that each Process is supposed to represent a flow chart, and to do that we need to be
able to move Requests between the States. We can do so by designing tables for Transitions.
6

Transitions

If a the lead developer approves a request, and it should now go to the Coordinator team for research, how can we
design our data so as to represent that that move is possible? We create a Transition.

A Transition is a path between two States that shows how a Request can travel between them. Transitions are unique
to Processes, and therefore a Transition is comprised of a primary key, a Process ID, a current state, and a next state:

What did we accomplish?

Our current database design (showing only States, Transitions, and Process tables) looks something like this:

In this post, we gave form to the building blocks of the Process by defining the States where a Request can exist and
the Transitions between those States.

We still have an outstanding problem, though: how do we invoke our Transitions? How do we actually cause a
Request to move from one State to another And what should happen each time we go to a new State (or follow a new
Transition) for a given Request? All that and more in the next post, Part 5 of this adventure, which discusses Actions
and Activities.

Having already defined our Process Infrastructure, our Request structure, and our States and Transitions, we can now
start to design tables for what a User can actually do to a Request in this engine. For that, we will be defining two
terms:

Actions: Things a user can perform on a Request.

Activities: Things that result from a Request moving to a particular State or following a particular Transition.

Let's start by creating Actions and Action Types.


7

Actions

Actions are things a user can perform upon a Request.

Say we've got a request to build a new grocery store, and that request includes the address of the development where it
should be built. The person who is in charge of approving new store construction, John, takes a look at the request and
decides that, yeah, it's a good idea to build a store here. He submits an Approval to the Request, which can cause the
Request to go to the next state. John has submitted an Action.

Since we don't want to allow an infinite number of kinds of actions that can be performed, we are going to group
Actions together via an ActionType table that looks like this:

Just like the StateType table, this table is independent of the Process and will be considered static. For our database,
we'll use the following Action Types:

Why are we using these action types?

Approve: The actioner is suggesting that the request should move to the next state.

Deny: The actioner is suggesting that the request should move to the previous state.

Cancel: The actioner is suggesting that the request should move to the Cancelled state in the process.

Restart: The actioner suggesting that the request be moved back to the Start state in the process.

Resolve: The actioner is suggesting that the request be moved all the way to the Completed state.

The reason we say the person is "suggesting" that the request be moved is that we want to allow for a process to
require multiple Actions to invoke a Transition. It is possible that a request will need multiple things to happen before
it can continue in the process, and we want to allow for that scenario.

Now we need the table for the Actions themselves. Actions are unique to Processes, and each have an ActionType, so
our table will look like this:
8

Our design for ActionTypes and Actions looks like this:

Transition Actions

Now that we've defined what Actions could ever be performed, we need to get more specific: which Actions can be
performed for a particular Transition?

The relationship between Transition and Action is many-to-many:

Activities

Activities are things that can happen as a result of a Request entering a State or following a Transition.
9

For example, let's see the diagram from Part 1 again.

In Step 3 of this flowchart, we may want to add the Lead as a stakeholder on a request, so that s/he will receive
automatic emails about the status of that request. However, if the Lead denies the request we will want to notify the
Requester, but if s/he approves the request we need to notify the Coordinators.
10

In other words, in this example adding a stakeholder is an activity that we want to happen when a Request reaches a
certain state, and sending email is an activity that we want to happen when a certain transition is followed. We need to
design for both scenarios.

First, we need to know what kinds of actions we can do. This table is just like StateType and ActionType in that it is
not unique to the Process and can be considered static. Here's the design for the ActivityType table:

Just like ActionType, the values for ActivityType are static. We'll use the following values:

Add Note: Specifies that we should automatically add a note to a Request.

Send Email: Specifies that we should send an email to one or more recipients.

Add Stakeholders: Specifies that we should add one or more persons as Stakeholders on this request.

Remove Stakeholders: Specifies that we should remove one or more stakeholders from this request.

You could define quite a few more kinds of ActivityTypes, but for now we'll just use those four.

The last thing we need to do is design our Activity table, which will look a lot like the Action table:

State and Transition Activities

Once we've got the base Activity table defined, we can start designing how the Activities are associated to States and
Transitions. As a reminder, we want to be able to kick off Activities in two situations:

When the Request enters a State


11

When the Request follows a Transition

This means that we still need to associate Activities with States and Transitions, like so:

What did we accomplish?

Our database diagram (showing the Process, States, Transitions, Actions, and Activities) looks like this:

In this post, we demonstrated how we can store what Actions can be performed by Users, and what kinds of Activities
can be kicked off by certain States or Transitions. In essence, we showed what Users can do to Requests, and what
happens to the Users as a result.
12

We still have a piece of all this missing, though: exactly who can actually perform the Actions or receive the
Activities? We'll answer that question in the next post, Part 6 of this series, where we will discuss Groups and Targets.

We've now got most of the Process tables defined, but we are still missing a few things.

One of those things is a definition for exactly who can perform Actions or receive Activities; we're going to call
this Targets.

The second piece we need is Groups, or collections of people that perform a similar or related job in this process. In
our design, we want to allow for a Group to also be a Target. Let's see how we can design these tables!

Groups

A Group is a collection of Users that perform related functions.

For example, we might have a group called Programmers that actually write code, or a group called Executives that
approve projects for development. Because we are building a generic Workflow Engine, we want to make sure that
each Process can have its own Groups.

The Group table is pretty simple:

We also need a table to represent which Users are in a given group, which will be another many-to-many table. This
table is called GroupMember:

Targets

Remember that this system is people-focused; only people can action the Request. We still need a way to associate
which persons (or Groups) can perform Actions and receive Activities.

The problem is that we may need to specify that only the Requester of a given Request can send that request to his/her
supervisor, or that all of this set of people need to receive an email when a Request reaches a certain State. How can
we design this in such a manner that it is flexible in who can perform the Action or receive the Activity, but still
implements a few rules that the engine must follow?

We can accomplish this by creating Targets. A Target is a set of standardized representations of a person who have
specific roles relative to a Request or Process. We use the following targets:

Request Creator (Requester)


13

Request Stakeholders

Group Members

Process Admins

Our Target table looks like this:

Because this is another static table (like StateType, ActionType, and ActivityType), we don't expect the data in it to
ever change. Here's the data we will be using for this design:

Action Targets and Activity Targets

The Targets table does us no good unless we can relate it to other tables that can actually use the Targets. We want to
use Targets in two scenarios:

As people who can perform Actions

As people who can receive Activities

Let's design Action Targets first. For every Action that can be submitted to this engine, we need to define who can
actually submit that action for it to be considered valid. After all, we don't want janitor Kevin approving the
construction of a new grocery store, since that's not his responsibility. With this in mind, our table design looks like
this:

We needed to include Group ID because if our Target is a Group, we have to specify which Group can perform the
Action.
14

Now let's discuss Activity Targets. Depending on the kind of activity, the Targets for that activity could receive an
email, or be added to the stakeholders list for a Request, etc. Because we've defined a central list of Targets, the
ActivityTarget table looks remarkably similar to the ActionTarget table:

IMPORTANT NOTE: When using a Group as a Target, the way the system interprets this relationship is different.

If the Group is an Action Target, then any member of the Group can perform the action for it to be valid.

If the Group is an Activity Target, then all members of the Group receive the Activity (e.g. everyone in the group gets
an email).

What did we accomplish?


15

Our design for Actions, Activities, Groups, and Targets now looks like this:

In this part, we nailed down exactly who could perform Actions and receive Activities by creating Targets, and we
gave our engine a bit more flexibility by creating Groups of Users who could each do the same thing as the others.

In the next part of this series, we get to the real meat of the system. We'll show how individual Requests can track
which Actions can be performed against them, and we'll see how we can use that list to determine which Transition
the Request needs to follow. Next up is Part 7 of this series, Request Actions.

Request Actions

So we now have Actions that Users can perform to invoke Transitions. Obviously we can't allow for any action to be
performed against the Request; only the actions we need should be allowed.

The Table

Let's look at the schema of the RequestActions table first, and then show how it would actually be used:
16

Those last two columns (IsActive and IsComplete) are very important to the actual execution of this table.

Here's how we're going to use this table.

When a Request enters a State, we get all outgoing Transitions from that state. For each Action in those Transitions,
we add an entry in RequestAction, with each entry having IsActive = 1 and IsCompleted = 0.

A User may submit an Action at any time. Each submitted Action consists of an ActionType, a RequestID, and a
UserID.

When an Action is submitted, we check the RequestActions for the specified Request. If the submitted Action matches
one of the active RequestActions (where IsActive = 1), we set that entry's IsActive = 0 and IsCompleted = 1.

After marking the submitted Action as completed, we check all Actions for that Transition in that Request. If all
RequestActions are marked as Completed, then we disable all remaining actions (by setting IsActive = 0, e.g. all
actions for Transitions that were not matched).

Sample Walkthrough

Let's see if we can see how this works by introducing some sample data.

USERS: Jane (ID 1), Tom (ID 2), Gary (ID 3)

GROUPS: Executives (ID 1), includes Tom and Gary

STATES: A (Type: Start), B (Type: Normal), C (Type: Denied)

TRANSITIONS: A -> B (ID 1), A -> C (ID 2), B -> C (ID 3)

TRANSITION ACTIONS:
A -> B: Approved by Requester (ID 1) AND Approved by Executives (ID 2)
A -> C: Denied by Executives (ID 3)
B -> C: Denied by Requester (ID 4)

Let's say Jane creates a Request, which immediately is placed into State A.

At this point, the system looks for all outgoing transitions from State A and finds two of them, Transitions 1 and 2. It
then loads the following data into RequestActions:

REQUESTID ACTIONID TRANSITIONID ISACTIVE ISCOMPLETE


17

1 1 1 YES NO

1 2 1 YES NO

1 3 2 YES NO

Now the Request just sits in its current state, waiting for an Action to be submitted.

Say Jane submits this action:

ACTION:
User ID: 1
ActionType: Approve
Request ID: 1
(We read that as "User 1 approves Request 1")

Since that action matches the first RequestAction in the table, it is marked as completed:

REQUESTID ACTIONID TRANSITIONID ISACTIVE ISCOMPLETE

1 1 1 NO YES

1 2 1 YES NO

1 3 2 YES NO

At this point, we have not matched all Actions for a Transition, so nothing happens to the Request; it remains in State
A.

Now say Tom submits this action:

ACTION:
User ID: 2
ActionType: Approve
Request ID: 1
(User 2 approves Request 1)

After that action gets matched, the table of Request Actions now looks like this:

REQUESTID ACTIONID TRANSITIONID ISACTIVE ISCOMPLETE

1 1 1 NO YES

1 2 1 NO YES

1 3 2 YES NO

Notice that because both Actions for Transition 1 are complete, we now must follow Transition 1 and move the
Request to the next State, which is State B. After we move to State B, we load the Actions for the Transitions from
that State and disable any old Actions; our RequestActions table looks like this:
18

REQUESTID ACTIONID TRANSITIONID ISACTIVE ISCOMPLETE

1 1 1 NO YES

1 2 1 NO YES

1 3 2 NO NO

1 4 3 YES NO

In this manner, we can keep track of all actions that have been performed, that could have been performed, and that
are still waiting to be performed, in the same table.

What did we accomplish?

In this post, we implemented the last piece of our structure: the Request Actions table. This table stores all Actions
that can be made against a particular Request, and is the driving force behind how this engine actually works.

There's still one last part we will talk about, and that's actually building and demoing a process that runs on this
engine. The final part of our saga is Part 8, where we will discuss the Complete Schema and Shortcomings of our
Workflow Engine database design.

We've now completed the database design for our Workflow Engine, so let's take a look at the final
database diagram (click here for a larger view):

That's a lot of tables! But we're trying to design a generic engine, one that can be used for many disparate
processes, and as such the complexity of this design reflects the complexity of the requirements.

Shortcomings of This Design

There are a few shortcomings of this design:


19

Currently, there's no way to say "Stay in this state for X days; after X days, move to another state." We cannot cause
Requests to move along in the Process due to time elapsed.

We don't have a procedure in place to handle a State or Transition ceasing to exist (meaning, even though the Process
Admins are supposed to be able to change the Process itself, what happens when they actually do that?)

The RequestFiles table stores the actual files in the table itself; a more performant way to do this would be to store the
files on a filestore somewhere and just keep references to them in this table.

We don't currently have a way to designate that a specific individual needs to perform an Action or receive an
Activity.

That said, most of these shortcomings are fairly easy to implement; we actually did solve a few of them for
our company's workflow engine.

Remember: this is only a database schema. The design doesn't account for how this service would actually be
implemented in code.

Whew! That was a lot of steps! But what we ended up with was a generic, extensible, reusable workflow engine
database design that can handle a huge variety of processes.

Thanks for reading! If you found this series helpful (or just need to yell at me for something I did incorrectly)
let me know in the comments.

You might also like