32.2 Working with TextPattern

TextPattern was introduced in DHIS2 version 2.29, as a way of defining a pattern that includes variables, generated values and raw text, which then could be generated into a text value. The current use-case for TextPattern is automatically generated attributes for tracked entities, where you want to generate for example unique ids based on a specific pattern.

This guide covers both basic and advanced topics for working with TextPattern, but focuses mainly on how you can define TextPatterns and which limitations and caveats exist.

32.2.1 TextPattern syntax

A TextPattern is a sequence of segments, joined together by the “+” character. A segment has a specific notation and, in most cases, a parameter format, which allows for further manipulation of the value.

TextPattern segments

Segment notation: "Plain text"
Description: The plain text segment will remain unchanged in all generated values. This special segment is defined by wrapping text between two double quotes. If your pattern should include separator symbols like a dash, write them as a plain text segment: "-".
Parameter (format): None
Example (segment → input value → result): "Hello world" → None → Hello world

Segment notation: CURRENT_DATE(format)
Description: The current date segment will be generated by the server at the time of generation. This is useful if you want your patterns to have a time constraint that is disconnected from the context. You should not use this if you need to control which date is injected into the pattern.
Parameter (format): Date format
Example (segment → input value → result): CURRENT_DATE(yyyy) → 01-01-2018 → 2018

Segment notation: ORG_UNIT_CODE(format)
Description: This segment represents the organisation unit code associated with the generation.
Parameter (format): Text format
Example (segment → input value → result): ORG_UNIT_CODE(...) → OSLO → OSL

Segment notation: RANDOM(format)
Description: Random segments will be replaced by a value randomly generated by the server based on the format. Generated segments, like RANDOM, base their uniqueness on the rest of the pattern. That means a random value can appear twice, as long as the rest of the pattern is different, so the generated text as a whole will still be unique.
Parameter (format): Generation format
Example (segment → input value → result): RANDOM(X####) → None → A1234

Segment notation: SEQUENTIAL(format)
Description: Sequential segments will be replaced by a number, based on a counting value on the server. Sequential segments start at the value 1 and count up for each generated value until no more values are available, based on the format. Like random segments, uniqueness is based on the rest of the pattern, so each possible version of the pattern will have its own sequential counter starting from 1.
Parameter (format): Generation format
Examples (segment → input value → result):
"A"+SEQUENTIAL(###) → None → A001
"A"+SEQUENTIAL(###) → None → A002
"B"+SEQUENTIAL(###) → None → B001
"B"+SEQUENTIAL(###) → None → B002
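The SEQUENTIAL examples above, where the “A” and “B” variants each count from 1, can be illustrated with a minimal Python sketch. This is not the server implementation; the class name SequentialCounters and its generate method are hypothetical, and the sketch simply assumes one counter per resolved "rest of the pattern" (here, the plain text prefix).

```python
from collections import defaultdict


class SequentialCounters:
    """Hypothetical illustration: one counter per resolved pattern prefix."""

    def __init__(self):
        # Every previously unseen prefix starts counting at 1.
        self._next = defaultdict(lambda: 1)

    def generate(self, prefix, width):
        """Return prefix + zero-padded counter, then advance that counter."""
        n = self._next[prefix]
        if n >= 10 ** width:
            raise RuntimeError("no more values available for this prefix")
        self._next[prefix] = n + 1
        return f"{prefix}{n:0{width}d}"


counters = SequentialCounters()
values = [
    counters.generate("A", 3),
    counters.generate("A", 3),
    counters.generate("B", 3),
    counters.generate("B", 3),
]
print(values)  # ['A001', 'A002', 'B001', 'B002']
```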

Most segments have a parameter format, except for the plain text segment. The following table lists the available formats, how they are used and example notations using them.

Parameter formats

Format: Date format
Description: This format is based directly on the Java SimpleDateFormat, which means any pattern valid for SimpleDateFormat will be valid as a date format in TextPattern.
Examples:
CURRENT_DATE(dd-MM-yyyy) → 31-12-2018
CURRENT_DATE(MM-yyyy) → 12-2018

Format: Text format
Description: The text format allows for some basic text manipulation. Leaving the format empty will return the value unmodified, but using “^”, “.” and “$”, you can modify the value before it is returned. Each “.” represents a character, while “^” represents the start of the text and “$” represents the end. When using formats, the input value must be at least the same length as the format.
Examples:
ORG_UNIT_CODE(....) → OSLO
ORG_UNIT_CODE(..) → OS
ORG_UNIT_CODE(..$) → LO
ORG_UNIT_CODE(^...$) → OSLO
^....$ will require the input value to be exactly 4 characters.

Format: Generation format
Description: The generation format accepts a combination of one or more of the following characters: “#”, “X”, “x” and “*”. They respectively represent a number (0-9), an uppercase letter (A-Z), a lowercase letter (a-z) or any of the above (0-9, a-z, A-Z). The SEQUENTIAL segment only accepts “#”, since it will only generate numbers. The number of characters in the format decides the size of the value generated. In other words, using just one “#” will only allow for 10 values (0-9), while “###” will allow for 1000 values (000-999). SEQUENTIAL generated values have leading zeroes, so the length of the generated value will always match the format length.
Examples:
RANDOM(X###) → A123
RANDOM(****) → 1AbC
SEQUENTIAL(###) → 001
SEQUENTIAL(######) → 000001
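To make the generation format rules concrete, here is a minimal Python sketch of how a value could be produced from a format. It is only an illustration under the rules described above, not the server's algorithm; the names ALPHABETS, random_value and sequential_value are hypothetical.

```python
import random
import string

# Alphabet for each generation format character, as described above.
ALPHABETS = {
    "#": string.digits,
    "X": string.ascii_uppercase,
    "x": string.ascii_lowercase,
    "*": string.digits + string.ascii_lowercase + string.ascii_uppercase,
}


def random_value(fmt):
    """Return one random value matching a RANDOM format such as 'X###'."""
    if not fmt or any(ch not in ALPHABETS for ch in fmt):
        raise ValueError(f"unsupported generation format: {fmt!r}")
    return "".join(random.choice(ALPHABETS[ch]) for ch in fmt)


def sequential_value(fmt, counter):
    """Return the counter padded with leading zeroes to the format length."""
    if not fmt or set(fmt) != {"#"}:
        raise ValueError("SEQUENTIAL formats may only contain '#'")
    if not 1 <= counter < 10 ** len(fmt):
        raise ValueError("counter is outside the range the format can hold")
    return str(counter).zfill(len(fmt))


print(sequential_value("###", 1))     # 001
print(sequential_value("######", 1))  # 000001
print(random_value("X###"))           # one uppercase letter, then three digits
```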

A few important things to note regarding the formats:

  • Date format is very versatile, but be aware of which date or time components you are using. Using components smaller than a day (for example hours or seconds) is not recommended, even though they are available.

  • Text format allows for marking both the start and end of the input value, but “^...” and “...” will in reality give exactly the same results. The only time you would want to use “^” is when you want to enforce the length of the input value. For example, “^....$” will accept OSLO, since it has 4 characters between the start and end, but PARIS will be rejected, since it has 5 characters.

  • When text format is used for unique values, like organisation unit code, make sure that the format does not break the uniqueness. (Example: ORG_UNIT_CODE(..) for “PARIS” and “PANAMA CITY” would both return PA, which means these two organisation units would in reality share generated values)

  • Generation format is the key to understanding the capacity of your pattern. Make sure the format is long enough to cover more values than you need.
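The uniqueness caveat for text formats (PARIS and PANAMA CITY both resolving to PA) can be checked ahead of time. The following is a small Python sketch, assuming you have a list of your organisation unit codes at hand; the function name find_prefix_collisions is hypothetical and only covers formats that keep the first characters of the code.

```python
from collections import defaultdict


def find_prefix_collisions(codes, length):
    """Group codes by their first `length` characters and return shared prefixes."""
    groups = defaultdict(list)
    for code in codes:
        groups[code[:length]].append(code)
    # Only prefixes used by more than one code break uniqueness.
    return {prefix: names for prefix, names in groups.items() if len(names) > 1}


codes = ["PARIS", "PANAMA CITY", "OSLO"]
print(find_prefix_collisions(codes, 2))  # {'PA': ['PARIS', 'PANAMA CITY']}
print(find_prefix_collisions(codes, 3))  # {}
```

An empty result suggests the chosen format length keeps the organisation units apart, at least for the codes that exist today.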

To finish off the syntax section of the tutorial, here are a couple of example TextPatterns:

ORG_UNIT_CODE(...) + "-" + CURRENT_DATE(yyyyww) + "-" + SEQUENTIAL(#####)

This pattern will have 99999 possible values (based on SEQUENTIAL; 00000 is never used since counting starts at 1). In addition, the rest of the pattern will change for each organisation unit generating values (ORG_UNIT_CODE) and for each week (CURRENT_DATE(yyyyww) represents year and week). That effectively means that every new week, each organisation unit will have 99999 new values it can use.

"ABC_" + RANDOM(****)

The plain text segment of this pattern makes no difference to the total capacity of the pattern; however, the generated segment (RANDOM) allows for 14776336 possible values (62 * 62 * 62 * 62). The reason for this is that * can be any one of the 62 characters available (0-9, a-z, A-Z). You can read more about understanding pattern capacity further down in the tutorial.
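To see how the two example patterns break down into segments, the sketch below splits a pattern string on every “+” that is not inside a plain text segment. It is a simplified Python illustration, not the parser used by the server; the function name split_segments is hypothetical, and the sketch does not attempt to handle escaped quotes.

```python
def split_segments(pattern):
    """Split a TextPattern string on '+' characters outside double quotes."""
    segments = []
    current = []
    in_quotes = False
    for ch in pattern:
        if ch == '"':
            # Toggle plain text mode; the quote stays part of the segment.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "+" and not in_quotes:
            segments.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    segments.append("".join(current).strip())
    return segments


print(split_segments('"ABC_" + RANDOM(****)'))
# ['"ABC_"', 'RANDOM(****)']
```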

32.2.2 Designing TextPattern for generating ids

One use-case for TextPattern is to generate unique ids. In this section we will present guidelines and common issues related to designing TextPatterns used for ids.

An id should never contain sensitive information, or information that in combination can identify an individual. TextPattern does not currently support segments that use this kind of value, but might do so in the future.

The following list highlights some of the TextPattern specific restrictions you need to consider when designing a TextPattern for ids:

  • Make sure the capacity (number of possible values) of the TextPattern covers your use-case. It is better to have more values than needed than too few. Tracked entity attributes using TextPattern require that a single generated segment is present in the TextPattern.

  • A TextPattern is unique in the entire system, but only for the object using it. In other words, if you have a single tracked entity attribute with TextPattern, used by multiple tracked entities (not to be mistaken for tracked entity instances), all values generated will be shared between all tracked entities using the attribute. This also means that if you have two tracked entity attributes with the same TextPattern syntax, each attribute will be able to generate the same value as the other, since uniqueness is based on the attribute.

  • SEQUENTIAL segments are implemented as numbers starting from 1, increasing by 1 for each value, sequentially until no more values are available. However, in reality you will most likely end up with gaps when users generate and reserve values that are never used, or if a user sends in a value where the SEQUENTIAL segment has a higher value than recorded on the server.

  • The current implementation relies on the user-client to send in the values contained in the TextPattern when storing a new value. That means generating a correct id depends on the user, and the user-client, providing the correct data.

32.2.3 Understanding TextPattern capacity

The most important thing to keep in mind when designing a TextPattern, is the capacity - that means the total number of potential values a TextPattern can yield.

With the current implementation of TextPattern, there are three main factors that decide the capacity:

  1. Capacity of the generated segment in the TextPattern

  2. The presence of a CURRENT_DATE segment

  3. The presence of an ORG_UNIT_CODE segment

The presence of a date segment (like CURRENT_DATE) will effectively reset the capacity each time the segment changes. Depending on the date format, it can change anywhere from yearly to daily. Important: If your date format does not contain a year, the pattern will resolve to the same value every year, which means the values for that period will already be used. For example, consider this TextPattern:

CURRENT_DATE(ww) + "-" + RANDOM(#)

This pattern will give you up to 10 unique values for each week, but after 1 year, CURRENT_DATE(ww) will be the same as last year, and you will have no new values available. If you use “yyyy-ww” instead, it will be unique for every year, every week.
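The yearly repetition can be demonstrated with a short Python sketch. Note the assumption: it uses ISO week numbers from Python's datetime module as a stand-in for the “ww” component of SimpleDateFormat, whose week numbering depends on the server locale, so exact week numbers may differ on a real server. The helper names week_only and year_and_week are hypothetical.

```python
from datetime import date


def week_only(d):
    """Approximate CURRENT_DATE(ww) using the ISO week number."""
    return f"{d.isocalendar()[1]:02d}"


def year_and_week(d):
    """Approximate CURRENT_DATE(yyyy-ww) using the ISO year and week."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{iso_year}-{iso_week:02d}"


first = date(2018, 3, 5)
second = date(2019, 3, 4)

# Same week token one year apart: the 10 values of RANDOM(#) are shared.
print(week_only(first), week_only(second))          # 10 10
# Including the year keeps the two periods apart.
print(year_and_week(first), year_and_week(second))  # 2018-10 2019-10
```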

Organisation unit codes will make your values unique for each different organisation unit. For example, consider a TextPattern like this:

ORG_UNIT_CODE() + "-" + RANDOM(#)

This pattern will give you 10 unique values for each different organisation unit.

32.2.3.1 Calculating capacity for generated segments

Understanding how to calculate the capacity of a TextPattern is critical when designing TextPatterns. The generated segments will be the main component of any TextPattern in terms of capacity, then increased based on the presence of ORG_UNIT_CODE or CURRENT_DATE.

Let’s start with SEQUENTIAL segments. Each “#” in the format represents a number between 0 and 9. To calculate the total capacity, you multiply the number of possible values for each “#”. Since it’s always 10 (0-9), the maths is quite straightforward:

SEQUENTIAL(#) = 10 = 10
SEQUENTIAL(###) = 10 * 10 * 10 = 1000
SEQUENTIAL(#####) = 10 * 10 * 10 * 10 * 10 = 100000

Since SEQUENTIAL counters on the server start at 1 and not 0, the actual capacity is one less than the calculated value (for example, 999 for SEQUENTIAL(###)), but that is insignificant in most cases.

As soon as we involve RANDOM, the calculation becomes a bit more complicated. Similar to SEQUENTIAL, a “#” has 10 possible values. In addition, we have “X” and “x” with 26 possible values each, as well as “*”, which can be any of the previous and therefore has 62 (10+26+26) possible values.

To calculate the capacity, you need to take each character in your format and replace with the number of possible values, then multiply them all together like we did for SEQUENTIAL:

RANDOM(#) = 10 = 10
RANDOM(X) = 26 = 26
RANDOM(*) = 62 = 62

RANDOM(X##) = 26 * 10 * 10 = 2600
RANDOM(XXxx) = 26 * 26 * 26 * 26 = 456976

RANDOM(***) = 62 * 62 * 62 = 238328

As you can see, the maths gets a bit more complicated when the format mixes different characters, but by following this recipe you can work out the number of potential values.
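The recipe above is easy to automate. The following Python sketch multiplies the number of possible values for each format character; the names CHOICES, capacity and sequential_capacity are hypothetical helpers for illustration, not part of DHIS2.

```python
# Possible values per generation format character, as listed above.
CHOICES = {"#": 10, "X": 26, "x": 26, "*": 62}


def capacity(fmt):
    """Multiply the number of possible values for every format character."""
    total = 1
    for ch in fmt:
        if ch not in CHOICES:
            raise ValueError(f"unsupported format character: {ch!r}")
        total *= CHOICES[ch]
    return total


def sequential_capacity(fmt):
    """SEQUENTIAL counts from 1, so the all-zero value is never used."""
    if not fmt or set(fmt) != {"#"}:
        raise ValueError("SEQUENTIAL formats may only contain '#'")
    return capacity(fmt) - 1


print(capacity("X##"))               # 2600
print(capacity("XXxx"))              # 456976
print(capacity("***"))               # 238328
print(sequential_capacity("#####"))  # 99999
```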

32.2.4 Random segments and why you should avoid them

There is a hidden cost of using the random segment in TextPattern in the long run, but that does not mean you should never use it. This section will highlight the problems of using the random segment and suggest when it might be more appropriate to use it.

This section is motivated by an issue with the previous generation strategy, where you only had random generation. After a while, instances using this feature would actually be unable to generate and reserve new values, since it was taking too long to find available values. This section looks at some of the problems with random generation that created this situation.

32.2.4.1 Generating random values

Before using the RANDOM segment in your TextPattern, you should consider the following problems connected to the use of RANDOM:

  • Generating values from a TextPattern with a RANDOM segment will be more complex than other TextPatterns

32.2.5 Data entry for TextPattern based metadata

As previously mentioned, the only metadata currently supporting TextPattern is tracked entity attributes. In this section, we will describe the different ways data entry for TextPattern works, especially for tracked entity attributes.

32.2.5.1 Validation of values using TextPattern

By default, all values sent to the server for metadata using TextPattern will be validated. Validation can be skipped if needed, but you should always validate input under normal circumstances. The validation will be based on the TextPattern you have defined and will be as strict as possible:

  • Date segments must match the same format as specified in the segment parameter

  • Plain text segments must match exactly

  • Text segment values must be at least as long as the format string. If both “^” and “$” are present, the value must match the exact length.

  • Generated segment values must match the format exactly, character by character.
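Two of the rules in this list, the character-by-character check for generated segments and the length rule for text segments, can be sketched in Python as follows. This is an illustration of the rules as described here, not the server's validation code; the names ALLOWED, matches_generation_format and text_length_ok are hypothetical.

```python
import string

# Characters each generation format symbol is allowed to match.
ALLOWED = {
    "#": string.digits,
    "X": string.ascii_uppercase,
    "x": string.ascii_lowercase,
    "*": string.digits + string.ascii_lowercase + string.ascii_uppercase,
}


def matches_generation_format(value, fmt):
    """Check a generated segment value against its format, character by character."""
    if len(value) != len(fmt):
        return False
    return all(f in ALLOWED and v in ALLOWED[f] for v, f in zip(value, fmt))


def text_length_ok(value, fmt):
    """Apply the length rule for text formats such as '..' or '^....$'."""
    dots = fmt.count(".")
    if fmt.startswith("^") and fmt.endswith("$"):
        # Both anchors present: the value must have exactly this length.
        return len(value) == dots
    return len(value) >= dots


print(matches_generation_format("A123", "X###"))  # True
print(matches_generation_format("a123", "X###"))  # False
print(text_length_ok("OSLO", "^....$"))           # True
print(text_length_ok("PARIS", "^....$"))          # False
```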

When using the server to first generate and reserve values, the server will modify the values used in the TextPattern before injecting them, meaning you will always get a valid value when generating it on the server.

A final exception to TextPattern validation is made for a special case: If you change a TextPattern after reserving values for the original pattern, values sent to the server that are invalid according to the new TextPattern will still be accepted if they were already reserved.

32.2.5.2 Different data entry flows for TextPattern

There are currently two ways a client can store values for TextPattern metadata:

  1. Generating and reserving values (Apps should do this for you)

  2. Storing a custom value

The preferred way is to generate and reserve the needed values (the number of values generated and reserved is handled by the app). That means each value you see and store has been generated and reserved by the server, and will be valid.

The other way might be useful in specific cases. The user supplies the value themselves and, as long as the value supplied is valid for the TextPattern, they can enter anything they want. The caveat of doing it this way is that you might use values that were reserved by someone else, and if you have a SEQUENTIAL segment, the counter will not be updated.