5.7 Using PL/R with DHIS2

PL/R, the procedural language for R, is a PostgreSQL extension which allows data to be passed from the database to R, where calculations can be performed, and the results passed back to the database for further processing. In this example, we will use R to create functions that calculate some summary statistics which do not exist by default in SQL. We will then create an SQL View in DHIS2 to display the results. The advantage of using R in this context is that we do not need to write any significant amount of code to obtain these statistics; we simply call functions already available in R and its packages to do the work for us.

First, you will need to install PL/R, which is described in detail here. Following the example from the PL/R site, we will create some custom aggregate functions as detailed here. We will create two functions, which return the median and the skewness of a range of values. Note that the skewness function relies on the e1071 R package, which must be installed in the R installation used by PostgreSQL.

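-- Final function: receives the accumulated values as a float8 array (arg1)
-- and returns their median as calculated by R.
CREATE OR REPLACE FUNCTION r_median(_float8) RETURNS float AS '
  median(arg1)
' LANGUAGE 'plr';

-- Aggregate which collects the input values into an array with
-- plr_array_accum and passes that array to r_median.
CREATE AGGREGATE median (
  sfunc = plr_array_accum,
  basetype = float8,
  stype = _float8,
  finalfunc = r_median
);

-- Same pattern for skewness, which is provided by the e1071 R package.
CREATE OR REPLACE FUNCTION r_skewness(_float8) RETURNS float AS '
  require(e1071)
  skewness(arg1)
' LANGUAGE 'plr';

CREATE AGGREGATE skewness (
  sfunc = plr_array_accum,
  basetype = float8,
  stype = _float8,
  finalfunc = r_skewness
);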
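
Once the functions and aggregates exist, a quick check on a handful of literal values can confirm that PL/R is working before the aggregates are used on real data. The following is only a minimal sketch: it assumes that PL/R has been enabled in the current database and that the e1071 package is installed in the R installation which PostgreSQL uses. The sample values and the names t, x, median_value and skewness_value are made up for illustration.

```sql
-- Sanity check on made-up values (not DHIS2 data).
SELECT median(x)   AS median_value,   -- middle value of 1, 2, 3, 4, 100 is 3
       skewness(x) AS skewness_value  -- positive here, since 100 stretches the right tail
FROM (VALUES (1::float8), (2), (3), (4), (100)) AS t(x);
```

If this query fails with an error from R, the most likely causes are that PL/R is not enabled in this database or that the e1071 package cannot be loaded.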

Next, we will define an SQL query which uses the two new aggregate functions (median and skewness), whose values are calculated by R. In this case, we will retrieve a single indicator from the data mart at the district level and calculate the summary statistics grouped by the name of the district to which the values belong. This query is very specific, since the indicator ID, organisation unit level and period type are hard-coded, but it could easily be adapted to your own database.

SELECT ou.shortname,
       avg(dv.value),
       median(dv.value),
       skewness(dv.value)
FROM aggregatedindicatorvalue dv
INNER JOIN period p ON p.periodid = dv.periodid
INNER JOIN organisationunit ou ON dv.organisationunitid = ou.organisationunitid
WHERE dv.indicatorid = 112670
  AND dv.level = 3
  AND dv.periodtypeid = 3
  AND p.startdate >= '2009-01-01'
GROUP BY ou.shortname;
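
To adapt the query above, the hard-coded indicator ID, period type ID and organisation unit level need to be replaced with values from your own database. The lookups below are a rough sketch of how these might be found. The table and column names (indicator, periodtype, orgunitlevel) are assumptions about the DHIS2 schema and may differ between versions, and the search term is only a placeholder.

```sql
-- Find the ID of the indicator to analyse (the search term is a placeholder).
SELECT indicatorid, name
FROM indicator
WHERE name ILIKE '%coverage%';

-- List the period types to find the periodtypeid to filter on.
SELECT periodtypeid, name
FROM periodtype;

-- List the organisation unit levels to find the level number to filter on.
SELECT level, name
FROM orgunitlevel
ORDER BY level;
```

The values returned can then be substituted for 112670, 3 and 3 in the WHERE clause of the query.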

We can then save this query as an SQL View in DHIS2. A clipped version of the results is shown below.

In this simple example, we have shown how to use PL/R with the DHIS2 database and web interface to display some summary statistics using R to perform the calculations.