5.4 Using R with MyDatamart

MyDatamart provides useful interface to the DHIS2 database by making a local copy of the database available on a users desktop. This means that the user does not need direct access to the database and the data can be worked with offline on the users local machine. In this example, we will have used the demo database . Data was downloaded at the district level for Jan 2011-Dec 201l. Consult the MyDatamart section in this manual for more detailed information.

First, lets load some required R packages. If you do not have these packages already installed in your version of R, you will need to do so before proceeding with the example.

library("DBI")
library("RSQLite")
library("lattice")
library("latticeExtra")

Next, we are going to connect to the local copy of the MyDatamart database. In this case, it was located at C:\dhis2\sl.dmart.

dbPath<-"C:\\dhis2\\sl.dmart"
drv<-dbDriver("SQLite")
db<-dbConnect(drv,dbPath)

Let suppose we have been asked to compare ANC 1, 2, 3 coverage rates for each district for 2011. We can define an SQL query to retrieve data from the MyDatamart database into an R data frame as follows.

#An SQL query which will retreive all indicators 
#at OU2 le
sql<-"SELECT * FROM pivotsource_indicator_ou2_m 
WHERE year = '2011'"
#Execute the query into a new result set
rs<-dbSendQuery(db,sql)
#Put the entire result set into a new data frame
Inds<-fetch(rs,n=-1)
#Clean up a bit
dbClearResult(rs)
dbDisconnect(db)

We used one of the pre-existing Pivot Source queries in the database to get all of the indicator values. Of course, we could have retrieved only the ANC indicators, but we did not exactly know how the data was structured, or how the columns were named, so lets take a closer look.

#Get the name of the columns
colnames(Inds)
#output not shown for brevity
levels(as.factor(Inds$indshort)) 

We see from the colnames command that there is an column called “indshort” which looks like it contains some indicator names. We can see the names using the second command. After we have determined which ones we need (ANC 1, 2, and 3), lets further subset the data so that we only have these.

#Subset the data for ANC
ANC<-Inds[grep("ANC (1|2|3) Coverage",as.factor(Inds$indshort)),]

We just used R’s grep function to retrieve all the rows and columns of the Inds data frame which matched the regular expression “ANC (1|2|3) Coverage” and put this into a new data frame called “ANC”.

By looking at the data with the str(ANC) command, we will notice that the time periods are not ordered correctly, so lets fix this before we try and create a plot of the data.

#Lets reorder the months
MonthOrder<-c('Jan','Feb','Mar','Apr',
'May','Jun','Jul','Aug','Sep','Oct','Nov','Dec')
ANC$month<-factor(ANC$month,levels=MonthOrder)

Next, we need to actually calculate the indicator value from the numerator, factor and denominator.

#Calculate the indicator value
ANC$value<-ANC$numxfactor/ANC$denominatorvalue

Finally, lets create a simple trellis plot which compares ANC 1, 2, 3 for each district by month and save it to our local working directory in a file called “District_ANC.png”.

png(filename="District_ANC.png",width=1024,height=768)
plot.new()
 xyplot(value ~ month | ou2, data=ANC, type="a", main="District ANC Comparison Sierra Leone 2011",
 groups=indshort,xlab="Month",ylab="ANC Coverage",
 scales = list(x = list(rot=90)),
 key = simpleKey(levels(factor(ANC$indshort)),
 points=FALSE,lines=TRUE,corner=c(1,1)))
 mtext(date(), side=1, line=3, outer=F, adj=0, cex=0.7)
dev.off()

The results of which are displayed below.