The standard deviation outlier analysis identifies values that are numerically distant from the rest of the data, potentially indicating that they are outliers. The analysis is based on the standard normal distribution. DHIS 2 calculates the mean and standard deviation of all values for each combination of organisation unit, data element, category option combination and attribute option combination. Outliers can occur by chance, but they can also indicate a measurement or data entry error.
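As a worked formula, the calculation can be written as follows for the n values x_1, …, x_n of one such combination. This is a sketch under assumptions: the source does not say which estimator DHIS 2 uses, so the population standard deviation (dividing by n) is assumed here, and k stands for the number of standard deviations selected for the analysis. Whether a value exactly on the boundary is flagged is also not stated; a strict inequality is assumed.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \mu\right)^2}, \qquad
z_i = \frac{x_i - \mu}{\sigma}

x_i \text{ is flagged} \iff \lvert z_i \rvert > k
\iff x_i < \mu - k\sigma \ \text{ or } \ x_i > \mu + k\sigma
```

The interval from μ − kσ to μ + kσ is what the analysis reports as the minimum and maximum values for each flagged outlier.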
As indicated above, this data quality analysis is only appropriate for data that is actually normally distributed. Data with large seasonal variation, or data that follows other statistical models (e.g. a logistic distribution), may lead to values being flagged that should actually be considered valid. It is therefore recommended to first confirm whether the data is normally distributed before running a standard deviation outlier analysis.
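One rough way to do that check outside DHIS 2 is sketched below. It computes the moment-based skewness and excess kurtosis of a series of values, which are both close to 0 for normally distributed data. The function names and the default cut-offs of ±1 are hypothetical rules of thumb chosen for illustration, not DHIS 2 settings; a formal test such as Shapiro-Wilk (`scipy.stats.shapiro`) is an alternative.

```python
import statistics


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values for a meaningful shape check")
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("all values are identical; shape is undefined")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skewness, excess_kurtosis


def looks_roughly_normal(values, max_abs_skew=1.0, max_abs_kurtosis=1.0):
    """Hypothetical rule of thumb: both shape measures must be near 0."""
    skewness, excess_kurtosis = skewness_and_excess_kurtosis(values)
    return abs(skewness) <= max_abs_skew and abs(excess_kurtosis) <= max_abs_kurtosis


# A symmetric, bell-like series passes; a series dominated by one spike does not.
print(looks_roughly_normal([2, 3, 3, 4, 4, 4, 5, 5, 6]))
print(looks_roughly_normal([1, 1, 1, 1, 1, 1, 1, 1, 50]))
```

This check looks only at the shape of the value distribution. It will not reveal seasonal variation, which is easier to spot by plotting the values over time.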
Open the Data Quality app and click Std dev outlier analysis.
Select From date and To date.
Select data set(s).
Select Parent organisation unit.
All children of the organisation unit are included. The analysis is performed on the raw data “under” the parent organisation unit, not on aggregated data.
Select the number of standard deviations.
This is the number of standard deviations a value may deviate from the mean before it is classified as an outlier.
Click Start.
How long the analysis takes depends on the amount of data being analysed. If there are standard deviation outliers, they are presented in a list.
For each outlier, you will see the data element, organisation unit, period, minimum value, actual value and maximum value. The minimum and maximum values are the boundary values derived from the number of standard deviations selected for the analysis.
Click the star icon to mark an outlier value for further follow-up.
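The procedure above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the DHIS 2 implementation: the `DataValue` and `Outlier` records, the `children_of` hierarchy map and the function names are invented for the example. It assumes the values are already filtered to the selected date range and data sets, includes the parent unit itself alongside all units below it, and uses the population standard deviation (`statistics.pstdev`). For each combination of organisation unit, data element, category option combination and attribute option combination it computes the mean and standard deviation, then reports every value outside mean ± k standard deviations together with those minimum and maximum boundary values.

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataValue:
    # Hypothetical shape of one raw (non-aggregated) data value.
    data_element: str
    org_unit: str
    category_option_combo: str
    attribute_option_combo: str
    period: str
    value: float


@dataclass(frozen=True)
class Outlier:
    # Mirrors the columns shown in the outlier list.
    data_element: str
    org_unit: str
    period: str
    min_value: float
    value: float
    max_value: float


def descendants(parent, children_of):
    """Return the parent and every organisation unit below it in the hierarchy."""
    found = set()
    stack = [parent]
    while stack:
        unit = stack.pop()
        if unit in found:
            continue  # guard against a malformed hierarchy with repeated units
        found.add(unit)
        stack.extend(children_of.get(unit, []))
    return found


def std_dev_outliers(values, parent, children_of, num_std_devs):
    units = descendants(parent, children_of)

    # Group raw values under the parent by the four-part key from the text.
    groups = defaultdict(list)
    for dv in values:
        if dv.org_unit in units:
            key = (dv.org_unit, dv.data_element,
                   dv.category_option_combo, dv.attribute_option_combo)
            groups[key].append(dv)

    outliers = []
    for group in groups.values():
        if len(group) < 2:
            continue  # a single value cannot deviate from its own mean
        numbers = [dv.value for dv in group]
        mean = statistics.fmean(numbers)
        sd = statistics.pstdev(numbers)
        low = mean - num_std_devs * sd   # reported as the minimum value
        high = mean + num_std_devs * sd  # reported as the maximum value
        for dv in group:
            if dv.value < low or dv.value > high:
                outliers.append(Outlier(dv.data_element, dv.org_unit,
                                        dv.period, low, dv.value, high))
    return outliers
```

For example, nine monthly values of 10 and one of 100 for the same clinic give a mean of 19 and a standard deviation of 27, so with 2 standard deviations the boundaries are −35 and 73 and only the value 100 is listed. Note that the mean is taken across all periods in the selected range, which is why strongly seasonal data can trip this check.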