Quality engineering is critical for manufacturing companies that need to ensure product quality. A common approach is Statistical Process Control (SPC), a methodology for monitoring a process to identify special causes of variation and signal the need to take corrective action (Book Reference). SPC has seven tools, known as The Magnificent Seven: histograms, check sheets, Pareto charts, cause-and-effect diagrams, defect concentration diagrams, scatter plots, and control charts. These tools are available in most statistical software; a listing of such software can be found here. However, those statistical packages are commercial software, so license fees can be costly, and it is difficult to customize them to apply rules specific to the company.
We recently worked on implementing SPC analysis using R in an SAP HANA environment. Based on that experience, I have chosen to address some common questions that I encounter in such projects and initiatives:
- Does R cover all of the statistical calculation needs for SPC analysis? What are the advantages of using R over commercial statistical software?
- How well do R scripts run in SAP HANA, and how complex is it to set them up?
First of all, R is well known for its statistical capabilities, so the calculations used in commercial products can be reproduced one-to-one. It offers a wide range of statistical tools, including descriptive statistics (mean and standard deviation, size and range of samples), hypothesis testing (checking the significance of differences between the observed mean and the target value), statistical modeling, and even an SPC library called ‘qcc’ (for building different types of control charts based on the subgroup size and control chart factors).
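To make the role of subgroup size and control chart factors concrete, here is a minimal sketch of X-bar and R chart limits. It is written in Python purely for illustration and is not the R code from the project; the function name `xbar_r_limits` and the sample measurements are hypothetical. The factors A2, D3 and D4 are the standard tabulated Shewhart constants, shown here only for subgroup sizes 2 to 5.

```python
from statistics import mean

# Standard Shewhart control chart factors, indexed by subgroup size n.
# A2 scales the average range for the X-bar chart; D3 and D4 bound the R chart.
FACTORS = {
    2: {"A2": 1.880, "D3": 0.0, "D4": 3.267},
    3: {"A2": 1.023, "D3": 0.0, "D4": 2.574},
    4: {"A2": 0.729, "D3": 0.0, "D4": 2.282},
    5: {"A2": 0.577, "D3": 0.0, "D4": 2.114},
}


def xbar_r_limits(subgroups):
    """Return center lines and control limits for X-bar and R charts.

    `subgroups` is a list of equally sized lists of measurements
    (hypothetical input format chosen for this sketch).
    """
    n = len(subgroups[0])
    if any(len(s) != n for s in subgroups):
        raise ValueError("all subgroups must have the same size")
    if n not in FACTORS:
        raise ValueError("this sketch only tabulates factors for n = 2..5")
    f = FACTORS[n]

    xbarbar = mean(mean(s) for s in subgroups)          # grand mean
    rbar = mean(max(s) - min(s) for s in subgroups)     # average range

    return {
        "xbar": {
            "lcl": xbarbar - f["A2"] * rbar,
            "center": xbarbar,
            "ucl": xbarbar + f["A2"] * rbar,
        },
        "r": {
            "lcl": f["D3"] * rbar,
            "center": rbar,
            "ucl": f["D4"] * rbar,
        },
    }


# Made-up measurements: three subgroups of five parts each.
sample = [
    [10, 12, 11, 13, 14],
    [11, 13, 12, 12, 12],
    [9, 11, 10, 12, 13],
]
limits = xbar_r_limits(sample)
print(limits["xbar"])
print(limits["r"])
```

In R, the qcc package mentioned above produces charts of this kind directly, so a hand-written version like this is mainly useful when company-specific adjustments are needed.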
R can also meet the visualization needs of the SPC tools, such as histograms, Pareto charts, and scatter plots. The SPC process itself has to be coded manually, whereas in commercial products you would simply use the built-in procedures. However, SPC analysis involves no complicated computations, so programming the calculation of control limits and of long-term and short-term process capability is not a concern. The real complexity lies in pre-processing the data and post-processing the results, such as scrubbing data and visualizing or formatting the analysis results to meet the specific reporting requirements set by the company. A view of the architecture and sample visualizations can be found below.
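As an aside on how simple the capability arithmetic is, the following Python sketch (an illustration only, not the project's R code) computes short-term and long-term indices. It assumes one common convention: short-term capability (Cp, Cpk) uses the within-subgroup sigma estimated as the average range divided by the tabulated constant d2, and long-term capability (Pp, Ppk) uses the overall sample standard deviation. The function name `capability`, the data, and the specification limits are hypothetical.

```python
from statistics import mean, stdev

# Standard d2 constants for estimating sigma from the average range,
# indexed by subgroup size n (only n = 2..5 tabulated in this sketch).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def capability(subgroups, lsl, usl):
    """Return Cp, Cpk (short term) and Pp, Ppk (long term).

    `lsl` and `usl` are the lower and upper specification limits.
    """
    n = len(subgroups[0])
    if any(len(s) != n for s in subgroups):
        raise ValueError("all subgroups must have the same size")

    values = [x for s in subgroups for x in s]
    mu = mean(values)

    # Short-term (within-subgroup) sigma: average range divided by d2.
    rbar = mean(max(s) - min(s) for s in subgroups)
    sigma_within = rbar / D2[n]

    # Long-term (overall) sigma: sample standard deviation of all values.
    sigma_overall = stdev(values)

    def indices(sigma):
        potential = (usl - lsl) / (6 * sigma)               # ignores centering
        actual = min(usl - mu, mu - lsl) / (3 * sigma)      # penalizes off-center
        return potential, actual

    cp, cpk = indices(sigma_within)
    pp, ppk = indices(sigma_overall)
    return {"Cp": cp, "Cpk": cpk, "Pp": pp, "Ppk": ppk}


# Made-up measurements and specification limits.
sample = [
    [10, 12, 11, 13, 14],
    [11, 13, 12, 12, 12],
    [9, 11, 10, 12, 13],
]
print(capability(sample, lsl=6, usl=18))
```

A company with its own interpretation rules (for example, a minimum acceptable Cpk) would apply them to the returned values in a separate step.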
Then, what are the benefits of using R over commercial products? Suppose a company has its own rules to identify outliers, detect shifts in a process, or interpret process capability. R can handle them through additional user-written functions, giving much more flexibility to your analysis. Also, because the analysis is performed where the data is stored (in-database analysis), total analysis time is reduced: no data has to be transferred from the database management system to a separate analytics platform.
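A company-specific shift rule of the kind described above can be very small. The sketch below, in Python for illustration only, flags a run of consecutive points on the same side of the center line. The default run length of eight follows the familiar Western Electric style run rule, but it is a parameter precisely because each company may set its own threshold; the function name `detect_shift` and the data are hypothetical.

```python
def detect_shift(points, center, run_length=8):
    """Return the indices at which a same-side run reaches `run_length`.

    A point exactly on the center line breaks the run (an assumption of
    this sketch; some rule sets treat such points differently).
    """
    signals = []
    run = 0    # length of the current same-side run
    side = 0   # +1 above the center line, -1 below, 0 on the line
    for i, x in enumerate(points):
        s = (x > center) - (x < center)
        if s != 0 and s == side:
            run += 1
        else:
            side = s
            run = 1 if s != 0 else 0
        if run >= run_length:
            signals.append(i)
    return signals


# Made-up subgroup means around a center line of 10.
means = [10.2, 9.8, 10.1, 10.3, 10.4, 10.2, 10.5, 10.1, 10.6, 10.3, 9.9]
print(detect_shift(means, center=10))
```

Additional rules (outlier screens, trend checks, and so on) can be added as further small functions and applied to the same series.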
Regarding the incorporation of R scripts in SAP HANA, executing R scripts is just like running them in RStudio. The only difference is that you wrap your R scripts in SQL to specify the input and output data passed to the procedure. Stored procedures in SAP HANA work like user-defined functions in R. For example, separate stored procedures can be created to calculate descriptive statistics, calculate control limits, or identify outliers, and the output from each procedure is saved to a separate SAP HANA table. Stored procedures can be called either from calculation views or from another procedure.
To summarize, using R in SAP HANA requires no learning curve for R programmers: there is no application-specific language to learn and no difficult configuration needed to run your scripts, other than installing the R client and the required R libraries for SAP HANA. The only thing to pay attention to is that the results returned from a procedure must match the column names and data types of the output table, which is specified before the procedure is executed.
The next article will cover constructing control charts and reporting SPC analysis results in the SAP HANA + WEBI environment, and will discuss the capabilities and limitations of that setup.