Take a look at why business users are turning to open source business intelligence platforms as an alternative to costly proprietary applications.
Open source business applications have started to mature into robust platforms, serving sales, finance and operational needs. Now, open source business intelligence (OSBI) platforms are also gaining attention, as owners of proprietary BI applications are navigating market consolidation, product roadmap changes and ever-increasing licensing costs.
OSBI platforms are typically marketed as commercial open source software (COSS), similar to the model popularized by Red Hat. COSS companies generate revenue from support, subscriptions and training. Most CIOs feel it’s critical for their applications to have an identified commercial entity standing behind business infrastructure rather than relying on the promise of community alone.
A good OSBI platform has at least three components: a database layer for organizing business data, a business intelligence layer to transform and present information and an analytics/performance management layer to predict business outcomes and opportunities.
Figure 1: A business intelligence platform includes three layers: a database layer, a business intelligence layer, and an analytics/performance management layer.
The Database Layer
Two of the more popular open source databases for business intelligence are MySQL and PostgreSQL. The strong capabilities of these core databases have spun off new commercial offerings like Kickfire and Infobright, both based on MySQL, and EnterpriseDB, Vertica and Greenplum, all based on PostgreSQL.
MySQL supports features found in other popular enterprise databases, including partitions, triggers, stored procedures and updateable views. MySQL supports “swappable” storage engines, some for transaction processing, others for rapid querying. And Sun Microsystems’ investment in MySQL should give it more muscle in enterprise settings.
PostgreSQL is another popular choice for OSBI. PostgreSQL is a full-featured database that includes many constructs that are common on proprietary commercial applications, including tablespaces, temporary tables, inheritance, functions, sequences, triggers and views.
Some performance tuning mainstays of BI like bitmap indexes and materialized views aren’t yet available in MySQL or PostgreSQL. But in most cases, these databases are great open source alternatives and have the core features for BI.
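Where materialized views are missing, a common workaround is a summary table that is rebuilt on a schedule. The sketch below illustrates the idea only: it uses Python’s built-in sqlite3 module as a stand-in for MySQL or PostgreSQL, and the table names (sales, sales_by_region) and the build_summary helper are hypothetical.

```python
import sqlite3

def build_summary(conn):
    """Rebuild a summary table that stands in for a materialized view."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS sales_by_region")
    # Precompute the aggregate once so reports do not rescan the detail rows.
    cur.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total, COUNT(*) AS orders "
        "FROM sales GROUP BY region"
    )
    conn.commit()

# Hypothetical detail table with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("East", 50.0), ("West", 75.0)],
)

build_summary(conn)
summary = conn.execute(
    "SELECT region, total, orders FROM sales_by_region ORDER BY region"
).fetchall()
print(summary)
```

The trade-off is freshness: the summary is only as current as its last rebuild, so the refresh would normally run at the end of each ETL load.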
The Business Intelligence Layer
The core of any BI project centers around organizing data for business analysis and then presenting the information in static or dynamic reports. Dmitri Popov’s excellent article on OpenOffice’s Report Builder in the April 2008 issue of Linux Magazine showed OpenOffice’s useful desktop reporting tool. Actuate BIRT is another kind of reporting tool that offers some additional reporting features. And platforms like JasperSoft and Pentaho offer breadth from data organization through presentation.
The Pentaho BI Suite is a platform that includes Pentaho Data Integration (Kettle) for data Extract, Transform and Load (ETL), Pentaho Reporting and Pentaho Analysis (Mondrian) for OLAP. The BI server comes with a starter application for executing reports and OLAP cubes and administering server settings and components.
Pentaho includes a unique concept in their suite. Their Design Studio includes actions (basically a job flow) to execute tasks on the BI server linking reports and ETL processes in unique ways. For example, an ETL routine can deliver data to a report.
The Pentaho Data Integration (PDI) application organizes and prepares data for analysis. PDI includes a drag-and-drop palette to create job flows. Jobs can, for example, execute transformations, database SQL scripts, bulk database loaders, FTP and send mail. PDI has many common input, output and processing steps, including mapping, sorting, merging and grouping. PDI also includes unique steps to build data marts with dimension lookup/update, combo lookups and row normalization and denormalization steps.
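To give a feel for what a dimension lookup/update step does when loading a data mart, here is a minimal sketch in plain Python. It is not PDI’s implementation or API: the function name, the dictionary-based dimension and the sample rows are all assumptions, and it simply overwrites changed attributes rather than keeping history.

```python
def dimension_lookup_update(dimension, natural_key, attributes):
    """Return the surrogate key for natural_key, inserting or updating the row."""
    row = dimension.get(natural_key)
    if row is None:
        # New dimension member: assign the next surrogate key.
        surrogate_key = len(dimension) + 1
        dimension[natural_key] = {"key": surrogate_key, **attributes}
        return surrogate_key
    # Known member: overwrite attributes in place (no history is kept).
    row.update(attributes)
    return row["key"]

# Hypothetical source rows arriving from an operational system.
source_rows = [
    {"customer_id": "C-17", "city": "Boston", "amount": 120.0},
    {"customer_id": "C-42", "city": "Denver", "amount": 80.0},
    {"customer_id": "C-17", "city": "Chicago", "amount": 45.0},
]

customer_dim = {}
fact_rows = []
for src in source_rows:
    key = dimension_lookup_update(
        customer_dim, src["customer_id"], {"city": src["city"]}
    )
    # The fact table stores the surrogate key, not the source system's ID.
    fact_rows.append({"customer_key": key, "amount": src["amount"]})

print(fact_rows)
```

The point of the step is that fact rows carry compact surrogate keys while the descriptive attributes live once in the dimension.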
Figure 2: Pentaho Data Integration prepares data for analysis and other needs.
For presenting data, Pentaho’s Report Designer provides a banded report designer along with a sophisticated ReportWizard. The wizard covers the basics and also supports field sizing, column headers, row banding and group settings. Unique to Report Designer is the ability to source reports from Pentaho’s metadata layer, which lets you build a “business layer” on top of complex databases to shield users from underlying data complexity.
In addition to Report Designer, another Pentaho-provided reporting tool is Pentaho Analysis, built on the Mondrian engine. Pentaho Analysis lets users drill up and down hierarchies of data to understand what’s behind a number on a report. Developers first map out “cubes” of data with the Schema Workbench. After publishing a schema to the Pentaho server, users can access detail by drilling up and down the report. The user interface lacks modern drag and drop features but the basic interface works well.
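Drilling up and down a hierarchy amounts to re-aggregating the same facts at a coarser or finer level. The following sketch shows that idea in plain Python; it does not use Mondrian or its schema format, and the year/quarter/month hierarchy, the function names and the sample figures are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical time hierarchy, coarsest level first.
HIERARCHY = ("year", "quarter", "month")

def roll_up(facts, depth):
    """Sum the sales measure at the given depth of the hierarchy (1 = year)."""
    totals = defaultdict(float)
    for fact in facts:
        member = tuple(fact[level] for level in HIERARCHY[:depth])
        totals[member] += fact["sales"]
    return dict(totals)

def drill_down(facts, member):
    """Return the totals for the children of one member, one level deeper."""
    depth = len(member) + 1
    deeper = roll_up(facts, depth)
    return {k: v for k, v in deeper.items() if k[:len(member)] == member}

facts = [
    {"year": 2008, "quarter": "Q1", "month": "Jan", "sales": 100.0},
    {"year": 2008, "quarter": "Q1", "month": "Feb", "sales": 150.0},
    {"year": 2008, "quarter": "Q2", "month": "Apr", "sales": 200.0},
]

by_year = roll_up(facts, 1)
by_quarter = drill_down(facts, (2008,))
by_month_q1 = drill_down(facts, (2008, "Q1"))
print(by_year, by_quarter, by_month_q1)
```

An OLAP engine does the same thing at scale, typically by generating SQL against the cube’s fact and dimension tables and caching the aggregates.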
JasperSoft is moving aggressively to provide a full-scale OSBI suite, incorporating ETL, OLAP and report design under a common BI platform. The JasperSoft BI suite includes JasperServer at its core, JasperETL for data transformations, JasperAnalysis for OLAP analysis, and JasperStudio (also known as iReport) for report design of JasperReports.
To prepare data within the Jasper platform, JasperETL uses Talend for its core application. JasperETL offers a nice drag and drop palette for ETL job development. The user interface is based on Eclipse, so it’s more like a Java development tool than an ETL application, though some developers may like this. JasperETL has all of the basic transformations, and includes special steps for basic data quality and fuzzy matching routines. The data mapping interface in JasperETL is particularly strong. JasperETL supports most popular output database or file formats, and JasperReports can even be output directly from the job flow.
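As a rough illustration of what a fuzzy matching step does during data cleansing, the sketch below uses Python’s standard difflib module to match a messy name against a list of known names. It is not how JasperETL or Talend implement the feature; the best_match helper, the 0.8 threshold and the company names are assumptions.

```python
from difflib import SequenceMatcher

def best_match(name, candidates, threshold=0.8):
    """Return the closest candidate if its similarity meets the threshold, else None."""
    if not candidates:
        return None
    # Score every candidate with a case-insensitive similarity ratio (0.0 to 1.0).
    scored = [
        (SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    score, match = max(scored)
    return match if score >= threshold else None

# Hypothetical master list of customer names.
known_customers = ["Acme Corp", "Apex Corporation", "Zenith Ltd"]

print(best_match("Acme Corp.", known_customers))
print(best_match("Globex", known_customers))
```

The threshold controls the trade-off between missed matches and false merges, so it would need tuning against real data.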
To present data, JasperReports are created in JasperStudio’s banded report designer. Most developers will probably start with the built-in Report Wizard, which provides the basic report creation steps. Advanced users will find themselves migrating to the robust JasperStudio interface, which has a variety of formatting tools to make time-consuming sizing, organizing and aligning of report objects easier.
JasperReports also provides several unique reporting properties like field stretching, repeating values and “print when” conditions. It also supports crosstabs. JasperStudio tools like the Report Group Wizard and drag and drop aggregations are big productivity enhancers.
As with Pentaho, JasperAnalysis’s OLAP is powered by the Mondrian engine. While there are slightly different interfaces for cube administration, overall, JasperAnalysis isn’t much different from Pentaho’s implementation of Mondrian.
Figure 3: JasperAnalysis provides OLAP capabilities through Mondrian and JPivot.
The Business Intelligence Reporting Tool (BIRT) focuses purely on reporting, with no ETL or OLAP. BIRT can be found as Eclipse BIRT, the freely available tool that is part of the Eclipse project, or as the commercialized Actuate BIRT.
BIRT gets users started on report design with “cheat sheets” that give a step-by-step tutorial for reporting basics. The designer in BIRT is atypical in that it uses a table metaphor instead of the more traditional banded report design. If a particular report isn’t a grid, it can be hard to get fine control over the positioning of fields and headings.
BIRT has most of the common report features, including support for cross tabs. But it’s missing others, like built-in banded reporting and the “no data” band, which are found in other packages. These can certainly be achieved with scripting, but they don’t come out of the box.
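For readers unfamiliar with the banded model, the sketch below shows the idea in plain Python: a report is assembled from a title band, group header, detail and group footer bands, plus a “no data” band that prints when the query returns nothing. It does not use any of the reporting tools discussed here; the render_banded function, the band wording and the sample rows are assumptions.

```python
from itertools import groupby

def render_banded(rows, title):
    """Render rows as text lines using title, group, detail and no-data bands."""
    lines = [title]                                    # title band
    if not rows:
        lines.append("No data available")              # "no data" band
        return lines
    ordered = sorted(rows, key=lambda r: r["region"])  # groupby needs sorted input
    for region, group in groupby(ordered, key=lambda r: r["region"]):
        lines.append(f"Region: {region}")              # group header band
        subtotal = 0.0
        for row in group:
            lines.append(f"  {row['product']}: {row['amount']:.2f}")  # detail band
            subtotal += row["amount"]
        lines.append(f"  Subtotal: {subtotal:.2f}")    # group footer band
    return lines

# Hypothetical query result.
rows = [
    {"region": "West", "product": "Widget", "amount": 10.0},
    {"region": "East", "product": "Gadget", "amount": 5.5},
    {"region": "East", "product": "Widget", "amount": 4.5},
]

print("\n".join(render_banded(rows, "Sales")))
print("\n".join(render_banded([], "Sales")))
```

In a banded designer each of these bands is a strip on the design surface; in a table-oriented designer the same result has to be expressed as rows and cells of a grid.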
In the end, both Pentaho and JasperSoft offer suites that include the core BI suite components, while BIRT is a competitive reporting tool with many strong features. Each technology may appeal to a particular business based on its priorities, though analytical reports produced from any will hold up to discerning business users.
The Analytics/Performance Management Layer
Often left out of traditional business intelligence, statistics and analytics can be a boon to businesses. Advanced users use statistics and analytics to classify data and predict outcomes which can improve business performance. Fortunately, the open source alternatives to expensive proprietary offerings are surprisingly strong.
Pentaho offers analytics through Weka, an open source project from the University of Waikato, New Zealand. Weka provides a drag and drop interface to develop predictive models and scoring, including statistical clusters, tree-based analysis, regressions and Bayesian classifiers. So far, the integration of Weka into the rest of Pentaho’s platform is limited to a plug-in in PDI for sampling and scoring.
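The split between building a model and scoring new records can be shown with the simplest possible case, a one-variable least-squares regression. This sketch is plain Python and has nothing to do with Weka’s own API; the fit_line and score functions and the advertising figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def score(model, x):
    """Apply a fitted model to a new record."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical training data: advertising spend versus revenue.
ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]

model = fit_line(ad_spend, revenue)   # model building step
prediction = score(model, 5.0)        # scoring step on unseen input
print(model, prediction)
```

In practice the model would be built once on historical data and the scoring step would run inside the ETL flow, which is the role the PDI plug-in plays.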
R is the most popular open source statistical package, being the de facto standard for many universities and researchers. One factor that makes R so popular is the volume of freely-available add-on packages available on the Comprehensive R Archive Network. A recent check showed over 1,400 packages for virtually every statistical technique. The staggering volume of possibilities can be daunting but few will find R lacking in capability.
For those who like drag and drop graphical applications for statistical analyses, Rapid Miner (formerly YALE) is a great solution. Rapid Miner includes an analysis wizard, with templates for several common analyses, to help users get started. Rapid Miner also comes with over 150 sample “experiments” to show how to build process flows with tasks such as getting database data, running models, and producing impressive charts.
Figure 4: Robust graphics in Rapid Miner help analysts identify correlations and trends.
While expensive proprietary options have a legacy of development, new open source BI packages offer a competitive feature set. They include the most popular features on a Java-based platform for better interoperability across the BI platform and across the business. As OSBI matures, more user interfaces will move from Java development tool plug-ins to more specific BI-developer applications.
The business features of OSBI applications are also certain to improve with better data quality applications, improved business metadata and more interactive user interfaces. And they’ll continue to improve in their ability to scale up to large, multi-TB databases for broad enterprise use.
Those who have seen new versions of their BI applications stagnate, or are looking at spending another big chunk of their IT budget on license fees, should certainly take a look at the open source alternatives. They’ll probably be pleased at what’s going on in Open Source Business Intelligence.