
View Full Version : Database virtualization, distributed caching and streaming SQL



jhyde
08-27-2008, 05:50 PM
<a href="http://www.networkworld.com/columnists/2008/082008kobelius.html">James Kobelius writes in Network World</a> about how the need for scalable real-time business intelligence will create a convergence of technologies centered on database virtualization:<br /><blockquote><span style="font-style: italic;">"Real-time is the most exciting new frontier in business intelligence, and virtualization will facilitate low-latency analytics more powerfully than traditional approaches. Database virtualization will enable real-time business intelligence through a policy-driven, latency-agile, distributed-caching memory grid that permeates an infrastructure at all levels. </span><br /><br /><span style="font-style: italic;">As this new approach takes hold, it will provide a convergence architecture for diverse approaches to real-time business intelligence, such as trickle-feed extract transform load (ETL), changed-data capture (CDC), event-stream processing and data federation. Traditionally deployed as stovepipe infrastructures, these approaches will become alternative integration patterns in a virtualized information fabric for real-time business intelligence."</span></blockquote> Kobelius makes it clear that this "virtualized information fabric" is an ambitious program that will be accomplished only over a number of years, but the underlying trends are visible now: for example, the convergence of distributed caches with databases, as evidenced by <a href="http://www.oracle.com/tangosol/index.html">Oracle's acquisition of Tangosol</a>, and <a href="http://code.msdn.microsoft.com/velocity">Microsoft's recently announced Project Velocity</a>.<br /><br />This envisioned system contains so many moving parts that a new paradigm will be needed to link them together. I don't think that databases are the answer.
They elegantly handle stored data, but founder when dealing with change, caching, and the kind of replication problems you encounter when implementing virtualized and distributed systems. For example, database triggers are the standard way of managing change in a database, and are still clunky fifteen years after they were introduced; and <a href="http://en.wikipedia.org/wiki/Enterprise_Information_Integration">Enterprise Information Integration (EII)</a> systems were an attempt to extend the database model to handle federated data, but only work well for a prescribed set of distribution patterns.<br /><br />I <a href="http://julianhyde.blogspot.com/2008/02/streaming-sql-meets-olap.html">wrote recently</a> about how <a href="http://www.sqlstream.com/">SQLstream</a> can implement trickle-feed <a href="http://en.wikipedia.org/wiki/Extract,_transform,_load">ETL</a> and use the knowledge it gleans from the passing data to proactively manage the <a href="http://mondrian.pentaho.org/">mondrian OLAP engine</a>'s cache. SQLstream also has adapters to implement <a href="http://en.wikipedia.org/wiki/Change_data_capture">change-data capture (CDC)</a> and to manage data federation.<br /><br />In SQLstream, the <span style="font-style: italic;">lingua franca</span> for all of these integration patterns is SQL, whereas ironically, if you tried to achieve these things in Oracle or Microsoft SQL Server, you would end up writing procedural code: PL/SQL or Transact-SQL. Therefore streaming SQL - a variant of what Kobelius calls event-stream processing where, crucially, the event-processing language is SQL - seems the best candidate for that unifying paradigm.
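To make the trigger clunkiness concrete, here is a minimal sketch of trigger-based change capture, using SQLite from Python. The table, trigger, and column names are invented for illustration; the point is the pattern itself: every table whose changes you want to capture needs its own hand-written trigger feeding a shadow log table, which downstream consumers must then poll.

```python
import sqlite3

# Hypothetical schema: an "orders" table whose changes we want to
# capture, and an "orders_log" shadow table populated by triggers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE orders_log (id INTEGER, amount REAL, op TEXT,
                             ts DATETIME DEFAULT CURRENT_TIMESTAMP);

    -- One trigger per change type, per table: this boilerplate must be
    -- repeated for every table whose changes are to be captured.
    CREATE TRIGGER orders_ins AFTER INSERT ON orders
    BEGIN
        INSERT INTO orders_log (id, amount, op)
        VALUES (NEW.id, NEW.amount, 'INSERT');
    END;
    CREATE TRIGGER orders_upd AFTER UPDATE ON orders
    BEGIN
        INSERT INTO orders_log (id, amount, op)
        VALUES (NEW.id, NEW.amount, 'UPDATE');
    END;
""")

conn.execute("INSERT INTO orders (id, amount) VALUES (1, 9.99)")
conn.execute("UPDATE orders SET amount = 19.99 WHERE id = 1")

# A downstream ETL job must poll the log table -- changes are not
# pushed, and there is no declarative query over the change stream.
changes = conn.execute("SELECT id, amount, op FROM orders_log").fetchall()
```

By contrast, in a streaming-SQL system the same change feed would be expressed declaratively, as a single continuous query over a stream, with no per-table trigger plumbing and no polling loop.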

More... (http://julianhyde.blogspot.com/2008/08/database-virtualization-distributed.html)