These notes were taken at a meeting on Feb 18, 2004 between JonBlower, KeithHaines, Mike Turnill (Oracle) and Dave Pearson (Oracle). Dave was the manager of OGSA-DAI (Phase 1).
Summary
We discussed the features of the new Oracle 10g database and the Spatial extensions. This looks like a promising technology for handling our data. The 10g database is designed for clustered systems, i.e. basic (perhaps diskless) commodity computers, all sharing a common data store (which, again, might be spread over several storage units). The Spatial extensions contain powerful features for querying and retrieving spatial data, which could be in raster or vector format.
Oracle 10g technical notes (from talk by Dave Pearson)
Oracle can handle large amounts of data and colocation of data and processing. Since 9i, Oracle has moved into cluster computing - running the database on clusters of low-cost commodity hardware (a Blade is a stripped-down PC). This is very scalable - machines can easily be added or removed. Gigabit connections are needed between nodes in the cluster.
In Oracle 10g, one can define the service level required, and resources can be dynamically allocated. PCs don't have attached storage; data is shared between all servers, so there is a data cluster and a separate processing cluster. This is unlike a Beowulf cluster; in Beowulf, if one node happens to hold 90% of the data, this causes a performance bottleneck.
Databases can be synchronised across sites and queried as if all the data were at one site. The synchronisation uses a streaming protocol, currently proprietary to Oracle, but they are hoping to work with IBM to standardise this.
They have done benchmarks with 160-billion-row tables! France Telecom has a database of ~70TB in size; in theory it could go to 27PB.
The database contains a full XML implementation; this can be queried using XPath or XQuery and relational views can be created over the XML data. Similarly, XML views can be created over relational data.
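As a rough illustration of the two ideas in the paragraph above (XPath queries over XML, and a relational view over XML data), here is a minimal sketch using Python's standard library rather than Oracle itself. The XML layout, the element names and both helper functions are invented for this example and say nothing about how Oracle implements its XML support.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; element and attribute names are made up.
DOC = """
<stations>
  <station id="A1"><name>Reading</name><temp>11.5</temp></station>
  <station id="B2"><name>Exeter</name><temp>13.0</temp></station>
</stations>
"""

def xml_to_rows(xml_text):
    """Flatten each <station> element into an (id, name, temp) tuple,
    loosely mimicking a relational view defined over XML data."""
    root = ET.fromstring(xml_text)
    rows = []
    # ElementTree supports only a limited subset of XPath.
    for st in root.findall("./station"):
        rows.append((st.get("id"), st.findtext("name"), float(st.findtext("temp"))))
    return rows

def names_by_id(xml_text, station_id):
    """XPath-style lookup using an attribute predicate."""
    root = ET.fromstring(xml_text)
    return [n.text for n in root.findall(f"./station[@id='{station_id}']/name")]
```

Calling `xml_to_rows(DOC)` gives one tuple per station, which is the kind of row set a relational view over the XML would expose.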
There are also data mining tools to do statistical analyses and correlations.
In some circumstances there is probably little performance gain over simply reading flat files. However, spatial tables are partitioned (i.e. tiled), which should speed things up, especially for multiple users.
Custom functions can be written in Java or PL/SQL, or in C, in which case they become part of the database (SQL extensions).
Oracle Spatial (from talk by Mike Turnill)
Provides infrastructure for people to exploit spatial data. This can be vector data (roads etc) or, since 10g, raster data (images, model data). Oracle 10g can handle many data types: spatial, object-relational, documentation, multimedia, messages... and deploy them to anyone. Can relate associated information (images, maps, docs etc) to spatial locations.
Historically, GIS and IT have been separate, and GIS has lots of proprietary formats (from Intergraph, MapInfo, Autodesk, ESRI): Oracle replaces these formats. Oracle have a commitment to standards (OpenGIS, OpenLS, SQL, LIF).
Oracle Spatial has a native spatial data type and associated indexer, accessed through SQL. This is unique to Oracle - other DB vendors added bolt-ons to an existing database; Oracle built from the ground up. Data are stored natively with 3 spatial dimensions. Native functions (for working out, for example, whether a point lies within a polygon) only work in 2D (but can pick any 2 of the 3 dimensions). Development started in 1994 and version 7.3.3 was the first product with these functions implemented properly. You can ask questions of the database like "get all hospitals within 5 miles of a certain point".
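To make the "2D test on any 2 of the 3 dimensions" point concrete, here is a minimal sketch of a point-in-polygon test that works on two chosen coordinates of 3-D data. It uses the generic ray-casting technique; this is an assumption made for illustration, not a description of Oracle's algorithm, and the function name and `dims` parameter are hypothetical.

```python
def point_in_polygon_2d(point, polygon, dims=(0, 1)):
    """Ray-casting point-in-polygon test on two chosen coordinates.

    point   -- a 3-tuple (x, y, z)
    polygon -- list of 3-tuples, vertices in order (not repeated at the end)
    dims    -- which two of the three coordinates to use, e.g. (0, 2) for x-z
    """
    a, b = dims
    px, py = point[a], point[b]
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i][a], polygon[i][b]
        x2, y2 = polygon[(i + 1) % n][a], polygon[(i + 1) % n][b]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            # Coordinate at which the edge crosses that line.
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside
```

The same point can be inside the polygon in one projection and outside in another, which is why the choice of dimensions matters. Points lying exactly on an edge are not treated specially in this sketch.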
Supports many vector data types: points, lines, polygons (with holes), circles, arcs, rectangles and collections.
(I think this bit is about raster data:) Data can be indexed by linear quadtree (OK if data doesn't change much) or R-tree (slower on joins, but much more flexible: don't have to decide a priori how to tile the data)
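One common way to realise a linear quadtree is to give every fixed-size tile a single sortable integer key by interleaving the bits of its column and row (a Z-order or Morton code). The sketch below shows that generic technique only; whether Oracle uses exactly this encoding was not discussed, and the function name and `levels` parameter are hypothetical. Note that `levels` has to be fixed in advance, which reflects the "decide a priori how to tile the data" limitation mentioned above.

```python
def quadtree_key(col, row, levels):
    """Interleave the low `levels` bits of (col, row) into one integer key.

    Tiles that share a parent quadrant share the high-order bits of the key,
    so an ordinary sorted index on the key keeps neighbouring tiles together.
    """
    key = 0
    for bit in range(levels):
        key |= ((col >> bit) & 1) << (2 * bit)       # column bit -> even position
        key |= ((row >> bit) & 1) << (2 * bit + 1)   # row bit -> odd position
    return key


def parent_key(key):
    """Key of the enclosing tile one level up (drop the last two bits)."""
    return key >> 2
```

With `levels=2` the grid is 4 x 4 tiles, and tiles (2, 3) and (3, 2) both fall in the same parent quadrant.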
Managed by Oracle Enterprise Manager (handles spatial and non-spatial data)
Spatial operators supported: Inside, Contains, Touches, Disjoint, Covers, Covered By, Equal, Overlap Boundary (more?) Also distance operators (distance from line to point, etc). These are all 2-D operations but Oracle would be interested in extending to 3-D. Also Union, Difference, Intersect, XOR, ConvexHull (are these raster operations?)
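As an example of what a 2-D distance operator such as "distance from line to point" computes, here is a minimal sketch for a single straight segment. It assumes flat Cartesian coordinates (not geodetic ones), and the function name is hypothetical rather than an Oracle API.

```python
import math

def point_segment_distance(p, a, b):
    """2-D Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: a and b are the same point.
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the line through a and b,
    # clamped to [0, 1] so the nearest point stays on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))
```

A "within distance" query of the kind mentioned earlier then amounts to keeping the features for which this distance is below a threshold, with a spatial index used to avoid testing every feature.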
The Oracle Application Server has a MapViewer component. This is pure Java so can be embedded in a web page. (Thought: could we create a GIS fairly easily with this plus Spatial?)
Other Spatial features: Spatial Partitioning, different coordinate systems and projections (geodetic, Cartesian)
New in Oracle 10g (which was released roughly a week ago): Network Manager (can describe graphs with nodes and links) and GeoRaster (see white paper; JonBlower has a hardcopy).
GeoRaster can generate a resolution pyramid (so can download data by subsetting every 5th data point, for example).
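The idea of a resolution pyramid built by subsetting can be sketched as follows. This is a toy version on a list-of-lists grid that simply keeps every n-th row and column; it assumes a non-empty rectangular grid, the function name is hypothetical, and GeoRaster itself may well resample rather than just subsample.

```python
def build_pyramid(grid, factor=2):
    """Return a list of grids, from full resolution down to a single cell.

    Each level keeps every `factor`-th row and column of the level before it.
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    levels = [grid]
    # Stop once the coarsest level is a single cell.
    while len(levels[-1]) > 1 or len(levels[-1][0]) > 1:
        prev = levels[-1]
        levels.append([row[::factor] for row in prev[::factor]])
    return levels
```

A client wanting a quick overview would request a coarse level (for example `factor=5` for every 5th data point) and only fetch the full-resolution level for the region of interest.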
Can do "Continuous Queries". This is like a subscription to a certain topic. You can ask to be notified when, for example, a certain value goes out of range.
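The subscription idea behind a continuous query can be sketched with a simple callback pattern. This is only a conceptual illustration in plain Python; the class and method names are hypothetical and do not correspond to any Oracle interface.

```python
class RangeWatcher:
    """Notify subscribers whenever an inserted value falls outside [low, high]."""

    def __init__(self, low, high):
        self.low = low
        self.high = high
        self._subscribers = []

    def subscribe(self, callback):
        """Register a function to be called with each out-of-range value."""
        self._subscribers.append(callback)

    def insert(self, value):
        """Simulate new data arriving; fire callbacks if it is out of range."""
        if not (self.low <= value <= self.high):
            for callback in self._subscribers:
                callback(value)
```

For example, after `w = RangeWatcher(0, 10)` and `w.subscribe(print)`, calling `w.insert(12)` triggers the callback while `w.insert(5)` does not. The client registers its interest once and is then told about matching data as it arrives, rather than polling.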
Versions of Spatial
There are two versions of Spatial: Oracle Locator is the reduced version which ships included with Oracle Standard and Enterprise Editions. Spatial is a priced add-on to Enterprise.
Obtaining Spatial
Apparently, if you register on Oracle Technology Network, you can download the whole of 10g and Spatial for evaluation purposes.
Other info
There is a Spatial SIG, run by the UK Oracle User Group and they hold meetings.
Support for standard file formats such as netCDF, HDF etc was not clear (Mike thought that CDF was supported). Would be useful if Spatial could import and export data in these formats.
Spatial and e-Science
So far (Feb 2004), no e-Science projects are using Spatial to serve data. The ReSC will do some tests on Spatial and report back our experiences.
-- JonBlower - 19 Feb 2004