PROGRAMMING VECTOR AND MATRIX OPERATIONS WITH UDFS
November 27, 2006, from 11:00 AM to 12:00 PM
Location: 232 PGH
Abstract / Event Description:
A relational DBMS provides limited functionality for building multivariate statistical models, which
require extensive vector and matrix manipulation. This talk discusses how to extend a DBMS with basic
vector and matrix operators by programming User-Defined Functions (UDFs). UDFs are written against a C
API that allows the definition of scalar and aggregate functions that can be called from SQL. We explain
the UDF features and limitations that matter when implementing vector and matrix operations commonly used
in statistics and machine learning, paying attention to DBMS, operating system and computer architecture constraints.
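As a rough illustration of the scalar case, the C sketch below computes a dot product and a squared Euclidean distance, two primitive vector operations a scalar UDF might evaluate once per row. The function names `udf_dot` and `udf_sqdist` are hypothetical, and passing the vectors as C arrays is an assumption: a real DBMS passes column values through its own UDF calling convention (including NULL indicators), which is omitted here.

```c
#include <stddef.h>

/* Hypothetical scalar UDF body: dot product x . y of two d-dimensional
   vectors. DBMS-specific argument passing and NULL handling are omitted. */
double udf_dot(const double *x, const double *y, size_t d) {
    double s = 0.0;
    for (size_t i = 0; i < d; i++)
        s += x[i] * y[i];
    return s;
}

/* Hypothetical scalar UDF body: squared Euclidean distance ||x - c||^2
   between a row vector x and a reference vector c. */
double udf_sqdist(const double *x, const double *c, size_t d) {
    double s = 0.0;
    for (size_t i = 0; i < d; i++) {
        double diff = x[i] - c[i];
        s += diff * diff;
    }
    return s;
}
```

Each call does O(d) arithmetic for one row, the same work an equivalent SQL arithmetic expression such as X1*Y1 + X2*Y2 + ... would perform.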
UDFs have several advantages and limitations. On the one hand, a UDF allows fast evaluation
of arithmetic expressions, direct memory manipulation, the use of multidimensional arrays and access to all
C language control statements. On the other hand, a UDF cannot execute disk I/O instructions,
the amount of heap and stack memory it can allocate is small, and its C code must take into
account architecture characteristics that are somewhat specific to the DBMS. We discuss experiments
comparing UDFs and SQL with respect to performance, ease of use, flexibility and scalability. UDFs
are shown to be a good alternative for implementing primitive vector and matrix operations because
they are faster than standard SQL aggregations and as efficient as plain SQL arithmetic expressions.
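The aggregate case, together with the small memory budget mentioned above, can be sketched with a hypothetical aggregate UDF that makes one pass over d-dimensional rows and accumulates the row count n, the linear sum L = sum of x_i and the cross-product matrix Q = sum of x_i x_i^T, from which a covariance matrix follows. The four-phase structure (initialize, accumulate, merge partial states, finalize), the `nlq_*` function names and the `MAX_D` cap are assumptions standing in for a DBMS-specific aggregate UDF interface. The state has a fixed size to respect limited heap and stack space, and `Q` is a two-dimensional C array, something plain SQL does not offer.

```c
#include <string.h>

/* Hypothetical compile-time cap on dimensionality: the aggregate state is a
   small fixed-size buffer because UDF heap and stack space is limited. */
#define MAX_D 16

/* Aggregate state: row count n, linear sum L, cross-product matrix Q. */
typedef struct {
    int d;                    /* number of dimensions actually used */
    double n;
    double L[MAX_D];
    double Q[MAX_D][MAX_D];
} nlq_state;

/* Phase 1 (hypothetical): initialize an empty state; reject d that does not fit. */
int nlq_init(nlq_state *s, int d) {
    if (d < 1 || d > MAX_D) return -1;
    memset(s, 0, sizeof *s);
    s->d = d;
    return 0;
}

/* Phase 2 (hypothetical): accumulate one row x[0..d-1]. */
void nlq_accumulate(nlq_state *s, const double *x) {
    s->n += 1.0;
    for (int a = 0; a < s->d; a++) {
        s->L[a] += x[a];
        for (int b = 0; b < s->d; b++)
            s->Q[a][b] += x[a] * x[b];
    }
}

/* Phase 3 (hypothetical): merge a partial state t, computed in parallel,
   into s. Assumes both states were initialized with the same d. */
void nlq_merge(nlq_state *s, const nlq_state *t) {
    s->n += t->n;
    for (int a = 0; a < s->d; a++) {
        s->L[a] += t->L[a];
        for (int b = 0; b < s->d; b++)
            s->Q[a][b] += t->Q[a][b];
    }
}

/* Phase 4 (hypothetical): finalize into the population covariance matrix
   cov[a][b] = Q[a][b]/n - (L[a]/n)(L[b]/n). Returns -1 for an empty group. */
int nlq_covariance(const nlq_state *s, double cov[MAX_D][MAX_D]) {
    if (s->n == 0.0) return -1;
    for (int a = 0; a < s->d; a++)
        for (int b = 0; b < s->d; b++)
            cov[a][b] = s->Q[a][b] / s->n - (s->L[a] / s->n) * (s->L[b] / s->n);
    return 0;
}
```

Computing the same Q with standard SQL aggregations would need one SUM(Xa*Xb) term per matrix entry (d(d+1)/2 terms when exploiting symmetry), while the aggregate above updates all of Q in one pass with a tight C loop. This is one plausible reason an aggregate UDF can beat the equivalent SQL query.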
Created by: josten
Last modified: November 17, 2006 03:56 PM