back to intro

LitSearch: MySQL Port

Why MySQL?
The accounts we were given on the Oracle server had a 50MB quota. Unfortunately, the database I wanted to load consisted of about 100 works, at 2 to 5 MB each. Factoring in indicies and such, the total disk space required to store this data was well over 1 GB. Figuring that the CS department might balk at a request for so much space for a class project, I decided to run the project on my own box. And not wanting to pay for an Oracle license, I decided to run an open-source alternative, MySQL and port my SQL code to work under MySQL.

I originally thought that the task of porting to MySQL would be relatively simple. Unfortunately, MySQL does not support some essential functionality. Specifically related to this project, MySQL doesn't support subqueries, views, stored procedures, triggers, and foreign keys. Some workarounds are discussed in MySQL's list of missing functionality. Additionally, I was unable to find a way to hint the MySQL planner, so certain operations that should have used indicies apparently were not doing so, causing severe performance problems. As a result, I was forced to change the database schema slightly, and the MySQL schema I used for creating the web application was as follows:

CREATE TABLE Author (
name        varchar(128) NOT NULL PRIMARY KEY,
birth       integer NOT NULL,
death       integer NOT NULL,
description text
);

CREATE TABLE AuthorCriticized (
url         varchar(255) NOT NULL,
name        varchar(128) NOT NULL,
PRIMARY KEY (url, name)
);

CREATE TABLE Criticism (
title       varchar(255),
critic      varchar(128),
url         varchar(255) NOT NULL PRIMARY KEY,
synopsis    text,
publication varchar(255),
keywords    varchar(255)
);

CREATE TABLE PageContainsWord (
word        mediumint NOT NULL,
page        mediumint NOT NULL,
PRIMARY KEY (word, page)
);

CREATE TABLE Period (
region      varchar(64) NOT NULL,
era         varchar(32) NOT NULL,
PRIMARY KEY (region, era)
);

CREATE TABLE PeriodLived (
region      varchar(64) NOT NULL,
era         varchar(32) NOT NULL,
name        varchar(128) NOT NULL,
PRIMARY KEY (region, era, name)
);

CREATE TABLE PeriodWritten (
region      varchar(64) NOT NULL,
era         varchar(32) NOT NULL,
work        integer NOT NULL,
PRIMARY KEY (region, era, work)
);


CREATE TABLE Syn (
root        integer NOT NULL,
child       integer NOT NULL,
PRIMARY KEY (root, child)
);

CREATE TABLE UniquePage (
id           mediumint NOT NULL PRIMARY KEY,
work         smallint NOT NULL,
page         smallint NOT NULL,
global_start integer NOT NULL,
global_end   integer NOT NULL
);

CREATE UNIQUE INDEX uniquepage_idx1 ON UniquePage(global_start);

CREATE TABLE Word (
id          integer NOT NULL PRIMARY KEY,
word        varchar(32) NOT NULL
);

CREATE INDEX word_idx1 ON Word(word);

CREATE TABLE WordDetails (
gpos        integer NOT NULL PRIMARY KEY,
context     char(24)
);

CREATE TABLE WordInstance (
word        mediumint NOT NULL,
gpos        integer NOT NULL,
prev        mediumint NOT NULL,
PRIMARY KEY (word, gpos)
);

CREATE TABLE Work (
id           integer NOT NULL PRIMARY KEY,
title        varchar(255),
year         integer,
length       integer,
gutenberg_id varchar(20),
include      integer
);

CREATE TABLE WorkCriticized (
url          varchar(255) NOT NULL,
work         integer NOT NULL,
PRIMARY KEY (url, work)
);

CREATE TABLE WrittenBy (
work         integer NOT NULL,
name         varchar(128) NOT NULL,
PRIMARY KEY (work, name)
);

In retrospect, I probably should have used PostgreSQL, which is also open-source, and supports subqueries and most of the functionality that Oracle supports. But I didn't know about it at the time, so I ended up doing the MySQL port.


Author: Keith Ito - September 2001