Delphi Magazine-Database Design Primer.pdf

(88 KB) Pobierz

Surviving Client/Server:

Database Design Primer

by Steve Troxell

esigning tables for a client/

server database can present

new challenges to traditional file

server oriented developers. There

are many more features and issues

in the database server world that

the developer should to be aware

of. Nearly any reference book will

tell you what the additional fea-

tures are, but you also need to now

how to employ them effectively.

I get some of my best pearls of

wisdom from pop culture. About

ten years ago I saw a comedy skit

that pretty much summed up my

opinion of a lot of people in soft-

ware development. The skit in-

volved a quasi-prehistoric tribe of

people out on a hunt, trying to for-

mulate a plan for felling some ani-

mal, a buffalo let’s say. The leader

suggested a head-on assault and

commented that 3 or 4 of their

party would probably be killed, but

they would have food for the tribe

for a whole month. A younger mem-

ber of the hunting party came up

with an idea. “What if we all spread

out and waved our arms and

shouted and drove the animal over

that cliff over there? Then we can

simply walk down to the base of the

cliff and drag the carcass back to

camp.” The leader of the hunting

party thought for a while and said,

“A very interesting plan. But your

idea is new and we fear new things.

Therefore it must be rejected.”

People entering client/server

development are faced with a vari-

ety of new capabilities as well. It

surprises me to find that some de-

velopers still shy away from such

things apparently simply because

they are unfamiliar with them. The

following is meant to shed some

light on some of the issues facing a

client/server developer.

Open Architecture

The most important issue you

must come to grips with as early as

possible is whether your system

will be an open or closed architec-

ture. Nearly all client/server RDBM

systems are designed to allow an

open architecture, but that doesn’t

mean you must have one. What I

mean by open architecture is the

database can be accessed not only

by the proprietary applications

you develop, but also by off the

shelf third party productivity tools

such as spreadsheets, data extrac-

tors, word processors and report

generators. This is generally

achieved by using a database

driver such as ODBC to connect the

application to the database server.

Third-party applications can be

written to conform to the standard

ODBC interface and the ODBC

driver in turn is written to access a

particular database server’s API.

In this way the same application

can access different vendor’s data-

bases by substituting an ODBC

driver appropriate for that

database.

Many file server databases, such

as Paradox or dBase, provide for

open architecture as well, so this

isn’t the exclusive domain of cli-

ent/server. However, most devel-

opers are accustomed to closed

architectures where they are not

concerned about anything but

strict proprietary application ac-

cess to the database. It’s important

that developers and customers

both understand the role of third

party software in the system as

early as possible, because many

aspects of the database design can

be an aid or hindrance to an open

architecture.

If the extent of your system’s

need to provide external access to

third-party programs is simply

allowing your users to run prefab-

ricated reports through a commer-

cial report writer, then you really

don’t have much of an open archi-

tecture to worry about. While

The Delphi Magazine

you’ll still need to consider some

issues of user permissions to ta-

bles and other objects, your users

have no need to know what the

tables are, their layout, what the

fields are, the relationships be-

tween tables, etc. But for a mere

technicality, this is still essentially

a closed system. But if your users

will be designing their own ad hoc

reports or queries, then you’ll need

to consider an open architecture.

The real value of an open archi-

tecture is allowing at least some

cross section of the user base di-

rect access to at least some of the

data beyond a prefabricated frame-

work. Generally, this access is pro-

vided in the form of ad hoc

querying for data analysis and re-

porting. This is a very valuable

quality for a database to have as it

opens up selected bits of data to

more users so they can get what

they need and how they need it in

a timely manner. This means you

may make different database de-

sign choices than you would if the

design was the sole domain of

software developers.

User Security

Open architectures are the princi-

pal reason why all client/server da-

tabases provide user login

accounts. A typical closed data-

base system might not be con-

cerned with exactly who was doing

what in the application. Or if it was,

it might employ a simple

Users

ta-

ble within itself. Some true cli-

ent/server applications that are

unconcerned with user account-

ability and external access to data

simply create a single fixed ac-

count in the database which is

used by all applications to gain ac-

cess to the data. The account user-

name and password are encoded

directly into the applications and

there is no login dialog for the user

to complete.

June 1997

However, user accounts in

client/server databases are not

primarily concerned with tracking

who’s connected and who’s not.

Their chief purpose is in control-

ling who has access to which parts

of the database. Remember, third

party tools connect to the data-

base independently, and the whole

database is laid bare for all to see.

Would you want marketing people,

whilst creating ad hoc reports on

product sales, to see the payroll

data for the whole company? On

the other hand, you do want the

accounting personnel to be able to

get to the payroll data? When all

access is running through proprie-

tary applications, you can easily

control who gets to what. But with

an open architecture, you have to

rely on the built-in security avail-

able via the user accounts in the

RDBMS you’re using.

This means granting or revoking

permissions to individual users to

read, insert, update or delete data

(or any combination) for a given

table, as shown in Listing 1. So, no

permissions would be granted on

payroll tables to any user that

wasn’t in the accounting depart-

ment (see Listing 2). Many RDBM

systems let you create groups of

users in just this manner to make it

easier to manage user permissions.

Actionable Information

Database design involves striking a

balance between performance,

functionality, and size. With an

open architecture there is also the

need to optimize the amount of “ac-

tionable information” within the

data. This means data that carries

meaning in and of itself, minimizing

the requirement to decode, trans-

late or interpret the data. Software

developers are accustomed to de-

coding, translating and interpret-

ing data and design their databases

accordingly, to achieve compact-

ness and because there’s been no

compelling reason not to. With an

open architecture you have to keep

in mind that users other than soft-

ware developers will be working

with the data.

Take a look at the data shown in

Figure 1. Which rows are for open

orders? Which rows are for orders

on hold? Many database develop-

ers don’t think twice about using a

numeric sequence to encode val-

ues for a column. But the values

carry no meaning in and of them-

selves. You are forced to make a

translation from a completely ab-

stract value by referring to a data

encoding sheet or joining to a

lookup table.

Now look at Figure 2. The same

data has been encoded differently.

Sure, you can say a mental transla-

tion of the code value

to the term

“open order” is still required, but

the value is now a mnemonic rather

than an abstract value. It carries

meaning by itself.

To get the same level of meaning

in a result set from the abstract

data shown in Figure 1 would

require a join to a lookup table.

Several columns employing ab-

stract encoding could lead to

several joined tables and you can

see how quickly ad hoc querying

can become complicated and

inefficient.

Listing 1

Declarative Constraints

If your open architecture allows

data modifications from third-

party programs, then you’ll defi-

nitely want to consider using some

form of automatic processing to

perform data validation and ensure

data integrity. With client/server

databases, you have two choices:

declarative constraints or triggers.

Declarative constraints are defined

at the time you create the table.

They include nullability, defaults,

uniqueness and check constraints.

Listing 3 shows a few examples.

The exact syntax varies from ven-

dor to vendor so be sure to check

your manual before trying this with

your database server.

Any field may contain a null

value. A null indicates the actual

value is not available or not known.

For example, if an employee does

not wish to divulge their birthday,

we may store a null in the

DateOf-

Birth

field. Nulls are available to all

datatypes and save the developer

from having to contrive a special

/* Give John read-only access to the employees table */

GRANT SELECT ON Employees TO JohnA

Listing 2

/*Deny all access to the payroll table */

REVOKE ALL ON Payroll FROM Public

GRANT ALL ON Payroll TO Accounting

Figure 1

OrderNum CustNo

871182

871183

871184

871185

113

291

195

Date

6/2/97

Status

Clerk

Total

$507.92

$122.45

$1,209.00

$45.22

Figure 2

OrderNum CustNo

871182

871183

871184

871185

113

291

195

Date

6/2/97

Status

Clerk

Total

$507.92

$122.45

$1,209.00

$45.22

The Delphi Magazine

Issue 22

code value to serve the same pur-

pose. However, nulls introduce

some complexities to query logic.

You can disallow nulls in any col-

umn by using the

NOT NULL

con-

straint. We will explore nulls more

closely later in the article.

When a column value is omitted

from an SQL

INSERT

statement, then

the values for that column in the

rows being inserted will be null by

default unless you supply your own

using the

DEFAULT

constraint. If

you’ve declared a column

NOT NULL

then you should supply a default

value unless there is no logical de-

fault (like the

DeptNo

field in the

example). Defaults can be literal

values or functions built into the

database. The

HireDate

field in this

example uses the hypothetical

built in function

CurrentDate

post today’s date as a default value.

Some columns within a table re-

quire non duplicating values, like

employee badge numbers. We can

enforce this rule by using the

UNIQUE

constraint, which simply

places a unique index on the col-

umn. Databases vary in how they

handle null values on unique col-

umns. Some require that there be

no more than one row with a null

value in the column (strict inter-

pretation of uniqueness). Others

Listing 3

CREATE TABLE Employees(

EmpNo

integer

FirstName

varchar(15)

LastName

varchar(20)

DeptNo

char(3)

PhoneExt

char(3)

HireDate

date

DateOfBirth date

SSN

char(11)

Salary

double

allow any number of rows to have

a null value in the unique column,

following the rule that no null value

can be equal to any other null value

(strict interpretation of null).

In the case of our badge number

column, our business rule is that

not every employee is required to

have a badge number, but those

who do cannot have duplicate

badge numbers. The only way we

can enforce this with declarative

constraints is to define the column

unique and allow any number of

null values within the unique col-

umn. If we chose to use a special

non-null code value for employees

without badges, we could not use

the unique constraint.

Check constraints are the most

flexible declarative constraints.

Check constraints allow you to de-

fine an expression that must evalu-

ate to true. In Listing 3 we’ve used

check constraints to ensure that

the

Salary

field is a positive num-

ber and that the

BadgeNo

field is a

null or a string of one uppercase

letter followed by two digits.

Nearly all declarative con-

straints can be given a name, like

our check constraint for the

Salary

field. When a constraint is violated

by an insert or update operation,

the server raises an error and the

name of the constraint being vio-

lated should be present in the error

message text. In this way, client

apps can respond to particular er-

rors, parse the error text for con-

straint names, and provide a

cleaner, more meaningful error

message to the user.

Triggers

Declarative constraints only go so

far. For example, how do we en-

force the rule that employees in

departments 012, 014 and 155 re-

quire badges? For more powerful

automatic data processing, you

have to turn to triggers. Triggers

contain static SQL code that fires

whenever a data modification op-

eration is performed, regardless of

the application that initiated the

operation. Within a trigger you can

generally use more complex logic

than would be available in a decla-

rative constraint. Separate triggers

can be placed specifically for in-

sert, update or delete operations

on a given table. Some databases

allow many triggers to exist for the

same operation on the table, each

firing in sequence.

Listing 4 shows a trigger we

might use to enforce the

BadgeNo

rule. Again, exact syntax varies

from database to database. Trig-

gers provide a mechanism for ex-

amining the values of the row being

affected. In this case, the

New

vari-

able gives us access to the values

being written to the table.

A common use of triggers is auto-

matic stamping of username, date

or time information in sensitive ta-

bles. Also, cascading deletes can

be implemented with triggers,

where deleting a record in a master

table automatically deletes associ-

ated records in one or more detail

tables (see Listing 5).

You’ll want to be judicious in

your use of triggers and con-

straints since there is a bit of over-

head in their execution, and they

will always execute for every data

modification they are associated

with.

Domains

Domains are essentially user-

defined datatypes in SQL. Domains

are very helpful for ensuring the

BadgeNo

char(3)

NOT NULL,

NOT NULL DEFAULT ’’,

NOT NULL,

NOT NULL DEFAULT ’’,

NOT NULL DEFAULT CurrentDate,

NULL,

NOT NULL DEFAULT ’<unknown>’,

NOT NULL DEFAULT 0

CONSTRAINT SalaryPositive

CHECK (Salary >= 0),

NULL UNIQUE

CHECK (BadgeNo IS NULL or

BadgeNo LIKE ’[A-Z][0-9][0-9]’)

);

Listing 4

CREATE TRIGGER BadgeRule ON Employees

FOR INSERT, UPDATE

BEGIN

IF (New.DeptNo IN (’012’, ’014’, ’155’)) AND

(New.BadgeNo IS NULL) THEN

EXCEPTION BadgeNoReqd;

END

June 1997

The Delphi Magazine

consistency of data declarations

across tables in a non-trivial data-

base. For example, people’s names

might appear in several different

tables: customer contacts, sales-

people, employees, vendor con-

tacts, etc. If a design decision has

been made that all name fields al-

low up to 30 characters, then it is

much simpler to define a

TPerson-

Name

domain as

VARCHAR(30)

and

make all your table column defini-

tions with the domain name rather

than the direct data type as shown

in Listing 6.

Domains can be valuable for

keeping consistent definitions and

making it much easier to change

definitions across the entire data-

base. A frequently overlooked

value of domains is their use in

stored

procedure

parameter

datatypes. While there may be only

a few fields in table definitions

sharing the same domain, there

may be many different stored pro-

cedures with parameters of that

same data type. Once again, ensur-

ing the consistency of the data type

definitions within stored proce-

dure parameters and their associ-

ated table columns is much easier

with a domain name.

Normalization

Normalization generally refers to

eliminating redundancies in tables

by splitting them apart into smaller

tables with a relationship defined

between them. Typically several

narrow (fewer columns) tables are

better than one wide (lots of col-

umns) table. The narrow tables

usually result in more compact

data because omissions and vari-

able quantities can be handled

more efficiently. Also, a narrow ta-

ble means more rows are packed

into a data page and the server can

scan more rows per I/O operation,

improving performance. There are

five main rules of normalization. In

practice the first three are usually

sufficient for most systems.

First Normal Form:

Atomic Values

The point where a column and row

intersect in a table we’ll call a cell.

First normal form requires that

each cell in a table contain an

atomic value. That is, a single value

cannot be further broken down

into useful values. For example, for

a table of mailing addresses, it is

not uncommon for designers to

group the elements city, state and

postal code into a single field of the

form

<city>, <state> <postal

code>

since that is how they need

to displayed when printed on a

mailing label. However, this single

value can be broken down into

three useful elements and should

be stored in three separate col-

umns. The values in the three

columns can easily be combined to

produce the desired format for the

mailing label.

Having the values separated out

permits you to manipulate the data

at a finer level. For example, most

bulk mailings can gain a postage

discount if the addresses are

sorted by postal code. How are you

going to sort the records when all

three elements are pushed to-

gether in a single field. Also you

may want to stagger mailings to

different states. You will need to

select all addresses for state A for

the first mailing and all addresses

for state B for the second mailing.

It would be extremely difficult to

filter records on state if it’s not

broken out into its own field.

You shouldn’t get carried away

with atomizing values though. For

example, given a phone number

with area code, you could argue

that the area code must be sepa-

rated from the actual phone num-

ber to be in first normal form.

Generally it is not necessary to go

to this extreme if the phone num-

ber and area code will always be

Listing 5

used together, which is nearly al-

ways the case. You should only be

concerned about this if your sys-

tem has some functional need to

isolate the area code, for example

a requirement to produce a list of

contacts sorted by area code.

This reasoning also applies to

people’s names. Many designers

create a single field for a name,

such as

Steve Troxell

. If there is

absolutely no requirement to oper-

ate on the individual elements of

the name, this may be fine. For

example, a contact name for a

customer or vendor record.

However, there usually is much

value in splitting name fields into

first name and last name compo-

nents that may not be immediately

obvious. Is it important to you that

the appearance of the name be con-

sistent? If you have a single name

field how are you going to prevent

problems like user A always enters

Steve Troxell

but user B always

enters

Troxell, Steve

. If you sort

by the name are you making an

assumption that all data entry will

lastname, firstname

? If there is

the possibility that these issues

may arise in your system, it’s bet-

ter to split the name field apart; you

can always assemble the complete

name in any form you need.

First Normal Form:

No Repeating Groups

An extension of the requirement of

atomic values is that there be no

repeating groups of information.

That is, a single column cannot be

an array, list, or multi-field struc-

ture. Most RDBM systems enforce

this by simply not providing

CREATE TRIGGER MasterCascade ON MasterTable

FOR DELETE

AS BEGIN

DELETE FROM DetailTable WHERE KeyNo = Old.KeyNo;

END

Listing 6

CREATE DOMAIN TPersonName AS varchar(30);

CREATE TABLE Customers(

CustNo

integer;

Company

varchar(30);

Contact

TPersonName);

The Delphi Magazine

Issue 22

non-scalar datatypes. However, de-

velopers still try to buck the sys-

tem by designing tables like that

shown in Figure 3 .

This is in first normal form by

technicality only. Each field is an

atomic value. But the layout still

breaks the spirit of first normal

form and should be redesigned.

The

AuthorID

fields presumably

link to a separate

Authors

table con-

taining each author’s name and

other info. This approach intro-

duces several problems. First, it

imposes an artificial business rule

that there can be no more than

Figure 3: Books table

Title

Delphi

How-To

Delphi

Unleased

ISBN

1-57169-019-0

0-672-30499-6

PublisherID

SAMS

WAITE

three authors for any given book.

So do we simply refuse to accept a

book with four or more authors?

Most likely, the book will get en-

tered but only the first three

authors will be credited. Second, if

I want to see a list of books by any

given author, I must query all three

AuthorId

fields to guarantee cover-

age of all possible positions of the

value I’m looking for. This can lead

to some hideous code workar-

ounds to effectively implement a

filter on author.

A solution to these problems is

shown in Figure 4. The

BookAuthors

table contains as many rows for a

book as there are authors for that

book. You can see how it would be

much simpler to search for all oc-

currences of a single author in the

BookAuthors

table and then link in

the book title from

Books

. Tables

such as this go by one of several

names: relationship, association or

junction tables. They describe a

relationship between two other

tables and don’t necessarily con-

tain any useful independent data

themselves.

Second Normal Form

Second normal form means that if

a table’s primary key is composed

of more than one column then no

column in the table should be de-

pendent on just part of the primary

key. This is easier to see with an

example. Figure 5 shows a project

table assigning employees to pro-

jects. The primary key for the table

is both the project identifier and

the employee number. As you can

see the employee’s name is only

dependent on part of the key for

the row (employee number) and

should be eliminated from this ta-

ble. The

BillRate

column is the

hourly rate that employee charges

for this project. An employee work-

ing on multiple projects might

charge different rates for each pro-

ject. Therefore, the

BillRate

col-

umn is dependent on the entire

primary key and is appropriate for

this table.

The problem with a table that

fails to meet second normal form is

that if we change a single fact about

the database (in this case, an

employee’s name), then we must

change that fact in several different

places (in this case, every

occurrence in the

ProjEmp

table).

Third Normal Form

Third normal form extends this de-

pendency logic by stating that all

non-key columns must depend on

the primary key and not depend

solely on another non-key column.

In Figure 6 it is sufficient to stop

with the

DeptNo

column as

DeptName

and

DeptManager

are not directly

relevant to the employee, but in-

stead should be broken out into a

separate table keyed by

DeptNo

AuthorID1

AuthorID2

<null>

AuthorID3

<null>

Figure 4

Books

Title

Delphi How-To

Delphi Unleased

ISBN

1-57169-019-0

0-672-30499-6

PublisherID

SAMS

WAITE

BookAuthors

ISBN

0-672-30499-6

1-57169-019-0

AuthorID

Figure 5

ProjectID

X1104

X1256

J3880

EmpID

102

EmpName

Bob Smith

Stan Stevens

Bob Smith

Roy Rogers

BillRate

65.00

35.00

90.00

75.00

Figure 6

EmpNo

112

115

121

EmpName

Sue Bennet

Roy Robinson

John James

DeptNo

DeptName

Shipping

Payroll

Shipping

DeptManager

Bob Smith

Carly Johnson

Bob Smith

June 1997

The Delphi Magazine

Plik z chomika:

xlo-40

Inne pliki z tego folderu:

(Ebook - Pdf) Informix 4Gl And Informix Sql.pdf (195 KB)
(eBook) - Borland Delphi SQL User Guide.PDF (605 KB)
(Ebook) - Delphi - Delphi Learning Pack (8600 Pages).pdf (55943 KB)
(Ebook) Borland - Delphi For Dotnet.pdf (2193 KB)
(eBook) Delphi - Indy In Depth.pdf (777 KB)

Delphi Magazine-Database Design Primer.pdf

Plik z chomika:

Inne pliki z tego folderu:

Inne foldery tego chomika: