Hi folx,
I've got a very generic design question. The situation: I need to store a large amount (~1000 per day) of combined texts in a database.
Combined text means every logical text "unit" consists of several independent parts, say "title, abstract, body". Another example would be "author, abstract, chapter 1, chapter 2, ..., chapter 10".
These textparts need to be stored in separate BLOBs so they can be edited independently. It can safely be assumed that no logical text will consist of more than, say, 20 textparts, though the actual number of textparts will vary from text to text.
I have two possible database scenarios in mind:
1)
Store every textpart as its own row, with a composite index on (logical text id, textpart number).
This would result in a narrow table with many rows; fetching a complete logical text means reading many rows, although a single range SELECT over the composite index could return them all.
2) Use one logical text id and 20 BLOB columns to store every textpart of a logical text in a single row. Obviously this would result in a wide table with fewer rows, leading to fewer SELECTs to fetch a logical text, but possibly a lot of overhead (mostly-NULL columns for texts with few parts).
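For concreteness, here is a minimal sketch of both scenarios using SQLite from Python. All table and column names (`textpart`, `logical_text`, `part_no`, etc.) are my own illustration, not anything prescribed; the same shapes apply to any relational database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Scenario 1: one narrow row per textpart, composite primary key.
cur.execute("""
    CREATE TABLE textpart (
        text_id  INTEGER NOT NULL,
        part_no  INTEGER NOT NULL,
        content  BLOB,
        PRIMARY KEY (text_id, part_no)
    )
""")

# Scenario 2: one wide row per logical text with a fixed set of
# BLOB columns (part01 .. part20), mostly NULL for short texts.
cols = ", ".join(f"part{i:02d} BLOB" for i in range(1, 21))
cur.execute(f"CREATE TABLE logical_text (text_id INTEGER PRIMARY KEY, {cols})")

# Store a three-part text ("title, abstract, body") under scenario 1.
parts = [b"My title", b"An abstract", b"The body"]
cur.executemany(
    "INSERT INTO textpart (text_id, part_no, content) VALUES (?, ?, ?)",
    [(42, i, p) for i, p in enumerate(parts, start=1)],
)

# Under scenario 1, one range SELECT over the composite key is
# enough to fetch the whole logical text in part order.
rows = cur.execute(
    "SELECT part_no, content FROM textpart WHERE text_id = ? ORDER BY part_no",
    (42,),
).fetchall()
print(rows)  # [(1, b'My title'), (2, b'An abstract'), (3, b'The body')]
```

Note that editing a single part in scenario 1 is a one-row UPDATE keyed on (text_id, part_no), while scenario 2 rewrites (at least logically) the whole wide row.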
It can safely be assumed that SELECTs will make up almost all accesses to this table. Changes to existing data will occur only rarely, although about 1000 new logical texts need to be added every day (spread over the day, not all 1000 at once).
Now, what are the advantages and disadvantages of the two scenarios? Maybe there is even a different, better solution? What would you suggest?
As always thanks in advance,
Dominique