Help with importing file information into a database table

qnc

I would like your help combining the information from three uploaded files, using that information to query the database, and then adding the query results plus the information from the files to different tables in the database.

The case is as follows.

I have two files (the third is discussed at the end).

The first contains people and representations of their genetic code at particular points on their genome. The alleles are the code, and the points are called markers. The file has a *.ped (pedigree) extension, but it is a simple text file. The first part of each line looks like this:

ADRP134 0247 0227 0228 2 2

These are the family ID, the person's ID, the father's ID, the mother's ID, the person's sex (1 = male, 2 = female) and whether they have the disease or not (1 or 2). They are followed, on the same line, by the person's genetic code at each point, which in this case is:

1 2 2 3 1 10 4 5

There are two numbers for each point, so the first point is 1 2, the second is 2 3, and so on.
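A minimal Python sketch of reading one complete .ped line as described above (the six leading columns followed by the genotype numbers): the first six columns become named fields and the rest are grouped two at a time, one pair per marker. The names PedRecord and parse_ped_line are made up for illustration, and the sketch assumes the columns are whitespace-separated, as in the example.

```python
from typing import NamedTuple


class PedRecord(NamedTuple):
    family_id: str
    person_id: str
    father_id: str
    mother_id: str
    sex: int                           # 1 = male, 2 = female
    disease: int                       # disease status code, 1 or 2
    genotypes: list[tuple[int, int]]   # one (allele1, allele2) pair per marker


def parse_ped_line(line: str) -> PedRecord:
    """Split one .ped line into its six leading columns and its allele pairs."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError(f"malformed .ped line: {line!r}")
    # IDs stay strings so leading zeros such as '0247' are not lost
    fam, person, father, mother = fields[:4]
    alleles = [int(v) for v in fields[6:]]
    # alleles[0::2] is the 1st, 3rd, 5th... number; alleles[1::2] the 2nd, 4th, 6th...
    pairs = list(zip(alleles[0::2], alleles[1::2]))
    return PedRecord(fam, person, father, mother, int(fields[4]), int(fields[5]), pairs)


rec = parse_ped_line("ADRP134 0247 0227 0228 2 2 1 2 2 3 1 10 4 5")
print(rec.person_id, rec.genotypes)  # 0247 [(1, 2), (2, 3), (1, 10), (4, 5)]
```

Keeping the IDs as strings matters later, because they have to match the family table exactly.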

The other file (a *.dat, or data, file; again a simple text file) contains the names of the points (markers).

The whole file looks similar to this:

5 0 0 5 << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM
0 0.000000 0.000000 0 << MUT LOCUS, MUT RATE, HAPLOTYPE FREQUENCIES (IF 1)
1 2 3 4 5
1 2
0.999990 0.000010 << GENE FREQUENCIES
1 << NO. OF LIABILITY CLASSES
0.000000 0.900000 0.900000
3 2 # D20S906
0.500000 0.500000 << GENE FREQUENCIES
3 7 # D22S280
0.142857 0.142857 0.142857 0.142857 0.142857 0.142857 0.142857 << GENE FREQUENCIES
3 10 # D22S423
0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 << GENE FREQUENCIES
3 8 # D22S274
0.125000 0.125000 0.125000 0.125000 0.125000 0.125000 0.125000 0.125000 << GENE FREQUENCIES
0 0 << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
0.166670 0.166670 0.166670 0.166670 0.000000 << RECOMB VALUES
1 0.1 0.45 << REC VARIED, INCREMENT, FINISHING VALUE

However, the point names appear only on the lines beginning with 3, so here those lines are extracted:

3 2 # D20S906
0.500000 0.500000 << GENE FREQUENCIES
3 7 # D22S280
0.142857 0.142857 0.142857 0.142857 0.142857 0.142857 0.142857 << GENE FREQUENCIES
3 10 # D22S423
0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 0.100000 << GENE FREQUENCIES
3 8 # D22S274
0.125000 0.125000 0.125000 0.125000 0.125000 0.125000 0.125000 0.125000 << GENE FREQUENCIES

In these lines the names are D20S906, D22S280, D22S423 and D22S274, and they correspond, in order, to the pairs 1 2, 2 3, 1 10 and 4 5 mentioned above. Interestingly, the number after the 3 is how many alleles have been found in that family for that marker. The fractions below each marker line are the frequencies of each allele; however, we can calculate these more accurately once this system is in place.
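Since the markers in the .dat file and the allele pairs in the .ped line are matched purely by position, a small sanity check can catch files that are out of step. This Python sketch assumes alleles are numbered 1 to N, where N is the count after the 3, and allows 0 because LINKAGE-style pedigree files commonly use it for an untyped genotype. The function name label_genotypes is hypothetical.

```python
def label_genotypes(markers, genotypes):
    """Attach marker names to allele pairs, checking that the two files line up.

    markers:   list of (marker_name, number_of_alleles), in .dat order
    genotypes: list of (allele1, allele2), in .ped order
    """
    if len(markers) != len(genotypes):
        raise ValueError(f"{len(markers)} markers but {len(genotypes)} allele pairs")
    rows = []
    for (name, n_alleles), (a1, a2) in zip(markers, genotypes):
        # An allele number above the declared count suggests a mismatch between files
        for a in (a1, a2):
            if not 0 <= a <= n_alleles:
                raise ValueError(f"{name}: allele {a} outside 0..{n_alleles}")
        rows.append((name, a1, a2))
    return rows


markers = [("D20S906", 2), ("D22S280", 7), ("D22S423", 10), ("D22S274", 8)]
genotypes = [(1, 2), (2, 3), (1, 10), (4, 5)]
print(label_genotypes(markers, genotypes))
# [('D20S906', 1, 2), ('D22S280', 2, 3), ('D22S423', 1, 10), ('D22S274', 4, 5)]
```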

Now to the task. I was hoping to import both files.

From the *.dat file I want to extract which markers are being used.
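One way to pull the marker names out of the .dat text, sketched in Python and following the rule described above: keep only lines whose first field is 3 and that contain a #, take the name after the #, and keep the allele count that follows the 3. The function name read_markers is illustrative; it assumes no other line in the file starts with a bare 3.

```python
def read_markers(dat_text: str) -> list[tuple[str, int]]:
    """Return (marker_name, allele_count) for every '3 N # NAME' line, in file order."""
    markers = []
    for line in dat_text.splitlines():
        tokens = line.split()
        # Marker lines: first field is 3, second is the allele count, name follows '#'
        if len(tokens) >= 2 and tokens[0] == "3" and "#" in line:
            name = line.split("#", 1)[1].strip()
            markers.append((name, int(tokens[1])))
    return markers


DAT_EXCERPT = """\
1 2
0.999990 0.000010 << GENE FREQUENCIES
3 2 # D20S906
0.500000 0.500000 << GENE FREQUENCIES
3 7 # D22S280
3 10 # D22S423
3 8 # D22S274
"""
print(read_markers(DAT_EXCERPT))
# [('D20S906', 2), ('D22S280', 7), ('D22S423', 10), ('D22S274', 8)]
```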

Into a table called raw allele I want to store the person's unique ID, which corresponds to an entry in another table called family; the example above is entry number 36. (The *.ped file carries the fam_id and the person ID, which together correspond to a unique entry in the family table. This is where a query has to be done.)

Therefore each line in the *.ped file has to be matched against the corresponding record in the family table, and then the alleles recorded as well.
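The lookup itself is a single parameterized SELECT. Below is a sketch using Python's built-in sqlite3 module so it runs on its own; the family table layout (columns ui, fam_id, person_id) is hypothetical, so adjust the names to the real table. With a MySQL driver the same query would typically use %s placeholders instead of ?.

```python
import sqlite3


def lookup_family_ui(conn, fam_id, person_id):
    """Return family.ui for one .ped person, or None if there is no match."""
    # Placeholders (?) let the driver quote the values, so '0247' stays a string
    row = conn.execute(
        "SELECT ui FROM family WHERE fam_id = ? AND person_id = ?",
        (fam_id, person_id),
    ).fetchone()
    return row[0] if row else None


# Demo with an in-memory database and a hypothetical family table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE family (ui INTEGER PRIMARY KEY, fam_id TEXT, person_id TEXT)")
conn.execute("INSERT INTO family VALUES (36, 'ADRP134', '0247')")
print(lookup_family_ui(conn, "ADRP134", "0247"))  # 36
print(lookup_family_ui(conn, "ADRP134", "9999"))  # None
```

Returning None rather than raising leaves it to the caller to decide whether an unknown person should stop the import or just be reported.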

So the raw allele data table will start to look like this:

Ui, fam_ui, d_no, allele1, allele2, ori files
1 , 36 , D20S906, 1, 2
2 , 36 , D22S280, 2, 3
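A possible layout for the raw allele table, sketched with sqlite3 so it can be tried without a server. The table name raw_allele and the column ori_file (standing in for "ori files", since a space in a column name needs quoting) are assumptions, as is letting the database number the Ui column itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE raw_allele (
        ui       INTEGER PRIMARY KEY AUTOINCREMENT,  -- row id, assigned by the database
        fam_ui   INTEGER NOT NULL,                   -- family.ui of the person
        d_no     TEXT    NOT NULL,                   -- marker name from the .dat file
        allele1  INTEGER,
        allele2  INTEGER,
        ori_file TEXT                                -- .fas file name, may be filled in later
    )
""")

# The two example rows above; ui is omitted so the database assigns 1, 2, ...
rows = [(36, "D20S906", 1, 2), (36, "D22S280", 2, 3)]
conn.executemany(
    "INSERT INTO raw_allele (fam_ui, d_no, allele1, allele2) VALUES (?, ?, ?, ?)", rows
)
print(conn.execute("SELECT * FROM raw_allele ORDER BY ui").fetchall())
# [(1, 36, 'D20S906', 1, 2, None), (2, 36, 'D22S280', 2, 3, None)]
```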

Now, ori files refers to the third file, a *.SMP file (again a text file), which links each original genetic reader instrument file to the ped file. For example, family = adrp134, person = 0247, and the file containing the original data = qnc200505-a1-adrp134-0247-09.fas (not a text file, but its name will suffice for future searches back to the raw data).

All I would like is for qnc200505-a1-adrp134-0247-09.fas to be added to the appropriate rows in the raw allele table.
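The layout of the .SMP file isn't shown, so this Python sketch works only from the .fas names themselves. It assumes, purely from the single example name, that the family is always the third dash-separated field and the person the fourth; that needs checking against real files. The family is upper-cased so that adrp134 in the file name matches ADRP134 in the .ped file. The function names are hypothetical.

```python
def fas_key(filename: str) -> tuple[str, str]:
    """Extract (family_id, person_id) from a name like qnc200505-a1-adrp134-0247-09.fas.

    ASSUMPTION: family is the 3rd dash-separated field and person the 4th.
    """
    parts = filename.split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected .fas name: {filename!r}")
    # Upper-case the family so 'adrp134' matches 'ADRP134' from the .ped file
    return parts[2].upper(), parts[3]


def index_fas_files(names):
    """Build {(family_id, person_id): fas_name} for quick lookup per .ped line."""
    return {fas_key(n): n for n in names}


index = index_fas_files(["qnc200505-a1-adrp134-0247-09.fas"])
print(index[("ADRP134", "0247")])  # qnc200505-a1-adrp134-0247-09.fas
```

If one person can have several .fas files, the dictionary keeps only the last one seen; a dictionary of lists would be needed in that case.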

So, after the three files have been uploaded and processed, the final entries would look something like this:

Ui, fam_ui, d_no, allele1, allele2, ori files
1 , 36 , D20S906, 1, 2, qnc200505-a1-adrp134-0247-09.fas
2 , 36 , D22S280, 2, 3, qnc200505-a1-adrp134-0247-09.fas

To summarize: upload three files at once, if possible, use the information in them to perform a basic query, and then use some of the information from the files plus the results of the query to populate another table.
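Putting the steps together, here is a rough end-to-end sketch in Python with sqlite3. The table and column names (family.ui, family.fam_id, family.person_id, raw_allele.ori_file) are hypothetical, and because the .SMP layout isn't shown, it takes a list of .fas names directly and assumes family and person are the third and fourth dash-separated fields of each name. The whole file is loaded inside one transaction, so a bad line leaves the database unchanged.

```python
import sqlite3


def read_markers(dat_text):
    # Marker names sit on lines of the form '3 <allele count> # <name>'
    return [line.split("#", 1)[1].strip()
            for line in dat_text.splitlines()
            if line.split()[:1] == ["3"] and "#" in line]


def fas_index(fas_names):
    # ASSUMPTION: family and person are the 3rd and 4th dash-separated fields
    return {(n.split("-")[2].upper(), n.split("-")[3]): n for n in fas_names}


def import_files(conn, ped_text, dat_text, fas_names):
    """Load one .ped/.dat pair into raw_allele; returns the number of rows inserted."""
    markers = read_markers(dat_text)
    fas = fas_index(fas_names)
    inserted = 0
    with conn:  # one transaction: commit on success, roll back on any exception
        for line in ped_text.splitlines():
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            fam_id, person_id = fields[0], fields[1]
            alleles = [int(v) for v in fields[6:]]
            if len(alleles) != 2 * len(markers):
                raise ValueError(f"{fam_id}/{person_id}: allele count does not match .dat")
            row = conn.execute(
                "SELECT ui FROM family WHERE fam_id = ? AND person_id = ?",
                (fam_id, person_id),
            ).fetchone()
            if row is None:
                raise LookupError(f"{fam_id}/{person_id} not found in family table")
            ori_file = fas.get((fam_id.upper(), person_id))  # None if no .fas listed
            for i, name in enumerate(markers):
                conn.execute(
                    "INSERT INTO raw_allele (fam_ui, d_no, allele1, allele2, ori_file) "
                    "VALUES (?, ?, ?, ?, ?)",
                    (row[0], name, alleles[2 * i], alleles[2 * i + 1], ori_file),
                )
                inserted += 1
    return inserted


# Demo with hypothetical table layouts in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE family (ui INTEGER PRIMARY KEY, fam_id TEXT, person_id TEXT)")
conn.execute("CREATE TABLE raw_allele (ui INTEGER PRIMARY KEY AUTOINCREMENT, fam_ui INTEGER, "
             "d_no TEXT, allele1 INTEGER, allele2 INTEGER, ori_file TEXT)")
conn.execute("INSERT INTO family VALUES (36, 'ADRP134', '0247')")
conn.commit()

dat = "3 2 # D20S906\n3 7 # D22S280\n3 10 # D22S423\n3 8 # D22S274\n"
ped = "ADRP134 0247 0227 0228 2 2 1 2 2 3 1 10 4 5\n"
n = import_files(conn, ped, dat, ["qnc200505-a1-adrp134-0247-09.fas"])
for r in conn.execute("SELECT * FROM raw_allele ORDER BY ui"):
    print(r)
```

Raising on an unknown person aborts the whole file rather than leaving half of it imported; if you would rather skip such lines, replace the raise with a logged warning and a continue.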

Can anyone help with part or all of this, or point me in the right direction so I can figure it out for myself?