Hello,

Any idea what would be the best method of parsing this chess PGN file? I would like to have the headers and moves separately, so I can manipulate each game later on. Thanks.

[Event "ICC tourney 7 (5 5)"]
[Site "Internet Chess Club"]
[Date "2000.03.05"]
[Round "-"]
[White "NDShort"]
[Black "happensreels"]
[Result "1-0"]
[ICCResult "Black resigns"]
[WhiteElo "3007"]
[BlackElo "2714"]
[Opening "Sicilian: closed, Korchnoi variation"]
[ECO "B23"]
[NIC "SI.44"]
[Time "16:50:42"]
[TimeControl "300+5"]

  1. e4 c5 2. Nc3 e6 3. g3 d5 4. Bg2 dxe4 5. Nxe4 Nc6 6. d3 Be7 7. f4 Nf6 8.
    Nf3 Qc7 9. O-O O-O 10. Kh1 b6 11. Nfg5 Bb7 12. Nxf6+ Bxf6 13. Ne4 Be7 14.
    Bd2 Rad8 15. Bc3 Nd4 16. Rf2 Rfe8 17. Qh5 f6 18. Re1 Bf8 19. Bxd4 cxd4 20.
    f5 Kh8 21. g4 exf5 22. gxf5 Bb4 23. Rg1 Bd5 24. a3 Bf8 25. Nxf6 gxf6 26.
    Bxd5 Bg7 27. Be4 Re7 28. Rg4 Qd7 29. Rfg2 Qe8 30. Qh3 Rc7 31. Rh4 Bf8 32.
    Rhg4 Qf7 33. Qg3 Bg7 34. Qf4 Rg8 35. Qd6 Rd7 36. Qe6 Qe7 37. Bd5 Rgd8 38.
    Qxe7 Rxe7 39. Be6 Rc7 40. h4 h6 41. Kh2 h5 42. Re4 Bh6 43. Kh3 Kh7 44. Bb3
    Bc1 45. Re6 Rf8 46. Rg6 Rg7 47. Rxg7+ Kxg7 48. Re7+ Kh6 49. Rxa7 Bxb2 50. a4
    Ba3 51. Rb7 Bc5 52. Kg3 Re8 53. Be6 Re7 54. Rxe7 Bxe7 55. Kf3 Bb4 56. Ke4
    Bc3 57. Bf7 Bb2 58. Be8 Bc3 59. Kd5 Be1 60. Kc6 Ba5 61. Kb5 Kg7 62. Bxh5 Kh6
    63. Bg6 Kg7 64. Kc4 Bc3 65. h5 Kh6 66. Kd5 Kg7 67. Ke6 Bd2 68. Be8 Bb4 69.
    h6+ Kxh6 70. Kxf6 Ba3 71. Bg6 Bb4 72. Kf7 Kg5 73. Bh7 Ba3 74. f6 Bb4 75. Kg7
    Kf4 76. Be4 {Black resigns} 1-0

[Event "ICC 3 0"]
[Site "Internet Chess Club"]
[Date "2001.03.30"]
[Round "-"]
[White "Bonin"]
[Black "rfurdzik"]
[Result "1-0"]
[ICCResult "Black resigns"]
[WhiteElo "2514"]
[BlackElo "2431"]
[Opening "QGD: Ragozin variation"]
[ECO "D38"]
[NIC "NI.27"]
[Time "16:55:21"]
[TimeControl "180+0"]

  1. d4 Nf6 2. c4 e6 3. Nf3 d5 4. Nc3 Bb4 5. Bg5 h6 6. Bxf6 Qxf6 7. Qa4+ Nc6
    8. e3 O-O 9. Rc1 Qg6 10. Qc2 Bxc3+ 11. Qxc3 Bd7 12. Bd3 Qxg2 13. Ke2 Qg4 14.
    Rcg1 Qh5 15. Rg3 f5 16. Rhg1 Rf7 17. cxd5 exd5 18. Kd2 Re8 19. Bb5 f4 20.
    exf4 Re4 21. Bxc6 Bxc6 22. Ne5 Re7 23. Qc5 Qe8 24. f5 Rf4 25. f3 Rxf5 26.
    Qc2 Bd7 27. Qxc7 Be6 28. Qd6 Rf4 29. Ng6 Rxd4+ 30. Kc1 Qc8+ 31. Kb1 Bf5+ 32.
    Ka1 Rc7 33. Ne7+ Rxe7 34. Qxe7 g5 35. Rxg5+ hxg5 36. Rxg5+ Bg6 37. Rxg6+
    {Black resigns}
    1-0

[Event "ICC 3 0"]
[Site "Internet Chess Club"]
[Date "2001.03.30"]
[Round "-"]
[White "rfurdzik"]
[Black "Bonin"]
[Result "0-1"]
[ICCResult "White forfeits on time"]
[WhiteElo "2419"]
[BlackElo "2526"]
[Opening "Sicilian defense"]
[ECO "B20"]
[NIC "SI.48"]
[Time "17:01:35"]
[TimeControl "180+0"]

  1. e4 c5 2. d3 Nc6 3. g3 g6 4. Bg2 Bg7 5. f4 d6 6. Nh3 f5 7. O-O e6 8. c3
    Nge7 9. Be3 O-O 10. Nd2 b6 11. Nf3 h6 12. Nf2 Bb7 13. Qd2 Qc7 14. Rae1 Rae8
    15. h3 Kh7 16. Kh2 Na5 17. g4 fxe4 18. Nxe4 Bxe4 19. dxe4 Nc4 20. Qf2 Nxe3
    21. Qxe3 Rf7 22. Kh1 Ref8 23. Nd2 d5 24. e5 g5 25. fxg5 Rxf1+ 26. Rxf1 Rxf1+
    27. Bxf1 Qxe5 28. Bd3+ Kg8 29. Qxe5 Bxe5 30. gxh6 Bf4 31. h7+ Kg7 32. Nf1 e5
    33. Kg2 e4 34. Be2 Kxh7 35. Bd1 Ng6 36. Bc2 Ne5 37. c4 Nxc4 38. b3 Ne3+ 39.
    Nxe3 Bxe3 40. Kf1 Kg6 41. Ke2 Bf4 42. h4 Bg3 43. Ke3 Bxh4 44. Kf4 Kf6
    {White forfeits on time}
    0-1

[Event "ICC 3 0"]
[Site "Internet Chess Club"]
[Date "2001.03.30"]
[Round "-"]
[White "Bonin"]
[Black "rfurdzik"]
[Result "1-0"]
[ICCResult "Black resigns"]
[WhiteElo "2537"]
[BlackElo "2408"]
[Opening "QGD: Ragozin variation"]
[ECO "D38"]
[NIC "NI.27"]
[Time "17:09:08"]
[TimeControl "180+0"]

  1. d4 Nf6 2. c4 e6 3. Nf3 d5 4. Nc3 Bb4 5. Bg5 h6 6. Bxf6 Qxf6 7. e3 O-O 8.
    Rc1 c5 9. cxd5 exd5 10. dxc5 Nc6 11. a3 Bxc3+ 12. Rxc3 Rd8 13. Nd4 Re8 14.
    Bb5 Bd7 15. Bxc6 bxc6 16. O-O Rab8 17. b4 Re4 18. Rb3 a6 19. Qd3 Bc8 20. a4
    Qe5 21. Nxc6 Qc7 22. Nxb8 {Black resigns} 1-0

[Event "ICC 3 0"]
[Site "Internet Chess Club"]
[Date "2001.03.30"]
[Round "-"]
[White "rfurdzik"]
[Black "Bonin"]
[Result "1-0"]
[ICCResult "Black resigns"]
[WhiteElo "2430"]
[BlackElo "2515"]
[Opening "Sicilian defense"]
[ECO "B20"]
[NIC "SI.48"]
[Time "17:13:24"]
[TimeControl "180+0"]

  1. e4 c5 2. d3 Nc6 3. g3 d6 4. Bg2 g6 5. f4 f5 6. Nf3 e6 7. O-O Bg7 8. c3
    Nge7 9. a4 O-O 10. Re1 Rb8 11. Nbd2 b6 12. exf5 exf5 13. Ng5 d5 14. Ne6 Bxe6
    15. Rxe6 Qd7 16. Re1 Rbe8 17. Nf3 d4 18. c4 h6 19. Bd2 a5 20. Qb3 Nb4 21.
    Bxb4 axb4 22. Qc2 Nc6 23. Nd2 g5 24. Bd5+ Kh7 25. Nf1 gxf4 26. gxf4 Bf6 27.
    Kh1 Ne7 28. Bf3 Qd6 29. Qd2 Ng6 30. Rxe8 Rxe8 31. Bh5 Qc6+ 32. Kg1 Rg8 33.
    Ng3 Bh4 34. Re1 Bxg3 35. hxg3 Nxf4 36. Qxf4 {Black resigns} 1-0

[Event "ICC tourney 25 (8 2)"]
[Site "Internet Chess Club"]
[Date "2001.03.31"]
[Round "?"]
[White "rfurdzik"]
[Black "Sergei Shipov"]
[Result "0-1"]
[ECO "B20"]
[WhiteElo "2407"]
[BlackElo "3135"]
[PlyCount "84"]
[EventDate "2001.??.??"]

  1. e4 c5 2. d3 Nc6 3. g3 g6 4. Bg2 Bg7 5. f4 d6 6. Nh3 e6 7. O-O Nge7 8. c3 O-O
    9. Nd2 b6 10. Nf3 Ba6 11. Be3 Qd7 12. Qd2 Rad8 13. Bf2 Rfe8 14. Kh1 f5 15. Nh4
    $2 (15. Rae1 $11) 15... Rf8 $2 (15... fxe4 16. Bxe4 e5 17. Bg2 e4 $17) 16. Rae1
    Rde8 17. Rg1 Bb7 18. Bf3 e5 19. exf5 Nxf5 20. Nxf5 Rxf5 21. g4 Nd4 22. Be4 Nf3
    23. Qe2 $2 (23. Qe3 Nxg1 24. gxf5 Nxh3 25. Qxh3 d5 26. Bg2 Qxf5 27. Qxf5 gxf5
    $17) 23... Nxg1 24. Rxg1 exf4 25. gxf5 gxf5 26. Nxf4 fxe4 27. Ng2 exd3 28. Qxd3
    Kh8 29. Bg3 d5 30. Re1 Rxe1+ 31. Bxe1 d4 32. cxd4 Qxd4 33. Qg3 Qe5 34. Qh4 Qf6
    35. Qxf6 Bxf6 36. b3 Be4 37. h4 Bb1 38. a3 Bb2 39. Bg3 Bxa3 40. Bb8 a6 41. Ne3
    b5 42. Nd5 Be4+ {White resigns} 0-1

[Event "ICC tourney 29 (8 2)"]
[Site "Internet Chess Club"]
[Date "2001.04.02"]
[Round "-"]
[White "lou-garou"]
[Black "rfurdzik"]
[Result "1-0"]
[ICCResult "Black resigns"]
[WhiteElo "2739"]
[BlackElo "2375"]
[Opening "QGD: Ragozin variation"]
[ECO "D38"]
[NIC "NI.27"]
[Time "14:34:28"]
[TimeControl "480+2"]

  1. Nf3 Nf6 2. c4 e6 3. Nc3 d5 4. d4 Bb4 5. Qa4+ Nc6 6. e3 O-O 7. Bd2 a6 8.
    Qc2 Re8 9. a3 Bf8 10. h3 dxc4 11. Bxc4 e5 12. Ng5 Be6 13. d5 h6 14. dxe6
    fxe6 {Black resigns} 1-0

    I can see a lot of regular expressions and multidimensional arrays in your future ... 🙂

    At first glance, I might read the entire file into a string

    $fd=fopen($pgnfile,"r");
    $pgnstring=fread($fd,filesize($pgnfile));
    fclose($fd);

    ...though more sophisticated methods are probably in order given the size these things can get.

    Explode on newlines

    $pgnarray=explode("\n",$pgnstring);

    Except this assumes that header fields never run across linebreaks.

    Set $gamenumber=0;

    Start in "read header" mode:
    For each $line in $pgnarray
        if the $line is blank, skip it and go on to the next

        If in "read header" mode:
            if the line is a header line (ereg("^\[.*\]$",$line))
                Set $games[$gamenumber]['line'][]=$line;
                and remain in "read header" mode
            else
                switch to "read game" mode
                Set $games[$gamenumber]['line'][]=$line

        Otherwise, if in "read game" mode:
            If the line is a header line
                increment $gamenumber
                Set $games[$gamenumber]['line'][]=$line
                switch to "read header" mode
            else
                Set $games[$gamenumber]['line'][]=$line
                and remain in "read game" mode.
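A minimal sketch of the two-mode loop above in current PHP. It assumes preg_match() in place of ereg() (ereg was removed in PHP 7), and the function name split_pgn_games() is made up for illustration. It starts with no game open, so the first header line opens game 0.

```php
<?php
// Hypothetical helper: groups the lines of a PGN string by game.
function split_pgn_games(string $pgnstring): array
{
    $games = [];
    $gamenumber = -1;   // no game seen yet
    $mode = 'game';     // so that the first header line opens game 0

    foreach (explode("\n", $pgnstring) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;   // blank line: skip it
        }
        $isHeader = preg_match('/^\[.*\]$/', $line) === 1;

        if ($isHeader && $mode === 'game') {
            $gamenumber++;          // a header after movetext starts a new game
            $mode = 'header';
        } elseif (!$isHeader) {
            if ($gamenumber < 0) {
                continue;           // ignore junk before the first header
            }
            $mode = 'game';
        }
        $games[$gamenumber]['line'][] = $line;
    }
    return $games;
}
```

Like the outline, this assumes that a header never straddles a line break and that no movetext line both starts with "[" and ends with "]".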

    Once you've got each game in its own array of lines, you can do more interesting stuff with it, eg.

    ereg("^\[([^ ]+) \"(.+)\"\]$",$header,$matches);
    $games[$gamenumber][$matches[1]]=$matches[2];

    Will take one of the header lines and put it in a suitably-named array variable, eg. given the example file

    [Event "ICC tourney 7 (5 5)"]

    would result in
    $games[0]['Event']="ICC tourney 7 (5 5)";
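For anyone trying this on a current PHP version, here is a small sketch of the same tag extraction with preg_match(). The helper name parse_tag_line() is an assumption for illustration only, and the pattern does not handle escaped quotes inside a tag value.

```php
<?php
// Hypothetical helper: parse one PGN tag line into [name, value], or null.
function parse_tag_line(string $header): ?array
{
    // \[ opening bracket, ([^ ]+) tag name, "(.*)" quoted value, \] closing bracket
    if (preg_match('/^\[([^ ]+) "(.*)"\]$/', trim($header), $matches)) {
        return [$matches[1], $matches[2]];
    }
    return null;
}

$games = [];
$gamenumber = 0;
$tag = parse_tag_line('[Event "ICC tourney 7 (5 5)"]');
if ($tag !== null) {
    $games[$gamenumber][$tag[0]] = $tag[1];
}
echo $games[0]['Event'], "\n";   // ICC tourney 7 (5 5)
```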

    Full details are probably a bit too extensive to go into here. For example, I never did say what to do if things like header lines or individual moves straddled newlines (as I'm sure is possible in PGN files). You'll probably want to do the multidimensional array thing anyway, to allow sufficiently flexible access, but there might be a bit more work involved to get it there than just the skeleton I sketched out above.

      Hello,

      What do you think about this ? It seems to be working. Just reading headers. Is the data structure OK ?

      <?
      //Settings
      define("MAX_LINE_SIZE", 500);

      $fd = fopen ("b52.pgn", "r");
      //read one line, max line size=500
      $line = fgets($fd, MAX_LINE_SIZE);
      $gamenum=0;
      while (!feof ($fd)) {

        //*****************************************************************
        //Read Header
        //*****************************************************************
      
        //check if it is a tag line, if yes that means header starts here
        if (ereg("\[(.+) \"(.+)\"]", $line, $matches)) {
           $game++;
           $tagnum=0;
           echo "Game # $game <br><br>";
           while (ereg("\[(.+) \"(.+)\"]", $line, $matches)) {
                 //read tags
                 $tagnum++;
                 $games[$gamenum][$matches[1]]=$matches[2];
                 $games[$gamenum]["tagnames"][$tagnum]=$matches[1];
      
                 //Display tag name
                 echo $games[$gamenum]["tagnames"][$tagnum];
                 echo " - ";
                 //Display Tag Value
                 echo $games[$gamenum][$matches[1]];
                 echo "<br>";
                 //read new line
                 $line = fgets($fd, 500);
           }
           $games[$gamenum]["tagnum"]=$tagnum;
           echo "<br>";
           echo "Total number of tags = $tagnum <br><br>";
        }
        //*****************************************************************
      
        //next line
        $line = fgets($fd, 500);

      }
      fclose ($fd);

      ?>

        I do not know why this does not work:

        (ereg("\[([:alnum]+) \"([:alnum]+)\"]", $line, $matches))

        This works fine:

        (ereg("\[(.+) \"(.+)\"]", $line, $matches))

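          I think your character classes might be a bit off - shouldn't they read "[[:alnum:]]"? The class name needs both colons and its own pair of enclosing brackets. Even then, it would not match tag values that contain spaces or punctuation.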

          But if your alternative works (and I see no reason it won't), then I see no reason to fix it :-)
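A short illustration of the character-class point, written with preg_match() (PCRE also accepts POSIX class names inside a bracket list). The two patterns are examples made up for this sketch, not taken from the program above.

```php
<?php
$line = '[Event "ICC tourney 7 (5 5)"]';

// Correct POSIX class syntax: [:alnum:] inside an enclosing bracket list.
$strict = '/\[([[:alnum:]]+) "([[:alnum:]]+)"\]/';
// Looser value pattern: anything except a double quote.
$loose  = '/\[([[:alnum:]]+) "([^"]*)"\]/';

var_dump(preg_match($strict, '[ECO "B23"]'));  // matches: the value is purely alphanumeric
var_dump(preg_match($strict, $line));          // no match: the value has spaces and parentheses
var_dump(preg_match($loose, $line, $m));       // matches; $m[2] holds the full value
```

So even with the class spelled correctly, an alphanumeric-only value pattern rejects most of the tags in the sample file, which is why the looser pattern is the practical choice here.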

            Thanks a lot for your help, such a blunder 🙂

            I managed to read the whole game. The problem is, it is very slow. It stops on game #11, move #9 and resumes after a while. Then I got a message:
            "Fatal error: Maximum execution time of 30 seconds exceeded..."

            Should I use preg_match() instead of ereg()? Or is it something else? Please help.

            <?
            //**************************************************************
            // Check for empty lines
            //**************************************************************
            Function IsEmpty($strline) {

                 //$newline=str_replace("\n", "", $strline);
                 //$newline=str_replace("\r", "", $newline);
                 //$newline=str_replace(" ", "", $newline);
                 $newline=ereg_replace("[\n\r\t]+","", $strline);
                 return ($newline=="");

            }
            //*****************************************************************

            //Settings
            define("MAX_LINE_SIZE", 500);

            $fd = fopen ("b52.pgn", "r");
            //read one line, max line size=500
            $line = fgets($fd, MAX_LINE_SIZE);
            $gamenum=0;
            while (!feof ($fd)) {

              //removes any junk (empty lines, spaces) between header and game
              // (there should be only one line, but just in case)
              while (IsEmpty($line)) {
                    //read next line
                    $line = fgets($fd, MAX_LINE_SIZE);
              }
              //*****************************************************************
              //Read Header
              //*****************************************************************
            
              //check if it is a tag line, if yes that means header starts here
              if (ereg("\[(.+) \"(.+)\"]", $line)) {
                 $game++;
                 $tagnum=0;
                 echo "<br> Game # $game <br><br>";
                 while (ereg("\[(.+) \"(.+)\"]", $line, $matches)) {
                       //read tags
                       $tagnum++;
                       $games[$gamenum][$matches[1]]=$matches[2];
                       $games[$gamenum]["tagnames"][$tagnum]=$matches[1];
            
                       //Display tag name
                       echo $games[$gamenum]["tagnames"][$tagnum];
                       echo " - ";
                       //Display Tag Value
                       echo $games[$gamenum][$matches[1]];
                       echo "<br>";
                       //read new line
                       $line = fgets($fd, 500);
                 }
                 $games[$gamenum]["tagnum"]=$tagnum;
                 echo "<br>";
                 echo "Total number of tags = $tagnum <br><br>";
              }
              //*****************************************************************
            
            
              //removes any junk (empty lines, spaces) between header and game
              // (there should be only one line, but just in case)
              while (IsEmpty($line)) {
                    //read next line
                    $line = fgets($fd, MAX_LINE_SIZE);
              }
            
              // Read game text
              $gametext=$line;
              while (! ereg(" (1-0)|(0-1)|(1/2-1/2)|(\*)\r?\n", $line)) {
                     //break;
                     //next line
                    $line = fgets($fd, MAX_LINE_SIZE);
                    $gametext=$gametext.$line;
              }
              echo "$gametext ";
              if (ereg(" (1-0)|(0-1)|(1/2-1/2)|(\*)\r?\n", $line)) {echo "true";} else {echo "false";}
              echo "<br>";
              //Remove next line characters - span lines
              $gametext=ereg_replace("[\n\r\t]+", "", $gametext);
            
               // $gametext=str_replace("/n", "", $gametext);
              // $gametext=str_replace("/r", "", $gametext);
            
              $line = fgets($fd, MAX_LINE_SIZE);

            }
            fclose ($fd);

            ?>

              Congratulations! Frankly, I'd probably only be about halfway through by now :-)

              Eight suggestions (in no particular order - not even the order I thought of them):

              1) Yes, preg_match would probably be a bit faster than ereg.

              2) You probably want to bump up PHP's timeout (set_time_limit(seconds)) so that it runs longer. I generally set_time_limit() just before something that's liable to take a while, even in a loop; perhaps you could do it just before starting a game (when you switch to header mode) - since it's taking about 3 seconds/game, set_time_limit(5) should cover it in that case.

              3) Do you have something like Zend Optimizer (www.zend.com) installed? If not, you might decide it worthwhile installing.

              4) Depending on your subsequent use, you may want to consider using a database of some sort to store parsed games - so that in future you don't have to reread and parse the PGN file (at least until the file changes).

              5) As an alternative (or even complement) to using databases, try using serialize() on each game to create something that can be dumped to a file (or the database!) and read in again later (without needing all that parsing). As for (4), this preparsed file would need to be rewritten if the PGN changes.

              6) I see you have commented-out str_replace() calls for some of the ereg_replaces. Do these run slower than the ereg_replace?

              7) IsEmpty() might be better written as

              function IsEmpty($strline)
              { return !(ereg("[^\r\n\t ]",$strline));
              }

              The ^ means "anything that's not one of the following characters"; if the ereg matches anything in that case, then the line's not empty! Since this function is called for every line in the file, it's one that is a good target for optimisation. In fact, it might be an idea to replace calls to IsEmpty() with the !(ereg(...)) entirely!

              8) The line toward the end where you "Remove next line characters - span lines" doesn't really need a + - "\n\n\n\r\t" will still be reduced to a null string (actually, five consecutive null strings, but that is a null string!).

              All in all, a commendable job! You sure this is your first file parsing effort?

                Thanks for your help. I did some parsing stuff before (VB, C++), but this is my first PHP program. I think the idea of moving the data to MySQL is great; I was actually thinking about it. I want to create a class PGN with all the necessary methods, but first I need to read up on how to do this. I need to think about the SQL table structure as well.

                As a next step I need to parse the moves into their binary representations. This will be a little complex, because I want to handle subvariations of a move. I feel I need to use a tree structure for it.

                1) Do you know any good article about trees in PHP?

                2) Is this structure OK in my program?
                $games[$gamenum][tagname]=tagvalue;

                The reason I'm asking: each game can have a different set of tag names. There are 7 basic tags required for each game, the Seven Tag Roster (STR); however, it looks like space will be wasted for additional tags that are not in other games. Is it something I should worry about?

                3) Is it necessary for me to keep:
                $games[$gamenum][tagnames] ?

                Or is it better/possible to retrieve it dynamically from $games[$gamenum]? How do I list all tags for a game?

                4) How do I check if tag exist ? Is this syntax OK

                if !(games[$gamenum]['UserTag']=NULL) {
                echo games[$gamenum]['UserTag'];
                }

                5) How do I quickly upload a CSV text file to MySQL? Does it need to be in a certain directory?

                6) Is the preg_match syntax completely different? Which one do you recommend? I know that the ereg syntax is popular on Linux (grep etc.).

                7) I'm not clear about using serialize(). Does it mean I can dump all arrays and variables with parsed data and retrieve them later? If I use SQL, why would I need to use serialize()?

                8) If you are interested in PGN, please see the PGN specs at:
                http://aspn.activestate.com/ASPN/CodeDoc/Games-Chess/Games/Chess/PGN.html

                There was a problem with the ereg call, so the program went into an infinite loop:
                (!ereg(" (1-0)|(0-1)|(1/2-1/2)|(\*)\r?\n", $line)

                This is the correct syntax, I think:

                (!ereg(" ((1-0)|(0-1)|(1/2-1/2)|(\*))\r?\n", $line)
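To see what the extra pair of parentheses changes, here is a small sketch using preg_match() with the same two patterns. The two sample lines are made up for illustration. Without the outer group, the leading space belongs only to the first alternative and the line ending only to the last, so a bare "0-1" anywhere in a line counts as a match.

```php
<?php
$ungrouped = '# (1-0)|(0-1)|(1/2-1/2)|(\*)\r?\n#';
$grouped   = '# ((1-0)|(0-1)|(1/2-1/2)|(\*))\r?\n#';

// Hypothetical lines: a mid-game line with "0-1" inside a comment,
// and a genuine last line of a game.
$midline  = "    Qc2 Bd7 27. Qxc7 {0-1 was possible} 28. Qd6\n";
$lastline = "    b5 42. Nd5 Be4+ {White resigns} 0-1\n";

var_dump(preg_match($ungrouped, $midline));   // matches, although the game is not over
var_dump(preg_match($grouped, $midline));     // no match
var_dump(preg_match($grouped, $lastline));    // matches: space, result, line ending
```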

                The new program:

                <?

                //Settings
                define("MAX_LINE_SIZE", 255);

                $fd = fopen ("b52.pgn", "r");
                //read one line, max line size=500
                $line = fgets($fd, MAX_LINE_SIZE);
                $gamenum=0;

                while (!feof ($fd)) {

                  //removes any junk (empty lines, spaces) between header and game
                  // (there should be only one line, but just in case)
                  while (! ereg("[^\r\n\t ]",$line) && (!feof ($fd))) {
                        //read next line
                        $line = fgets($fd, MAX_LINE_SIZE);
                  }
                
                  //*****************************************************************
                  //Read Header
                  //*****************************************************************
                
                  //echo "<BR> Is it header - $line <BR>";
                  //check if it is a tag line, if yes that means header starts here
                  if (ereg("\[(.+) \"(.+)\"]", $line)) {
                     $game++;
                     $tagnum=0;
                     echo "<br> Game # $game <br><br>";
                     while (ereg("\[(.+) \"(.+)\"]", $line, $matches) && (!feof ($fd))) {
                           //read tags
                           $tagnum++;
                           $games[$gamenum][$matches[1]]=$matches[2];
                           $games[$gamenum]["tagnames"][$tagnum]=$matches[1];
                
                           //Display tag name
                           echo $games[$gamenum]["tagnames"][$tagnum];
                           echo " - ";
                           //Display Tag Value
                           echo $games[$gamenum][$matches[1]];
                           echo "<br>";
                           //read new line
                           $line = fgets($fd, 500);
                     }
                     $games[$gamenum]["tagnum"]=$tagnum;
                     echo "<br>";
                     echo "Total number of tags = $tagnum <br><br>";
                  }
                  //*****************************************************************
                
                
                  //removes any junk (empty lines, spaces) between header and game
                  //(there should be only one line anyway, but just in case)
                  while (! ereg("[^\r\n\t ]",$line) && (!feof ($fd))) {
                        //read next line
                        $line = fgets($fd, MAX_LINE_SIZE);
                  }
                
                  //*****************************************************************
                  // Read game text
                  //*****************************************************************
                  $gametext=$line;
                  while (!ereg(" ((1-0)|(0-1)|(1/2-1/2)|(\*))\r?\n", $line) &&
                        (!feof ($fd))){
                         //break;
                         //next line
                        $line = fgets($fd, MAX_LINE_SIZE);
                        $gametext=$gametext.$line;
                        //echo "<BR> Reading game - $line \n" ;
                  }
                
                  //Remove next line characters - span lines
                  $gametext=ereg_replace("[\n\r\t]+", "", $gametext);
                  // $gametext=str_replace("/n", "", $gametext);
                  // $gametext=str_replace("/r", "", $gametext);
                
                  echo "$gametext ";
                  echo "<br>";
                  //*****************************************************************
                
                
                  if (!feof ($fd)) {
                     $line = fgets($fd, MAX_LINE_SIZE);
                  }

                }
                fclose ($fd);

                ?>

                  Hello again,

                  How would I parse moves from $gametext ?

                  I was thinking about this structure:

                  1) moves[halfmove#]['move']=value, example:
                  moves[1]['move']="Nfxd4" - Knight from the f-file takes on d4; sometimes more than one Knight can take on d4.

                  2) moves[halfmove#]['comment']=value
                  3) moves[halfmove#]['spec_char']=value
                  example = moves[halfmove#]['spec_char']= "$3" - good move
                  4) moves[halfmove#]['subvariation']=value (value is another moves array)

                  (Please remember that a move covers both White and Black, while a halfmove is for one side only; if the halfmove number is even (2, 4, ...) it is Black's move.) How would I check for odd in PHP?

                  Can I use ereg again? Maybe something like:
                  ereg("([[:digit:]]+(\.|\.\.\.) [NBQKR]?([a-h]|[1-8])?x?[a-h][1-8] )"
                  Example:
                  1. Nfxd7 or 1... R6xd8 (there is always space after move#. or move#...)

                  What about this ?
                  1. Nfxd7 {comment} R8xd8 $5 {comment}

                  But then how do I output it to the moves array, the way I want? How would I get all the move elements (move, comment, special_char, subvariation)?

                  Or should I read byte by byte from the beginning of the string? How would I read like this, since there are no pointers in PHP? Honestly, I like the ereg solution better. Please help.

                  Sample $gametext:

                  {In the absence of bishops this position considers a theoretical draw. That is
                  why black wishes to exchange them and white avoids it. Azmaiparashvili
                  demonstrates an excellent endgame technique} 27. Be3 Rc4 28. Rd6 Ra4 29. Rd2 h5
                  30. Kf1 g5 31. Rc2 Kg6 32. Bc5 Be5 33. a3 Ra6 {Waste of time. After 33....g4,
                  intending to exchange material further by f7-f5-f4 white's task was more
                  complicated.} 34. Ra2 Ra4 35. Bb4 Kf5 36. Ke2 Ke4 37. f3+ Kd5 38. g4 hxg4 39.
                  hxg4 Ra8 40. Kd3 Rh8 41. Rc2 $4 {Black pawns placed to the squares of the "wrong"
                  colour. White is definitely winning already.} 41... f6 42. Rc5+ Ke6 43. a4 Rd8+
                  44. Ke2 Rd4 45. Rb5 Rc4 46. a5 Rc2+ 47. Kd3 Rf2 48. a6 Rxf3+ 49. Kc4 Rf4+ 50.
                  Kb3 Rf3+ 51. Ka4 Rh3 52. a7 Rh8 53. Ka5 Kd7 54. Ka6 1-0

                    Hello,

                    I feel I need to use preg_match_all. Is that right? Why is there no ereg_all in PHP?

                      How do I scan the string $gametext character by character, like in C using pointers? Can I move back and forth in a string in PHP?

                      Thanks for your help.

                      Where are you from ? (Australia ?) It looks like you reply when I sleep 🙂

                        You can read PHP strings like arrays of characters - if $string is "Whatever", then $string[2] is "a" (offsets start at 0). I've never used this construct myself, but as far as I know you can also assign to a single character this way.
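A small sketch of scanning a string by offset, the closest PHP equivalent to walking a C pointer back and forth. The variable names are invented for the example; in current PHP versions a single offset can also be written to.

```php
<?php
$gametext = "1. e4 e5";
$len = strlen($gametext);

// Move forward until the first "e" is found.
$pos = 0;
while ($pos < $len && $gametext[$pos] !== 'e') {
    $pos++;
}
$first = $pos;                     // offset of the first "e"
$prev  = $gametext[$first - 1];    // step back one character

// Writing through an offset changes that one character.
$copy = $gametext;
$copy[$first] = 'd';

echo $first, "\n";   // 3
echo $copy, "\n";    // 1. d4 e5
```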

                        Re. your prior email, preg_match_all could well be a better choice for grabbing all the moves of a game simultaneously - it returns all matches of its patterns in an array.
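As a rough sketch of how preg_match_all() could fill the moves[halfmove#] structure proposed earlier: the function name tokenize_movetext() and the 'side' key are assumptions for illustration, and subvariations in parentheses are deliberately not handled (their moves would be counted as main-line moves). Halfmove numbers are derived from the printed move numbers, so "41..." gives halfmove 82, and parity is checked with the % operator.

```php
<?php
// Hypothetical helper: splits PGN movetext into halfmoves with comments and NAGs.
// Simplification: subvariations in parentheses are not handled.
function tokenize_movetext(string $gametext): array
{
    $pattern = '~
          \{(?<comment>[^}]*)\}                  # {comment}
        | (?<nag>\$\d+)                          # numeric annotation such as $3
        | (?<result>1-0|0-1|1/2-1/2|\*)          # game result
        | (?<number>\d+\.(?:\.\.)?)              # "12." or "12..."
        | (?<move>O-O(?:-O)?[+#]?
            |[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+#]?)
    ~x';
    preg_match_all($pattern, $gametext, $tokens, PREG_SET_ORDER);

    $moves = [];
    $result = null;
    $next = 1;      // halfmove number the next move will get
    $last = null;   // halfmove number of the most recent move
    foreach ($tokens as $t) {
        if (($t['move'] ?? '') !== '') {
            $last = $next++;
            $moves[$last] = [
                'move'      => $t['move'],
                'side'      => ($last % 2 === 1) ? 'white' : 'black', // odd = White
                'comment'   => '',
                'spec_char' => '',
            ];
        } elseif (($t['number'] ?? '') !== '') {
            $n = (int) $t['number'];             // "41..." casts to 41
            $next = (substr($t['number'], -3) === '...') ? 2 * $n : 2 * $n - 1;
        } elseif (($t['nag'] ?? '') !== '' && $last !== null) {
            $moves[$last]['spec_char'] = $t['nag'];
        } elseif (($t['comment'] ?? '') !== '' && $last !== null) {
            $moves[$last]['comment'] = trim(preg_replace('/\s+/', ' ', $t['comment']));
        } elseif (($t['result'] ?? '') !== '') {
            $result = $t['result'];
        }
    }
    return ['moves' => $moves, 'result' => $result];
}

$parsed = tokenize_movetext('1. e4 {best by test} e5 $1 2. Nf3 Nc6 1-0');
print_r($parsed['moves'][2]);   // the entry for Black's first move, e5
```

A comment that appears before the first move is dropped in this sketch; a fuller version would need somewhere to keep it.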

                        On the subject of trees, I do have one link regarding the things in MySQL:

                        http://www.zend.com/codex.php?id=554&single=1

                        Testing if a tag exists - isset($array['tagname']) returns true if it's set.

                        array_keys($array) will return a list of all the keys that $array uses; so if you have $array['foo'], $array['bar'], array_keys($array) is an array('foo','bar').
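A quick sketch of both checks against a $games array shaped like the one in the program above; the built-in for listing keys is array_keys(). The sample data is cut down from the posted games.

```php
<?php
$games = [
    0 => ['Event' => 'ICC 3 0', 'ICCResult' => 'Black resigns'],
    1 => ['Event' => 'ICC tourney 25 (8 2)', 'PlyCount' => '84'],
];

// Does a given tag exist for a game?
$hasIcc0 = isset($games[0]['ICCResult']);   // true
$hasIcc1 = isset($games[1]['ICCResult']);   // false: game 1 never set it

// List every tag a game actually has, without keeping a separate "tagnames" array.
$tagnames = array_keys($games[1]);

print_r($tagnames);   // Event, PlyCount
```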

                        Passing a structure like the $games array to serialize() will return a string that contains all the data contained in $games, which can then be retrieved by using unserialize(). Basically it would flatten out all the parsed structure into a string. I suggested it as an alternative to SQL databases (it could have been saved in a "parsed PGN" text file), but I just noted that there's no earthly reason why you couldn't store it in a SQL database (though why you would I had no idea).
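A minimal sketch of the serialize() round trip through a file. The temporary file stands in for a hypothetical "parsed PGN" cache file, and the sample game is shortened. Only unserialize() data you wrote yourself.

```php
<?php
$games = [
    0 => [
        'Event'   => 'ICC 3 0',
        'Result'  => '1-0',
        'rawgame' => '1. d4 Nf6 2. c4 e6 1-0',
    ],
];

// Hypothetical cache location; a real script would pick a stable path.
$cachefile = tempnam(sys_get_temp_dir(), 'pgn');

file_put_contents($cachefile, serialize($games));        // flatten to a string and save
$restored = unserialize(file_get_contents($cachefile));  // read back without re-parsing
unlink($cachefile);

var_dump($restored === $games);   // the structure survives the round trip
```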

                        Additional tags for some games are no burden on others - PHP's associative arrays are created dynamically, and only those keys that have values set take up memory. There's an article somewhere on www.zend.com that goes into more detail on the workings of PHP's memory management - it's basically a reference-counting model.

                        -

                        Another idea I had that may speed things up a bit and make other bits easier: Read the entire file into one large $rawstring, then (close the file and) use $rawarray=explode("\n",$rawstring) to get an array of lines.

                        Alternatively, you can
                        $rawstring=preg_replace("/[\n\r\t]/",'',$rawstring);
                        $rawstring=str_replace('[',"\n[",$rawstring);
                        $rawstring=str_replace(']',"]\n",$rawstring);
                        and then explode().

                        Have
                        function strip_blank_lines($string)
                        { return $string!='';
                        }

                        and then $rawarray=array_filter($rawarray,'strip_blank_lines');

                        There will be some blank lines - between each pair of header lines.

                        What you'll get as a result of this would be an array containing:
                        [header line]
                        [header line]
                        [header line]
                        game line
                        [header line]
                        ...

                        No blank lines (saving that check), each entire header line on a single line of its own, and each entire game on a single line of its own. Parsing can then proceed as before, except through the array instead of the file.

                        Since they're likely to be such big beasties, you'd want to unset() $rawstring and $rawarray the instant you're done with them! Eg., $rawstring can go as soon as $rawarray is created.
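The whole-file idea above, as a runnable sketch. The function name pgn_to_lines() is made up for illustration. One deliberate difference from the snippet above: whitespace runs are replaced with a single space rather than removed, so that a move split across two lines does not get glued to its neighbour. It assumes "[" and "]" never occur inside the movetext.

```php
<?php
// Hypothetical helper: one array entry per header line and one per game.
function pgn_to_lines(string $rawstring): array
{
    // Put everything on one line, collapsing each whitespace run to one space.
    $rawstring = preg_replace('/\s+/', ' ', $rawstring);
    // Put every [tag] on a line of its own.
    $rawstring = str_replace('[', "\n[", $rawstring);
    $rawstring = str_replace(']', "]\n", $rawstring);

    $rawarray = array_map('trim', explode("\n", $rawstring));
    unset($rawstring);   // free the big string as soon as possible

    // Drop the blank entries left between consecutive header lines.
    $rawarray = array_filter($rawarray, function ($line) {
        return $line !== '';
    });
    return array_values($rawarray);   // reindex from 0
}
```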

                        One last idea - once you've got this running sweet it might make a good subject for a PHPBuilder article!

                        And your last question - my proxy server is in the same country I am :-)

                          Hello,

                          Yes, I did not look at your domain name 🙂 I noticed many people from New Zealand post PHP articles here. I always wanted to visit New Zealand. I heard it is very beautiful.

                          I'm new to PHP. I do not understand why reading the whole file into an array is faster than reading it line by line. I noticed this way of doing it in some other examples. Isn't it just for simplicity?

                          We will be doing more operations:

                          1) Read whole file to huge string (this will probably take a lot of memory, since the string may be huge, there is no limitation on pgn file, what if it contains 50,000 games ?)
                          2) explode to array (additional processing + memory)
                          3) address its element (probably not a big deal - ms)
                          4) Parsing the element

                          By reading line by line (the way it is now)

                          1) Read to small string (one line only)
                          2) Parse the string

                          It only goes once thru the PGN game. Reading and parsing at the same time.

                          I hope I made myself clear. Please let me know what you think. I feel this may be interesting for many PHP novices with C background.

                          According to the PGN format specs, as I read them:
                          - There is no white space (\n, \r) between "[" and "]". That means those tags cannot be split between lines.
                          - I think one tag per line is mandatory too.

                          I noticed that your example may not work if in the body of the game there will be "[]" characters. Example:

                          1.e4 e5 2. Nf3 {worse is [2. d4 exd4]} 2... Nc6

                          However, I'm not sure if it is allowed.
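A short sketch of that concern. Splitting on every bracket cuts the comment apart, while testing each line as a whole does not. The helper name is_tag_line() is invented for the example, and its pattern is a simplification that ignores escaped quotes inside tag values.

```php
<?php
$movetext = '1.e4 e5 2. Nf3 {worse is [2. d4 exd4]} 2... Nc6';

// Splitting on every bracket breaks the movetext into three pieces.
$naive = explode("\n", str_replace(['[', ']'], ["\n[", "]\n"], $movetext));

// Hypothetical helper: a line counts as a tag only if the WHOLE line is [Name "value"].
function is_tag_line(string $line): bool
{
    return preg_match('/^\s*\[[A-Za-z0-9_]+\s+"[^"]*"\]\s*$/', $line) === 1;
}

var_dump(count($naive));                       // 3 pieces instead of 1
var_dump(is_tag_line($movetext));              // false: brackets inside a comment
var_dump(is_tag_line('[Event "ICC 3 0"]'));    // true
```

Reading line by line with a whole-line test like this sidesteps the problem without having to know whether brackets are allowed in comments.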

                            Hello again,

                            I put everything into the class clsPGNGame. So now the constructor reads the whole file into the array $games. It took me a while to figure out that this syntax: "$this->$games" is incorrect!

                            We have two methods:
                            PrintAllGames
                            PrintGame($gamenumber)

                            Two similar methods (without HTML tags) will be used later to create a PGN file on the fly. As you may have noticed, I did not return a big string from PrintAllGames and then print it, since I was concerned about performance again.

                            There is a real mystery here (maybe a bug?) !!! Please check these lines:


                            while ... {
                            ...
                            $gametext=&$this->games[$gamenum]["rawgame"];
                            ...
                            //RF This one works (display data) when
                            //element 12 is reached inside of the loop !!!
                            echo "<br>{$this->games[12]["rawgame"]}<br>";
                            }

                            //RF This one does not work outside of the loop !!! The same syntax !!! No display !!!
                            echo "<br>{$this->games[12]["rawgame"]}<br>";
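One thing that might be worth ruling out, without having the PGN file at hand: a reference created with =& stays bound to the array element. If the outer loop runs one extra time after the last game (for example on trailing blank lines) without incrementing $gamenum, the assignment through the reference replaces the text stored earlier. The sketch below reproduces only that mechanism with made-up data; it is an assumption about the cause, not a confirmed diagnosis.

```php
<?php
// Sketch only: made-up data that mimics the loop structure in the post.
$games = array();
$gamenum = 0;
$chunks = array("1. e4 e5 1-0", "");   // one game, then a trailing empty read

$seenInside = array();
foreach ($chunks as $chunk) {
    if ($chunk !== "") {
        $gamenum++;                     // only a real game bumps the counter
    }
    $gametext = &$games[$gamenum]["rawgame"];  // bind to the current slot
    $gametext = $chunk;                 // writes through the reference
    $seenInside[] = $games[1]["rawgame"];
}
unset($gametext);                       // drop the reference when done

// $seenInside is now array("1. e4 e5 1-0", ""): the text was visible during
// the first pass and was overwritten by the empty chunk on the second pass,
// so $games[1]["rawgame"] is empty after the loop.
```

If this is what happens, skipping the game-text step whenever no header was found in that iteration would avoid the overwrite.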


                            THE WHOLE CODE:

                            <?
                            //Settings
                            define("MAX_LINE_SIZE", 255);

                            //**************************************************************
                            // Class PGN - to handle chess PGN files
                            //**************************************************************
                            class clsPGNGame {

                            var $filename; //name of the PGN file
                            var $games; //Array of all PGN games.

                            //**************************************************************
                            // Class constructor, reads PGN file and sets variables
                            //**************************************************************
                            function clsPGNGame($filename) {

                              $fd = fopen ("$filename", "r");
                              //read one line, at most MAX_LINE_SIZE bytes
                              $line = fgets($fd, MAX_LINE_SIZE);
                              $gamenum=0;
                            
                              while (!feof ($fd)) {
                            
                                 //removes any junk (empty lines, spaces) between header and game
                                 // (there should be only one line, but just in case)
                                 while (! ereg('[^\r\n\t] ',$line) && (!feof ($fd))) {
                                    //read next line
                                    $line = fgets($fd, MAX_LINE_SIZE);
                                 }
                            
                                 //*****************************************************************
                                 //Read Header
                                 //*****************************************************************
                            
                            
                                 //check if it is a tag line, if yes that means header starts here
                                 if (ereg("\\[(.+) \"(.+)\"]", $line)) {
                                    $gamenum++;
                                    $tagnum=0;
                                    //echo "<br> Game # $game <br><br>";
                                    while (ereg("\\[(.+) \"(.+)\"]", $line, $matches) && (!feof ($fd))){
                                       //read tags
                                       $tagnum++;
                                       $this->games[$gamenum]["tags"][$matches[1]]=$matches[2];
                                       //$games[$gamenum]["tagnames"][$tagnum]=$matches[1];
                            
                                       //keys($games($gamenum)) will return a list of all the keys that
                                       //$game uses
                                       //Test if tag exist - isset($array['tagname'])
                            
                                       //read new line
                                       $line = fgets($fd, 500);
                                    }
                            
                                    //Total number of tags for this game:
                                    $this->games[$gamenum]["tagnum"]=$tagnum;
                                    //echo "<br>";
                                    //echo "Total number of tags = $tagnum <br><br>";
                                 }
                                 //*****************************************************************
                            
                                 //Removes any junk (empty lines, spaces) between header and game
                                 //(there should be only one line anyway, but just in case)
                                 while (! ereg('[^\r\n\t] ',$line) && (!feof ($fd))) {
                                    //read next line
                                    $line = fgets($fd, MAX_LINE_SIZE);
                                 }
                            
                                 //*****************************************************************
                                 // Read game text
                                 //*****************************************************************
                            
                                 //To avoid long names we use reference
                                 $gametext=&$this->games[$gamenum]["rawgame"];
                            
                                 $gametext=$line;
                                 while (!ereg(" ((1-0)|(0-1)|(1/2-1/2)|(\\*))\r?\n", $line) &&
                                       (!feof ($fd))){
                                       //next line
                                       $line = fgets($fd, MAX_LINE_SIZE);
                                       $gametext=$gametext.$line;
                                 }
                            
                                 //Remove next line characters - span lines
                                 $gametext=ereg_replace("[\n\r\t]+", "", $gametext);
                            
                                 //RF This one works when element 12 is reached
                                 // inside of the loop !!!
                                 echo "<br>{$this->games[12]["rawgame"]}<br>";
                            
                                 //*****************************************************************
                            
                                 if (!feof ($fd)) {
                                    $line = fgets($fd, MAX_LINE_SIZE);
                                 }
                                 }  //end main while loop
                            
                                 //RF This one does not work outside of the loop !!!
                                 echo "<br>{$this->games[12]["rawgame"]}<br>";
                                 fclose ($fd);

                            } //end PGNGame constructor

                            //***************************************************************************

                            function PrintGame($anum) {

                              //Display tags
                                 foreach($this->games[$anum]["tags"] as $tagname => $tagvalue) {
                                    echo "$tagname - $tagvalue<BR>\n";
                                 }
                              //Display Game Text
                              echo "<br>";
                              echo $this->games[$anum]["rawgame"];
                              //echo "<br>{$this->games[12]["rawgame"]}<br>";
                              echo "<br><br>\n";

                            }

                            //***************************************************************************

                            function PrintAllGames() {

                              foreach($this->games as $gamenum => $game)  {
                                 //echo "Game # $gamenum <br><br>\n";
                            
                                 foreach($game["tags"] as $tagname => $tagvalue) {
                                    echo "[$tagname = $tagvalue]<BR>\n";
                                 }
                                 //Display Game Text
                                 echo "<br>{$this->games[$gamenum]["rawgame"]}<br><br>\n";
                              }
                            }

                            //***************************************************************************

                            }
                            //**************************************************************************
                            //end class
                            //**************************************************************************

                            ?>

                            <?
                            //**************************************************************************
                            //Main Program
                            //**************************************************************************

                            $Game=new clsPGNGame("b52.pgn");

                            //Prints game #12
                            //$Game->PrintGame(12);

                            //$Game->PrintAllGames();

                            ?>

                              Hello,

                              Did you have a chance to look at the problem with the loop?

                              Also, how would I convert this Perl code to PHP?

                              "...

                              You can easily parse a move with regexes.

                              @tokens = split /\s+/, $movetext;

                              foreach my $i (@tokens) {

                              # just two examples, the easiest and one of the most complex
                              # ones (but certainly not the most complex!)

                              if ($i =~ m/([NBRQK])([a-h])([1-8])(!!|!|!\?|\?!|\?|\?\?)$/) {
                              $piece = $1;
                              $file = $2;
                              $rank = $3;
                              $comment = $4;
                              } elsif ($i =~ m/([a-h])([1-8])/) {
                              $piece = 'pawn';
                              $file = $1;
                              $rank = $2;
                              $comment = '';
                              }
                              }

                              --
                              #!/usr/bin/perl

                              ..."
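A rough PHP equivalent of the quoted Perl might look like the sketch below. It rests on several assumptions: `parseMoveToken()` and `tokenize()` are hypothetical names, both patterns are anchored, and the annotation is made optional so that a plain "Nf3" is recognized as a piece move. Like the Perl snippet, it covers only simple piece and pawn moves (no captures, castling, checks or promotions).

```php
<?php
// Sketch only: parseMoveToken() and tokenize() are hypothetical names.

// Returns an array describing the move, or null if the token is not covered.
function parseMoveToken($token)
{
    // Piece move with optional annotation, e.g. "Nf3" or "Qh5!?"
    if (preg_match('/^([NBRQK])([a-h])([1-8])(!!|\?\?|!\?|\?!|!|\?)?$/', $token, $m)) {
        return array(
            'piece'   => $m[1],
            'file'    => $m[2],
            'rank'    => $m[3],
            'comment' => isset($m[4]) ? $m[4] : '',
        );
    }
    // Plain pawn move, e.g. "e4"
    if (preg_match('/^([a-h])([1-8])$/', $token, $m)) {
        return array('piece' => 'pawn', 'file' => $m[1], 'rank' => $m[2], 'comment' => '');
    }
    return null;
}

// Counterpart of Perl's split /\s+/, $movetext
function tokenize($movetext)
{
    return preg_split('/\s+/', trim($movetext), -1, PREG_SPLIT_NO_EMPTY);
}
```

The tokens returned by tokenize() still include move numbers such as "1." and the result marker, so a caller would skip every token for which parseMoveToken() returns null.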
