I would like to know if anyone knows of an algorithm, other than crc32-ing the URL, for generating shortcodes for the URLs users submit to the shorturl/tinyurl script I am creating.
Sincerely,
MiniatureURL
Use the DB id field and make it auto-increment. That does make other URLs guessable, but this may not be an issue.
site.com/1 site.com/2 ...
That sounds like a fun programming exercise. I would create a script that takes the user's input web address, generates an incremental hex number (a 6-digit hex number gives you over 16 million combinations), uses a database to map the address to the hex number, and then serves a page that is programmed to redirect. If a user types http://yourshorturl.com/f7395a, it queries the DB to find the real web address and redirects the browser to the right page.
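As a rough sketch of the incremental hex idea, the snippet below pads an integer counter to six hex digits and converts a code back to the integer. The function names `id_to_hex_code` and `hex_code_to_id` are made up for illustration, and the counter is assumed to come from something like an auto-increment column.

```php
<?php
// Hypothetical helper: turn an incrementing integer ID into a fixed-width hex code.
function id_to_hex_code(int $id, int $width = 6): string
{
    $max = (16 ** $width) - 1; // 16^6 - 1 = 16,777,215 for six digits
    if ($id < 0 || $id > $max) {
        throw new RangeException("ID $id does not fit in $width hex digits");
    }
    return str_pad(dechex($id), $width, '0', STR_PAD_LEFT);
}

// Hypothetical helper: recover the integer ID from a hex code for the DB lookup.
function hex_code_to_id(string $code): int
{
    return (int) hexdec($code);
}

echo id_to_hex_code(1), "\n";          // 000001
echo hex_code_to_id('f7395a'), "\n";   // 16202074
```

The redirect page would then look up the recovered ID in the database; that part is left out here.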
The better shortener scripts use URL rewriting with a friendly URL name.
Hello, jjozsi.
sandthipe;10904370 wrote:That sounds like a fun programming exercise. I would create a script that takes the user's input web address, generates an incremental hex number (a 6-digit hex number gives you over 16 million combinations), uses a database to map the address to the hex number, and then serves a page that is programmed to redirect. If a user types http://yourshorturl.com/f7395a, it queries the DB to find the real web address and redirects the browser to the right page.
The problem is that I would have to query the database every time a URL is submitted to see if it's a duplicate. On a large database with over 5 million URL entries, this will take a long time and slow down the server, especially when there are a lot of visitors on the site.
Sincerely,
Miniature URL
If a link hasn't had at least 5 downloads in 30 days, delete it from the DB.
Another way is to make static pages with a simple 10-second countdown.
Miniature URL wrote:The problem is that I would have to query the database every time a URL is submitted to see if it's a duplicate.
Or you could add a UNIQUE key on the url field and just do an INSERT.
Even 5 million rows shouldn't be too terribly slow depending upon your database server.
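A minimal sketch of the "UNIQUE key plus plain INSERT" approach follows. It assumes PDO with an in-memory SQLite database purely so the example is self-contained; the table layout and the `insert_url` helper are illustrative and not taken from the actual script.

```php
<?php
// In-memory SQLite stands in for the real database (an assumption for this sketch).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE urls (
    id  INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL UNIQUE
)');

// Hypothetical helper: true if the row was inserted, false if the URL already existed.
function insert_url(PDO $db, string $url): bool
{
    try {
        $stmt = $db->prepare('INSERT INTO urls (url) VALUES (:url)');
        $stmt->execute([':url' => $url]);
        return true;
    } catch (PDOException $e) {
        // SQLSTATE class 23 means an integrity constraint (here the UNIQUE key) was violated.
        if (strpos((string) $e->getCode(), '23') === 0) {
            return false;
        }
        throw $e;
    }
}

var_dump(insert_url($db, 'http://www.example.com/')); // first submission
var_dump(insert_url($db, 'http://www.example.com/')); // duplicate submission
```

The database does the duplicate check through its index, so no separate SELECT is needed before the INSERT.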
EDIT: Moved thread to Database forum; seems like a DB-related solution is most applicable.
How exactly would I go about making a unique key using the URL?
$shortcode = crc32($url);
Then I would make the record id the primary key and the short URL the unique key?
Sincerely,
Miniature URL
You should never add a unique key to the primary key. Why? Because the primary key is already unique.
Is it a good idea to make the URL unique? There is a rule in database design: if you need to duplicate a row to make a new version, then your database model is poorly designed.
You should make a token table and a URL table. If a user submits a link that is already in your table, select its primary key and insert that into the token table; then only the original link's ID is duplicated and not the whole record (row).
Are you sure that crc32 gives you a unique token? Let's create a "friendly URL" as the token. Use preg_replace to clean the unwanted characters from the user input.
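The preg_replace cleaning step could look something like the sketch below. The function name `friendly_token` and the choice to keep only lowercase letters, digits, and hyphens are assumptions made for illustration.

```php
<?php
// Hypothetical helper: reduce user input to a lowercase, hyphen-separated token.
function friendly_token(string $input): string
{
    $token = strtolower(trim($input));
    // Replace every run of characters other than a-z and 0-9 with a single hyphen.
    $token = preg_replace('/[^a-z0-9]+/', '-', $token);
    // Remove hyphens left over at either end.
    return trim($token, '-');
}

echo friendly_token('My Cool Link!!'), "\n"; // my-cool-link
```

Two different inputs can still clean down to the same token, so the token column would need its own uniqueness check.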
Hello, jjozsi.
Miniature URL wrote:Then I would make the record id the primary key and the short URL the unique key?
The short URL would be the primary key, because that's the one you'll be checking up on all the time (every time someone uses the shortened URL); you'd only need an index (unique or otherwise) on the real url if you were planning on reusing the same short URL if someone tried to shorten a URL that someone else had already shortened - and would it break anything if you didn't bother doing that?
How you generate the short URL doesn't need to have anything to do with the real URL - it's only a lookup table, after all. Insisting on it would be like insisting on generating your surname based on what your phone number is in order to get it listed in the phone directory.
It's conceivable you could pre-generate a billion or so short URLs in advance (with something like [man]uniqid[/man]), ensure they're unique, and draw from that supply as real URLs are submitted. More likely is that when someone submits a URL you crank out a new short URL, check that it hasn't already been used, and try again if it has. What the real URL may be is irrelevant.
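The "crank out a new short URL, check it, try again" loop might be sketched as follows. A plain PHP array stands in for the lookup table, and `random_code`, `shorten`, the 62-character alphabet, and the retry limit are all assumptions for illustration.

```php
<?php
// Hypothetical helper: build a random code from a URL-safe alphabet.
function random_code(int $length = 6): string
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    $max = strlen($alphabet) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $alphabet[random_int(0, $max)];
    }
    return $code;
}

// $table stands in for the lookup table: short code => real URL.
function shorten(array &$table, string $url, int $maxTries = 10): string
{
    for ($try = 0; $try < $maxTries; $try++) {
        $code = random_code();
        if (!isset($table[$code])) { // code not used yet, so claim it
            $table[$code] = $url;
            return $code;
        }
    }
    throw new RuntimeException('Could not find an unused short code');
}

$table = [];
$code = shorten($table, 'http://www.example.com/');
echo $code, ' => ', $table[$code], "\n";
```

With a real database, the "check and claim" step would be an INSERT against a primary key on the short code, retried when the insert fails.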
If you really do want something to map full URLs to something shorter, crc32 won't do it. All CRC32 is designed to do is detect if certain (classes of) typos have been committed - there's no guarantee that it would be a reliable way to distinguish between strings that were different to begin with (especially since URLs have a variety of features in common, narrowing the space of possible input strings).
echo rtrim(strtr(base64_encode(sha1($url,true)), '+/', '-_'), '=');
Weedpacket;10904833 wrote:If you really do want something to map full URLs to something shorter, crc32 won't do it. All CRC32 is designed to do is detect if certain (classes of) typos have been committed - there's no guarantee that it would be a reliable way to distinguish between strings that were different to begin with (especially since URLs have a variety of features in common, narrowing the space of possible input strings).
echo rtrim(strtr(base64_encode(sha1($url,true)), '+/', '-_'), '=');
Could you please explain what that code does? I'm not fluent in the built-in PHP functions.
Sincerely,
Miniature URL
Like you say: they're all built-in functions.
[man]rtrim[/man]
[man]strtr[/man]
[man]base64_encode[/man]
[man]sha1[/man]
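For readers who want the one-liner unpacked, here is the same expression split into one step per function. The example URL and the variable names are placeholders.

```php
<?php
$url = 'http://www.example.com/';

$raw     = sha1($url, true);         // 20 raw bytes instead of 40 hex characters
$b64     = base64_encode($raw);      // 28 characters: A-Z, a-z, 0-9, '+', '/', with '=' padding
$urlsafe = strtr($b64, '+/', '-_');  // swap '+' for '-' and '/' for '_'
$code    = rtrim($urlsafe, '=');     // drop the trailing '=' padding

echo $code, "\n";
```

The result is the same string the original one-liner prints for the same `$url`.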
Weedpacket;10905538 wrote:Like you say: they're all built-in functions.
[man]rtrim[/man]
[man]strtr[/man]
[man]base64_encode[/man]
[man]sha1[/man]
Thanks, but that would generate a 40-character short URL. Would it be possible to md5 the result to generate a 6-character short code, or should I chunk_split it and use the first 8 characters of the result as the short code?
Sincerely,
Miniature URL
Miniature URL wrote:Thanks, but that would generate a 40 character short url.
I have not tested it, but I do not think so. With the raw output, sha1() produces a 20 byte output. The PHP manual states that "Base64-encoded data takes about 33% more space than the original data", so I would expect the final result to be about 27 characters long (or even less, since trailing '=' characters are removed).
Miniature URL wrote:Would it be possible to md5 the result to generate a 6-character short code, or should I chunk_split it and use the first 8 characters of the result as the short code?
I am not sure how you computed the length as 6, but if you use md5() the final result would be no more than 22 characters, if my calculations are correct. If the rtrim() reduces that to 20, we could expect something like this:
http://example.com/abcdefghijklmnopqrst
Would that be short enough for you?
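The lengths can be checked directly. A 20-byte sha1 digest encodes to 28 base64 characters including one '=', so 27 remain after trimming; a 16-byte md5 digest encodes to 24 characters including two '=', so 22 remain. The helper name `url_safe_digest` below is made up for this sketch.

```php
<?php
// Hypothetical helper: raw digest -> base64 -> URL-safe alphabet -> padding removed.
function url_safe_digest(string $algo, string $url): string
{
    $raw = hash($algo, $url, true); // true = raw binary output
    return rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');
}

$url = 'http://www.example.com/';
echo strlen(url_safe_digest('sha1', $url)), "\n"; // 27
echo strlen(url_safe_digest('md5', $url)), "\n";  // 22
```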
Really, though, I am wondering if uniqueness of long URL is necessary. If there are five million URLs, does it really matter if a couple of thousand of them are duplicates and hence can be reached by two short URLs?
Weedpacket;10905577 wrote:Really, though, I am wondering if uniqueness of long URL is necessary. If there are five million URLs, does it really matter if a couple of thousand of them are duplicates and hence can be reached by two short URLs?
I managed to get it to shorten to a 6-character code using substr, but noticed a very minor problem with your code:
echo rtrim(strtr(base64_encode(sha1($url,true)), '+/', '-_'), '=');
Example 1:
url: http://miniatureurl.110mb.com
shortcode: n-NAtK
Example 2:
url: http://www.facebook.com
shortcode: _NqoK2
Notice the - and _ in the shortcodes above? Anyway, here is the code I am using to generate the shortcode:
public function generate_shortcode()
{
    // generate the shortcode based on the url entered
    $this->shortcode = rtrim(strtr(base64_encode(sha1($this->url, true)), '+/', '-_'), '=');
    // shorten the shortcode to the desired length
    $this->shortcode = substr($this->shortcode, 0, 6);
}
Any ideas?
Sincerely,
Miniature URL
script: http://miniatureurl.110mb.com
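One thing worth keeping in mind when truncating to six characters: only 64^6 distinct codes remain, so two different URLs can end up with the same shortcode. The sketch below gives a rough birthday-style estimate of how many colliding pairs to expect, assuming the hash output is evenly spread; the function name `expected_collisions` is made up, and the 5 million figure is the database size mentioned earlier in the thread.

```php
<?php
// Rough estimate: (number of URL pairs) / (number of possible codes).
// Assumes codes are uniformly distributed, which is only an approximation.
function expected_collisions(int $urls, int $codeLength, int $alphabetSize = 64): float
{
    $space = $alphabetSize ** $codeLength; // distinct codes of this length
    $pairs = $urls * ($urls - 1) / 2;      // distinct pairs of URLs
    return $pairs / $space;
}

printf("%.1f\n", expected_collisions(5000000, 6)); // roughly 182 colliding pairs
printf("%.3f\n", expected_collisions(5000000, 8)); // far fewer with 8 characters
```

Under these assumptions, a collision check before saving is still needed even though the code is derived from a hash.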
That's a problem? I used them as substitutes for + and / because those would be a problem.
Weedpacket;10906055 wrote:That's a problem? I used them as substitutes for + and / because those would be a problem.
Actually no, - and _ are SEO friendly and / and + aren't, thanks. It took me a couple of days to realize this.
Sincerely,
Miniature URL
If your question has been sufficiently answered, please mark this thread as resolved using the thread tools.