sneakyimp;11037643 wrote:
Alternatively, I'm thinking I could concoct a Javascript function that uses a regex split along urls and with the resulting array, calculate the length myself. Could use a little help concocting a regex in JS that locates urls that a) start the tweet, b) are in the middle of the tweet and c) end the tweet.
I'm not familiar with Twitter, so I might be missing some complications. But from what I understand…
If it's a personal utility, intended for someone who knows how to type in a fully qualified url… you can make the recognition process that much easier. If it starts with http(s)://… it's a URL. That check is all you ever actually have to do (or at least until you grow tired of it and wish to add more recognition power). Moreover, what you are describing for "images" and "urls" seems to be identical: 22 chars for http, 23 for https.
The easiest is probably
1. On some trigger, such as all keyups, or keyup + timer…
2. explode contents on whitespaces ("SPACE" and LF; tab?) into array
3. for each element, add its computed length:
starts-with "http(s)://" ? Math.max(string-length, 22) : string-length
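A rough sketch of steps 2 and 3, to make the idea concrete. The function names are made up for illustration. Note that I'm assuming every URL counts as a fixed 22 (http) or 23 (https) characters regardless of its real length, going by the figures mentioned above; if the Math.max rule from step 3 is what you actually want, change tokenLength accordingly.

```javascript
// Assumed fixed lengths for links, taken from the figures quoted in this thread.
const HTTP_LENGTH = 22;
const HTTPS_LENGTH = 23;

// Hypothetical helper: length one token contributes to the total.
function tokenLength(token) {
  if (/^https:\/\//i.test(token)) return HTTPS_LENGTH;
  if (/^http:\/\//i.test(token)) return HTTP_LENGTH;
  return token.length;
}

// Hypothetical helper: computed length of the whole text.
function computedLength(text) {
  // The capture group keeps the whitespace runs in the array,
  // so spaces and line feeds are counted as well.
  return text
    .split(/(\s+)/)
    .reduce((sum, part) => sum + tokenLength(part), 0);
}
```

Hook computedLength(textarea.value) up to whatever trigger you picked in step 1 and write the result to your counter.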
May not be that efficient but it's a simple starting point. Could probably be improved rather easily by replacing 3 with
3. for each element replace string by object containing
- the replaced string
- start index
- end index
- computed length
Then if you move around inside the input / textarea, you find the element in question and only recalculate its values. And update indices for all following elements - and this can actually be done by adding a new property to each object: "length-modifier". As you traverse the array to check start / end positions to find out where you are, you also keep adding the value of length-modifier and use that to modify the start / end positions.
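Something along these lines for the object version. This is only a sketch: tokenize and tokenAt are names I made up, the 22/23 lengths are the same assumption as before, and the "length-modifier" bookkeeping is left out, so after an edit you would still have to re-tokenize or shift the indices of the following elements yourself.

```javascript
// Hypothetical helper: turn the text into objects carrying the string,
// its start/end index and its computed length.
function tokenize(text) {
  const tokens = [];
  for (const match of text.matchAll(/\S+/g)) {
    const str = match[0];
    let length = str.length;
    if (/^https:\/\//i.test(str)) length = 23;      // assumed https length
    else if (/^http:\/\//i.test(str)) length = 22;  // assumed http length
    tokens.push({
      text: str,
      start: match.index,
      end: match.index + str.length, // exclusive end index
      length: length,
    });
  }
  return tokens;
}

// Hypothetical helper: find the element the caret is currently in (or touching).
function tokenAt(tokens, caret) {
  return tokens.find((t) => caret >= t.start && caret <= t.end) || null;
}
```

The caret position would come from textarea.selectionStart; only the element returned by tokenAt needs its values recalculated.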
The only special case I see when using a purely whitespace-split approach is urls that are not followed by whitespace, i.e. those that end a sentence, such as http://example.com. But then you only need to inspect the last character to find this out.
sneakyimp;11037643 wrote:
I'd certainly appreciate any help in figuring out how to reverse-engineer their site's javascript. It is apparently minimized and I can't seem to figure out how to search all the javascript for references to the id of the html elements that are in play. If anyone has any tips about how to halt JS when loading a page or on a particular event such that I can step through the code, that would be nice.
Actually it doesn't seem to be that hard, assuming I'm in the right place. I googled twitter or tweet or some such and found: https://twitter.com/intent/tweet. But even if that's not the page you mean, the same principles apply.
- Inspect page source
- locate the jQuery that does "stuff" to the input/textarea ($/jQuery('#status').each) and look at some of the surrounding code
- Right-click and inspect element (I am using Chrome, but your browser of choice should provide the same features, albeit perhaps differently than described here)
- sources tab - expand the folders to find tfw/intents/tweetbox.js
- semi-read / scan through the code until you find parts that seem to make some kind of sense. The interesting part seems to start around this.$textarea.bind("keyup",
which is later on followed by such function calls as getTextLength, getTweetLength, updateCounter.
- Looking at getTweetLength, which seems appropriate, you can also see on which object it is defined: twttr.txt.getTweetLength
- Top right corner, "watch expressions", click the '+' and enter "twttr"
- reload page
- expand twttr, expand txt
- double click getTweetLength to get the function definition and copy paste elsewhere for inspection
- repeat for its function calls.
If you want to inspect the code as it is running, you would need to insert line-breaks to get meaningful break points. Not sure if it is allowed or not. But if it is, copy the appropriate js file, add line breaks, add break-points and go.
sneakyimp;11037643 wrote:
Could use a little help concocting a regex in JS that locates urls that a) start the tweet, b) are in the middle of the tweet and c) end the tweet.
Unless I miss something, the differences are trivial. start-of-string, white-space, end-of-string.
/* reads as: no-capture: start or whitespace, http or https ://, everything up until whitespace or end
*/
/(?:^|\s)http(s)?:\/\/.+?(?:$|\s)/
Note that it has to be .+? and not .+ (after https://).
JS quantifiers are greedy by default. Greedy .+ would make it match everything until the end of the line, while non-greedy .+? only matches until the first occurrence of whitespace or end-of-string. It should be non-greedy.
Also note that a url preceding punctuation most likely has to be treated differently. Look at a url ending a sentence, such as http://example.com. Using my regexp, the . would be part of the url, but the full stop doesn't belong to the url. Thus, check for trailing punctuation and remove it from the url.
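Putting the regexp and the punctuation check together, roughly. Take it as a sketch rather than a finished solution: findUrls is a made-up name, and the set of characters I strip from the end is my own guess at what counts as "trailing punctuation". I also swapped the tail of the pattern for \S+ instead of .+?(?:$|\s), because with a global search the trailing whitespace would be consumed by the first match and a second url right after it would then have no leading whitespace left to match.

```javascript
// Start-of-string or whitespace, then http:// or https://,
// then everything up to the next whitespace or the end.
const URL_PATTERN = /(?:^|\s)(https?:\/\/\S+)/g;

// Hypothetical helper: returns every url found at the start,
// in the middle, or at the end of the text.
function findUrls(text) {
  const urls = [];
  for (const match of text.matchAll(URL_PATTERN)) {
    // Assumed punctuation set: drop characters that most likely
    // belong to the sentence rather than to the url.
    urls.push(match[1].replace(/[.,;:!?)]+$/, ""));
  }
  return urls;
}
```

For example, findUrls("http://a.com then https://b.org/x and finally http://example.com.") gives back the three urls without the final full stop.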