Hi, I'd like to crawl webpage A (then B, then C, ...) for links, but also modify those links. One of the modifications is inserting a custom_string into each link. This custom_string isn't constant; it's different for each webpage A, B, C. The target webpage doesn't care what comes after the question mark in the URL, so I can make the URL look like original_url?custom_string if that helps. Also, each webpage contains Custom_string, i.e. a version of custom_string where the first letter is capitalized.
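Outside of JDownloader, the link modification I have in mind would look roughly like this Python sketch (the URL and custom_string are made-up examples; the helper name is mine):

```python
def add_custom_string(url: str, custom_string: str) -> str:
    """Return url with custom_string as its query part.

    The site ignores everything after '?', so appending the per-page
    custom_string there changes nothing for the server.
    """
    base = url.split("?", 1)[0]  # drop any existing query part first
    return f"{base}?{custom_string}"

print(add_custom_string("https://example.com/file/123", "abc42"))
# → https://example.com/file/123?abc42
```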
- To crawl the webpage for links, I probably have to use DEEPDECRYPT. As far as I understand, DEEPDECRYPT can pass on information to the crawled links only from the contents of the webpage, but not from its address/URL. Is that correct?
If that is correct, then the solution seems to be this:
- Use DEEPDECRYPT to match the links, Custom_string (note the capital letter!), and all_text_between_them. We have to match all_text_between_them because, as far as I understand, DEEPDECRYPT can only match contiguous strings. Is my understanding correct?
- Modify the links using REWRITE. This includes my planned modifications mentioned above, plus: remove all_text_between_them and replace Custom_string with custom_string (i.e. make the first letter lowercase). To lowercase the first letter, I'd apparently need 26 REWRITE rules: replace A with a, B with b, and so on. Won't 26 REWRITE rules be slow? edit: I'll try to use \l instead.
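To illustrate what I mean by the two steps above, here is a Python sketch (the page snippet, URL, and regex are made up, and this says nothing about JDownloader's actual rule syntax): one regex matches the whole contiguous span with capture groups for the link and Custom_string, and a single string operation lowercases the first letter instead of 26 separate rules.

```python
import re

# Hypothetical page snippet: a link, some text in between, then Custom_string.
html = '<a href="https://example.com/file/123">file</a> blah blah Custom_string'

# One contiguous match covering link + text-between + Custom_string;
# the capture groups pull out just the two parts we care about.
m = re.search(r'href="(https?://[^"]+)".*?([A-Z]\w*_string)', html)
url, tag = m.group(1), m.group(2)

# Lowercase the first letter without 26 per-letter rules.
custom_string = tag[0].lower() + tag[1:]

print(f"{url}?{custom_string}")
# → https://example.com/file/123?custom_string
```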
Do you plan to make the LinkCrawler more powerful? For example, DEEPDECRYPT could pass on info from the URL, not only from the page contents. Or REWRITE (or another command) could allow things like adding 32 to the ASCII code of the first letter of Custom_string, so that it is made lowercase.
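The ASCII trick works because uppercase and lowercase letters are exactly 32 apart in ASCII ('A' is 65, 'a' is 97), so adding 32 to an uppercase letter's code lowercases it:

```python
c = "C"           # first letter of Custom_string
print(chr(ord(c) + 32))  # ord("C") is 67; 67 + 32 = 99, which is "c"
# → c
```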
Or what about adding a text box to the JDownloader settings where you could write scripts directly (in any imperative language), with variables etc., without having to set up an IDE and boilerplate code just to write and compile entire plugins? Thanks!