JDownloader Community - Appwork GmbH
 

  #1  
Old 17.07.2021, 02:43
BJN01 BJN01 is offline
JD Adviser
 
Join Date: Jan 2020
Posts: 113
Default request for new command/function for Eventscript

Hello. With scripts you can do a lot of things, and there are many commands and endless combinations, but the main commands for working with the HTML code of a page (i.e. searching for the correct links to the desired files) are currently:

Code:
var myString = getPage(myString/*URL*/);/*Loads a website (Method: GET) and returns the source code*/

var myString = postPage(myString/*URL*/, myString/*PostData*/);/*Loads a website (METHOD: POST) and returns the source code*/

openURL(myString/*URL*/);/*Open a website or path in your default browser/file explorer*/
They work perfectly for "simple" pages, but by now most sites have dynamic elements that modify the page after loading (for example, inserting image links into the HTML from arrays, or routing them through phantom domains).
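
To illustrate the limitation, here is a minimal sketch in plain JavaScript. The sample page, the array name `images` and the helper `extractImgSrcs` are hypothetical and not taken from any real site. The raw source, which is all that getPage() returns, contains no <img> tags; they would only exist after a browser has run the script.

```javascript
// Hypothetical raw source, as getPage() would return it: the <img> tags do not
// exist yet; the inline script would create them from an array after loading.
var rawSource = [
  '<html><body>',
  '<div id="gallery"></div>',
  '<script>',
  'var images = ["https://example.com/a.jpg", "https://example.com/b.jpg"];',
  'images.forEach(function (u) {',
  '  var img = document.createElement("img");',
  '  img.src = u;',
  '  document.getElementById("gallery").appendChild(img);',
  '});',
  '</script>',
  '</body></html>'
].join("\n");

// Hypothetical DOM of the same page after a real browser has executed the script.
var renderedDom =
  '<html><body><div id="gallery">' +
  '<img src="https://example.com/a.jpg"><img src="https://example.com/b.jpg">' +
  '</div></body></html>';

// Collect the src attribute of every <img> tag found in an HTML string.
function extractImgSrcs(html) {
  var re = /<img[^>]+src="([^"]+)"/g;
  var found = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    found.push(m[1]);
  }
  return found;
}

var fromRawSource = extractImgSrcs(rawSource);     // no <img> tags to find here
var fromRenderedDom = extractImgSrcs(renderedDom); // both image links are present
```

The same search that finds both links in the rendered DOM finds nothing in the raw source, which is the gap described above.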

So I would like to suggest/request a new EventScript command that returns the HTML of the page AFTER loading (that is, with all the tags and strings already modified by the various dynamic elements).

Something like:
var myString = getPageMht(myString/*URL*/);/*Loads a website (Method: GET, MHT) and returns the source code after loading*/

which would be equivalent to "Inspect + reload page" in Chrome.




In Chrome (and, I believe, in other browsers) there are many extensions that save the whole page as a single file, <webpageAAA.mht>.

For example, "Save As MHT" in the Chrome Web Store: h##ps://chrome.google.com/webstore/detail/save-as-mht/hfmodljjaibbdndlikgagimhhodmobkc/related?hl=it

The author's page has the various source files (JSON, .js and the like): h##ps://github.com/vsDizzy/SaveAsMHT

Why <MIME HTML>/<MHT>? Because MHT files are supposed to be "a web page archive format which stands for MIME HTML", and "MHT format files does not save images, it only saves links to the online images".
[[ But the browser extension creates a big file containing the encoded (encrypted?) images plus the whole page HTML, tags, URLs and the various elements (including dynamic ones). ]]

Maybe going through MHT (stripped down, without images and videos; HTML text only) could be a way to get the web page code once loading is complete [== what you see with "Ctrl + Shift + I" and a page reload].



Or maybe there is already a simpler way; I still hope the idea can at least be considered for a future release.
Thanks
  #2  
Old 17.07.2021, 04:04
raztoki's Avatar
raztoki raztoki is offline
English Supporter
 
Join Date: Apr 2010
Location: Australia
Posts: 17,611
Default

The JD browser is very simple by default: it performs standard GET/POST/PUT/HEAD/etc. requests. It has no JavaScript/CSS or related abilities, so it cannot build or change a page the way a full browser does. To do what you want you would need a fully functioning browser, maybe along the lines of PhantomJS (no longer in development, according to Wikipedia). There are other headless browsers out there.
__________________
raztoki @ jDownloader reporter/developer
http://svn.jdownloader.org/users/170

Don't fight the system, use it to your advantage. :]
  #3  
Old 17.07.2021, 10:19
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 1,533
Default

@BJN01: Try wget. It is capable of mirroring remote sites locally. Can also be called from eventscripter.
  #4  
Old 17.07.2021, 15:59
BJN01 BJN01 is offline
JD Adviser
 
Join Date: Jan 2020
Posts: 113
Default

Quote:
Originally Posted by mgpai View Post
@BJN01: Try wget. It is capable of mirroring remote sites locally. Can also be called from eventscripter.
I tried to figure out what I should do. In the examples they say:

Quote:
-Retrieve only one HTML page, but make sure that all the elements needed for the page to be displayed, such as inline images and external style sheets, are also downloaded. Also make sure the downloaded page references the downloaded links.

wget -p --convert-links h#tp://www.example.com/dir/page.html

The HTML page will be saved to w#w.example.com/dir/page.html, and the images, stylesheets, etc., somewhere under w#w.example.com/, depending on where they were on the remote server.

-The same as the above, but without the w#w.example.com/ directory. In fact, I don’t want to have all those random server directories anyway—just save all those files under a download/ subdirectory of the current directory.

wget -p --convert-links -nH -nd -Pdownload \
h#tp://www.example.com/dir/page.html


-Retrieve the index.html of ‘w#w.lycos.com’#, showing the original server headers:
wget -S h#tp://www.lycos.com/

-Save the server headers with the file, perhaps for post-processing.
wget --save-headers ht#p://www.lycos.com/
more index.html

I can try all of these, but for starters, after downloading the *.gz / *.gz.sig / *.lz / *.lz.sig file ... how do I use it? Where do I put the files?

How do I call it in EventScript?
Do I need an *.au3? And what should I write?

For example:

<wget --save-headers -nd ht#p://www.lycos.com/
more index.html> ??
  #5  
Old 17.07.2021, 16:44
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 1,533
Default

Code:
callSync("path/to/wget", "-p", "--convert-links", url);
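
A slightly fuller sketch of the same call, using the wget options quoted in post #4. The values of `wgetPath`, `pageUrl` and `targetDir` and the helper `buildWgetArgs` are placeholders chosen for illustration. It assumes callSync accepts its arguments the way the line above passes them; the `typeof` guard is only there so the snippet can also be run outside eventscripter.

```javascript
// Placeholder values (assumptions): adjust them to your own system.
var wgetPath = "path/to/wget";
var pageUrl = "http://www.example.com/dir/page.html";
var targetDir = "download";

// Build the wget argument list:
//   -p               also fetch images, stylesheets etc. needed to display the page
//   --convert-links  rewrite links so the local copy references the downloaded files
//   -nH, -nd         do not create host or directory folders
//   -P <dir>         save everything under <dir>
function buildWgetArgs(url, dir) {
  return ["-p", "--convert-links", "-nH", "-nd", "-P", dir, url];
}

var args = buildWgetArgs(pageUrl, targetDir);

// callSync only exists inside eventscripter, so guard the call.
if (typeof callSync === "function") {
  callSync(wgetPath, args[0], args[1], args[2], args[3], args[4], args[5], args[6]);
}
```

On Windows the path has to point to a wget.exe binary; the *.gz and *.lz files are source archives and cannot be called directly.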
  #6  
Old 18.07.2021, 21:20
BJN01 BJN01 is offline
JD Adviser
 
Join Date: Jan 2020
Posts: 113
Default

I did some tests but I don't get what I hoped for. I have found some AutoIt scripts that give perhaps better results than wget, but either they keep eluding me or they miss pieces I am looking for.

I found some ideas about "things" written in Python, and since it should be possible to build an *.exe from the code, maybe I could get something out of it.

If by pure a#s I managed to get a decent .exe callable from ES (I'm not a programmer and I've never studied any language properly), can I post it in this topic for discussion?
  #7  
Old 19.07.2021, 06:28
mgpai mgpai is offline
Script Master
 
Join Date: Sep 2013
Posts: 1,533
Default

Quote:
Originally Posted by BJN01 View Post
i did some tests but i don't get what i hoped for.
While the format may differ, the mht and wget content should be pretty much the same. Can you provide details/examples of what exactly is missing in wget output?

Quote:
if by pure a#s I managed to get a decent .exe callable with ES , can I post it in this topic for discussion ?
I can help you call such a program from eventscripter.
  #8  
Old 19.07.2021, 18:57
Jiaz's Avatar
Jiaz Jiaz is offline
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,290
Default

@BJN01: I'm sorry, but for what you want to achieve, a *real browser* that evaluates CSS/JS is required.
Wget will only work for static/simple websites where everything is referenced in the HTML; as soon as dynamic/evaluated JavaScript is involved, it will fail as well.

You will have to find/use a *real browser* or at least a *headless browser* that you can control via some sort of API.
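
One possible way to try this from eventscripter, as a sketch only and not something confirmed in this thread: Chrome and Chromium can be started with `--headless --dump-dom`, which prints the page's DOM after its scripts have run. `chromePath`, `pageUrl` and the helper `buildDumpDomArgs` are placeholders, and the sketch assumes that callSync returns the output of the program it runs.

```javascript
// Placeholder values (assumptions): adjust them to your own system.
var chromePath = "path/to/chrome";
var pageUrl = "https://www.example.com/";

// Arguments that make Chrome load the page without a window and print the
// resulting DOM (after JavaScript has run) to standard output.
function buildDumpDomArgs(url) {
  return ["--headless", "--disable-gpu", "--dump-dom", url];
}

var chromeArgs = buildDumpDomArgs(pageUrl);
var renderedHtml = "";

// callSync only exists inside eventscripter, so guard the call.
// Assumption: callSync returns the program's output as a string.
if (typeof callSync === "function") {
  renderedHtml = callSync(chromePath, chromeArgs[0], chromeArgs[1], chromeArgs[2], chromeArgs[3]);
}
```

If this works on a given site, `renderedHtml` can be searched for links in the same way as the result of getPage(). Pages that load content only after user interaction or long delays may still need a fully controllable browser.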
__________________
JD-Dev & Server-Admin
  #9  
Old 20.07.2021, 13:51
pspzockerscene's Avatar
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 70,922
Default

Quote:
Originally Posted by Jiaz View Post
You will have to find/use a *real browser* or at least a *headless browser* that you can control via some sort of API.
E.g. via Selenium.
__________________
JD Supporter, Plugin Dev. & Community Manager

Erste Schritte & Tutorials || JDownloader 2 Setup Download
Spoiler:

A users' JD crashes and the first thing to ask is:
Quote:
Originally Posted by Jiaz View Post
Do you have Nero installed?
  #10  
Old 21.07.2021, 16:22
BJN01 BJN01 is offline
JD Adviser
 
Join Date: Jan 2020
Posts: 113
Default

Quote:
Originally Posted by pspzockerscene View Post
E.g. via Selenium.
Yes, that is exactly the example topic that I found on the net ...
For the moment I'm just past the <<print("Hello World")>> step,
.... I won't abandon the idea, but the "development" will take a while ...
  #11  
Old 21.07.2021, 16:41
pspzockerscene's Avatar
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 70,922
Default

I'll mark this as Solved.

We won't be able to teach you how to code but you're free to share possible solutions in this thread.

At this moment we neither have a built-in "browser emulator/remote control" like Selenium nor do we provide official plugins for the purpose of crawling complete websites.

-psp-
All times are GMT +2. The time now is 09:20.