JDownloader Community - Appwork GmbH
 

  #1  
Old 23.01.2023, 08:38
cstern cstern is offline
Mega Loader
 
Join Date: Jul 2012
Location: Denmark
Posts: 61
Default reddit.com again (problem with user profile crawler)

Hi,

The reddit plugin works pretty well. However, sometimes JD does not find all resources in a user profile.

E.g. **External links are only visible to Support Staff** will only result in 6 images being downloaded, but inspecting the user page manually reveals many more.

I have no idea why this happens sometimes but maybe you could investigate...

The crawler settings in the plugin are both set to "-1" (unlimited)

Thanks for a great tool!

Last edited by cstern; 23.01.2023 at 10:57.
  #2  
Old 23.01.2023, 16:56
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 72,961
Default

Hi,
the problem here is that this profile contains almost exclusively adult content, but our crawler does not have any login features yet.
You can easily verify that by opening the profile in a private browser window where you're not logged into any reddit account.

I'll look into it...
__________________
JD Supporter, Plugin Dev. & Community Manager

Getting Started & Tutorials || JDownloader 2 Setup Download
Spoiler:

A users' JD crashes and the first thing to ask is:
Quote:
Originally Posted by Jiaz View Post
Do you have Nero installed?
  #3  
Old 23.01.2023, 17:53
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 72,961
Default

So at this moment we're using the following request to crawl profiles:
Code:
reddit.com/user/<redditUsername>/.json?limit=100&after=<afterValue>
This API is undocumented afaik.
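The request shape quoted above can be sketched as a small URL builder. This is only an illustration under stated assumptions: the `https://www.` prefix, the function name `build_profile_listing_url`, and the example cursor value `t3_abc123` are hypothetical and not taken from the JDownloader plugin.

```python
from urllib.parse import quote, urlencode


def build_profile_listing_url(username, after=None, limit=100):
    """Build a profile listing URL in the shape quoted in this post.

    Assumption: the host prefix "https://www." is added for illustration;
    the post only shows "reddit.com/user/<redditUsername>/.json".
    """
    params = {"limit": limit}
    if after:
        # "after" is the cursor for the next page; leave it out for page 1.
        params["after"] = after
    safe_name = quote(username, safe="")
    return f"https://www.reddit.com/user/{safe_name}/.json?{urlencode(params)}"


# Hypothetical username and cursor value, for illustration only.
print(build_profile_listing_url("example_user"))
print(build_profile_listing_url("example_user", after="t3_abc123"))
```

Omitting `after` on the first request keeps the first-page URL free of an empty parameter.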
Also as far as I can see I'm not able to find more items than the ones on page 1 even via browser when I'm logged in.
The questions that arise are:
  • Does this API just not provide any adult content at all?
  • Is there maybe a simple parameter to include adult content using that same request without login?
  • Is the only possible way to access adult content using the API and/or crawling the website?

Maybe some reddit specialists are around here and can answer (some of) those questions.
  #4  
Old 23.01.2023, 19:02
cstern cstern is offline
Mega Loader
 
Join Date: Jul 2012
Location: Denmark
Posts: 61
Default

I think ripme2 can crawl through everything (w/o password), but it doesn't capture videos as JD does.

**External links are only visible to Support Staff**

(btw ripme2 also rips twitter)

Also: here is an adult content reddit profile that JD rips nicely (maybe not all, but a lot): **External links are only visible to Support Staff**

Last edited by cstern; 23.01.2023 at 19:07.
  #5  
Old 23.01.2023, 19:21
cstern cstern is offline
Mega Loader
 
Join Date: Jul 2012
Location: Denmark
Posts: 61
Default

Here is another one that puts more than 500 (adult) files in the download queue:
**External links are only visible to Support Staff**
  #6  
Old 24.01.2023, 16:58
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 72,961
Default

Quote:
Originally Posted by cstern View Post
I think ripme2 can crawl through everything (w/o password), but it doesn't capture videos as JD does.
I've looked into it and "ripme" is using the exact same requests we are.
Also the "next page logic" is quite the same:
Code:
github.com/ripmeapp2/ripme/blob/c9c46d6ae3295cd4ce86ff982f4f891e86e12a54/src/main/java/com/rarchives/ripme/ripper/rippers/RedditRipper.java#L130
I haven't tested it, but I doubt that ripme will find more than JDownloader.
It might write more items to your hard drive, as it seems to be capable of also saving comments as text files, but I'm unsure about the rest.
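As a generic sketch of cursor-based "next page logic" (not a copy of either project's code): it assumes each listing response is a JSON object with `data.children` (the items) and `data.after` (the cursor for the next page, empty on the last page). The names `crawl_all_pages`, `fetch_page` and `FAKE_PAGES` are hypothetical, and the fake pages stand in for real HTTP responses.

```python
def crawl_all_pages(fetch_page, max_pages=1000):
    """Collect items from every page by following the "after" cursor.

    fetch_page(after) must return the parsed listing for that cursor;
    after=None requests the first page.
    """
    items = []
    after = None
    for _ in range(max_pages):  # hard cap guards against a cursor that never ends
        listing = fetch_page(after)
        data = listing.get("data") or {}
        items.extend(data.get("children") or [])
        after = data.get("after")
        if not after:
            break  # no cursor means this was the last page
    return items


# Fake two-page listing keyed by cursor, for illustration only.
FAKE_PAGES = {
    None: {"data": {"children": ["post1", "post2"], "after": "cursor_a"}},
    "cursor_a": {"data": {"children": ["post3"], "after": None}},
}

print(crawl_all_pages(FAKE_PAGES.get))
```

Two tools that share this loop can still return different results if they treat the items on a page differently, so identical requests alone do not guarantee identical output.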

Quote:
Originally Posted by cstern View Post
btw ripme2 also rips twitter
JDownloader too.

Quote:
Originally Posted by cstern View Post
Also: here is an adult content reddit profile that JD rips nicely (maybe not all, but a lot):
Yeah I see.
Reddit definitely doesn't seem to hide anything here because of adult content.

Where do I see the total post count of a user on the reddit website?
  #7  
Old 24.01.2023, 17:17
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 72,961
Default

Please collect some links to posts of the profile you linked in your first post with media (image/video) which:
- JD does not find
and/or:
- The other application does find
  #8  
Old 24.01.2023, 18:31
cstern cstern is offline
Mega Loader
 
Join Date: Jul 2012
Location: Denmark
Posts: 61
Default

This is from the log window of ripme when using the first link (48 files):

[52 lines of links hidden by the forum: "External links are only visible to Support Staff"]

When copying these links to paste them here, JD captured them from the clipboard but without the reddit file names.

I cannot get the individual URLs of the 6 files detected by JD.
  #9  
Old 24.01.2023, 18:40
cstern cstern is offline
Mega Loader
 
Join Date: Jul 2012
Location: Denmark
Posts: 61
Default

Quote:
Originally Posted by pspzockerscene View Post
I've looked into it and "ripme" is using the exact same requests we are.
Also the "next page logic" is quite the same:
Code:
github.com/ripmeapp2/ripme/blob/c9c46d6ae3295cd4ce86ff982f4f891e86e12a54/src/main/java/com/rarchives/ripme/ripper/rippers/RedditRipper.java#L130
I haven't tested it, but I doubt that ripme will find more than JDownloader.
It might write more items to your hard drive, as it seems to be capable of also saving comments as text files, but I'm unsure about the rest.
OK, but ripme still gets more JPGs. I understand that the code seems the same, but there may be another difference: maybe ripme reads cookies or something that allows the client program to do more (e.g. by logging in to reddit behind the scenes). I am by no means a programmer, so I am just guessing.


Quote:
Originally Posted by pspzockerscene View Post
JDownloader too.
Yes, but in ripme you don't need all kinds of tricks to access profiles where you need to be logged in. Maybe the reason is the same as above (cookies or some other way of getting logged in).


Quote:
Originally Posted by pspzockerscene View Post
Where do I see the total post count of a user on the reddit website?
I have no idea (maybe someone else knows). I think ripme actually crawls line by line, like when you scroll down a page on reddit. But again, I am not a programmer, so I don't know where to look.
  #10  
Old 24.01.2023, 18:45
cstern cstern is offline
Mega Loader
 
Join Date: Jul 2012
Location: Denmark
Posts: 61
Default

Oh, here's a weird thing. The six images JD downloads are NOT in the scan from ripme (i.e. JD finds some images ripme doesn't)...
(Try downloading first via the single reddit link, then via the files from the ripme log window, i.e. the links I provided. You will see that the files are different.) Odd.
  #11  
Old 25.01.2023, 16:54
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 72,961
Default

By the way our forum allows you to edit existing posts.
I recommend doing so instead of sending multiple new posts in a row within a short time...

Quote:
Originally Posted by cstern View Post
This is from the log window of ripme when using the first link (48 files):
And what is this supposed to tell me?
A lot of those links are offline/invalid e.g.:
preview.redd.it/48xmqgeo36h81.jpg?width=3024&format=pjpg&auto=webp&v=enabled&d6c5955c

Quote:
Originally Posted by cstern View Post
When copying these links to paste them here, JD captured them from the clipboard but without the reddit file names.
Sure, they come without the file names because the context is missing here, so JD will treat them as normal directly downloadable links.

Quote:
Originally Posted by cstern View Post
I cannot get the individual URLs of the 6 files detected by JD.
Which links do you mean exactly?

Quote:
Originally Posted by cstern View Post
I understand that the code seems the same, but there then may be another difference
I didn't say "same code"; I simply said that they are using the same reddit API that we are using.
An explanation of what an API is:
en.wikipedia.org/wiki/API

Quote:
Originally Posted by cstern View Post
maybe ripme reads cookies or something that allows the client program to do more
No.

Quote:
Originally Posted by cstern View Post
yes but in ripme you don't need all kinds of tricks to access profiles where you need to be logged in. Maybe the reason is the same (cookies or some other way of getting logged in) as above
I haven't looked deeper into this and I won't, as our twitter plugin is working fine atm.
Yes, you do need to add a twitter account (instructions) to JD to be able to crawl adult content, but that's all. I have no idea which other "all kinds of tricks" you're talking about.
Adding an account to JD takes less than 10 seconds...

Quote:
Originally Posted by cstern View Post
I think ripme actually crawls line by line, like when you scroll down a page on reddit.
No it doesn't.

Quote:
Originally Posted by cstern View Post
Oh, here's a weird thing. The six images JD downloads are NOT in the scan from ripme (i.e. JD finds some images ripme doesn't)...
Interesting.

I'm still waiting for this:
Quote:
Originally Posted by pspzockerscene View Post
Please collect some links to posts of the profile you linked in your first post with media (image/video) which:
- JD does not find
and/or:
- The other application does find
--> I'm talking about links in the following form and not direct-links: reddit.com/user/username/comments/commentID/slug/
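To illustrate the difference between the two link forms mentioned here, a minimal sketch that tells post links of the shape `reddit.com/user/username/comments/commentID/slug/` apart from direct media links. The regular expression and the name `is_post_link` are assumptions made for this example, not the pattern the JDownloader plugin uses.

```python
import re

# Assumed pattern for post links of the form quoted above; the scheme,
# "www." prefix, slug and trailing slash are all treated as optional.
POST_LINK = re.compile(
    r"^(?:https?://)?(?:www\.)?reddit\.com/user/[^/]+/comments/[^/?#]+"
    r"(?:/[^/?#]*)?/?(?:[?#].*)?$",
    re.IGNORECASE,
)


def is_post_link(url):
    """Return True for a user post link, False for anything else."""
    return POST_LINK.match(url.strip()) is not None


print(is_post_link("reddit.com/user/username/comments/commentID/slug/"))
print(is_post_link("preview.redd.it/48xmqgeo36h81.jpg?width=3024&format=pjpg"))
```

A direct link such as the `preview.redd.it` one carries no information about the post it came from, which is why only the post links are useful for comparing what each tool finds.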

Last edited by pspzockerscene; 25.01.2023 at 17:31. Reason: Fixed some typos
  #12  
Old 25.01.2023, 17:36
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 72,961
Default

I got it fixed!

There was a stupid mistake in our crawler which made it skip all items of the current internal page if a single item contained a gallery. This affected all reddit crawls that use pagination, which also includes the subreddit crawler.
After fixing it, I got 53 results for your initial link (52 online, 1 offline).
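The class of mistake described here can be illustrated with a hypothetical sketch. This is not the actual JDownloader code: the item fields `is_gallery`, `media` and `url` and both function names are invented for the example, and the early `return` is just one plausible way such a symptom can arise.

```python
def crawl_page_buggy(items):
    """Hypothetical bug: an early return on the first gallery item."""
    results = []
    for item in items:
        if item.get("is_gallery"):
            # Returning here drops everything collected so far on this page
            # and never looks at the items that follow.
            return list(item["media"])
        results.append(item["url"])
    return results


def crawl_page_fixed(items):
    """Fixed variant: gallery media is added and the loop continues."""
    results = []
    for item in items:
        if item.get("is_gallery"):
            results.extend(item["media"])
        else:
            results.append(item["url"])
    return results


# Made-up page with one gallery between two single-image posts.
page = [
    {"url": "a.jpg"},
    {"is_gallery": True, "media": ["g1.jpg", "g2.jpg"]},
    {"url": "b.jpg"},
]
print(crawl_page_buggy(page))
print(crawl_page_fixed(page))
```

A test page that mixes gallery and non-gallery items, as above, is the kind of case that catches this sort of regression.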

Please wait for the next CORE-Update!

Are you waiting for recently announced changes to get released?
Updates do not necessarily get released immediately!
Please read our Update FAQ!


-psp-

Last edited by pspzockerscene; 26.01.2023 at 15:16.
  #13  
Old 28.01.2023, 19:25
cstern cstern is offline
Mega Loader
 
Join Date: Jul 2012
Location: Denmark
Posts: 61
Default

Oh, great! Thanks. I was just collecting the requested links, but now I see that you have fixed it already.

I will check when it is released (I am confident that you have done it, though).

I am always amazed how clever you are to fix all the tricks the hosters do :-)

Thanks again for the best crawler/ripper/downloader out there!

PS: When I post something that seems stupid to you, I don't mean to insult you. I am simply not always clever enough to answer what you ask or understand what you want me to answer. Please do not doubt that I have the deepest respect for you!

PPS: regarding twitter: This was what I referred to: https://support.jdownloader.org/Know...n-instructions

Last edited by cstern; 29.01.2023 at 16:09.
  #14  
Old 30.01.2023, 13:32
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 72,961
Default

Quote:
Originally Posted by cstern View Post
I am always amazed how clever you are to fix all the tricks the hosters do :-)

Thanks again for the best crawler/ripper/downloader out there!
Thanks for your feedback!

Quote:
Originally Posted by cstern View Post
PPS: regarding twitter: This was what I referred to: https://support.jdownloader.org/Know...n-instructions
Well, compared to not needing an account at all, that might seem complicated, but you need to understand that this is a special case.
For most websites, all you need to do to add an account is to enter your username and password in JDownloader.
Also, while those instructions contain a lot of text and might look complicated at first glance, that text is supposed to be dummy proof.
The sentence "Install that addon to import your cookies into JDownloader" would contain the same information, but no one would understand it.

Also, most heavy JDownloader users will have that addon installed anyway, so it won't be complicated for them.
Still I get your point!
  #15  
Old 31.01.2023, 14:32
cstern cstern is offline
Mega Loader
 
Join Date: Jul 2012
Location: Denmark
Posts: 61
Default

Now I got that to work as well! Thanks!!!
  #16  
Old 31.01.2023, 15:31
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 72,961
Default

No problem.
Any hints on how we can improve the design of our UI and error messages are welcome.
  #17  
Old 24.03.2023, 14:19
pspzockerscene pspzockerscene is offline
Community Manager
 
Join Date: Mar 2009
Location: Deutschland
Posts: 72,961
Default

CORE-Updates have been released!
All announced bugfixes and features are live!
Please update your JDownloader and report any issues you find asap.
If this thread gets marked as "[Solved]" by our forum staff, you can still post in it and we will read and reply to it!

-psp-