#1
reddit.com again (problem with user profile crawler)
Hi,
The reddit plugin works pretty well. However, sometimes JD does not find all resources in a directory. E.g. **External links are only visible to Support Staff** will only result in 6 images downloaded, but inspecting the user page manually reveals many more. I have no idea why this happens sometimes, but maybe you could investigate...
The crawler settings in the plugin are both set to "-1" (unlimited).
Thanks for a great tool!
Last edited by cstern; 23.01.2023 at 10:57.
#2
Hi,
the problem here is that this profile contains almost exclusively adult content, but our crawler does not have any login features yet. You can easily verify that by opening the profile in a private browser window where you're not logged in to any reddit account. I'll look into it...
__________________
JD Supporter, Plugin Dev. & Community Manager
First Steps & Tutorials || JDownloader 2 Setup Download
#3
So at this moment we're using the following request to crawl profiles:
Code:
reddit.com/user/<redditUsername>/.json?limit=100&after=<afterValue>
Also, as far as I can see, I'm not able to find more items than the ones on page 1, even via browser when I'm logged in. The questions that arise are:
Maybe some reddit specialists are around here and can answer (some of) those questions.
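For readers unfamiliar with this kind of request, here is a minimal sketch of cursor-based paging over the URL pattern quoted above. It is not JDownloader's actual code. The `https://www.` prefix, the function names, and the injected `fetch_json` callable are assumptions made for illustration; the `data` / `children` / `after` layout is the usual shape of a reddit listing response.

```python
from urllib.parse import urlencode


def build_profile_url(username, after=None, limit=100):
    """Build the listing URL quoted above; scheme and host are assumed."""
    params = {"limit": limit}
    if after:
        params["after"] = after
    return f"https://www.reddit.com/user/{username}/.json?{urlencode(params)}"


def crawl_profile(username, fetch_json, max_pages=50):
    """Follow the `after` cursor until the listing reports no further page.

    `fetch_json` is a hypothetical callable (URL -> parsed JSON dict),
    injected so the sketch runs without network access.
    """
    items, after = [], None
    for _ in range(max_pages):  # safety cap against endless loops
        listing = fetch_json(build_profile_url(username, after))["data"]
        items.extend(child["data"] for child in listing["children"])
        after = listing.get("after")
        if not after:
            break
    return items


# Two fake pages standing in for real responses.
fake_pages = {
    build_profile_url("example_user"): {
        "data": {"after": "t3_abc", "children": [{"data": {"id": "1"}}]}
    },
    build_profile_url("example_user", "t3_abc"): {
        "data": {"after": None, "children": [{"data": {"id": "2"}}]}
    },
}
found = crawl_profile("example_user", fake_pages.__getitem__)
print([item["id"] for item in found])
```

Passing the fetcher in as a parameter keeps the paging logic separate from HTTP details such as cookies or login, which is the part under discussion in this thread.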
#4
I think ripme2 can crawl through everything (w/o password), but it doesn't capture videos as JD does.
**External links are only visible to Support Staff** (btw ripme2 also rips twitter)
Also: here is an adult content reddit profile that JD rips nicely (maybe not all, but a lot): **External links are only visible to Support Staff**
Last edited by cstern; 23.01.2023 at 19:07.
#5
Here is another that gives more than 500 (adult) files in the download queue:
**External links are only visible to Support Staff**
#6
Quote:
Also the "next page logic" is quite the same:
Code:
github.com/ripmeapp2/ripme/blob/c9c46d6ae3295cd4ce86ff982f4f891e86e12a54/src/main/java/com/rarchives/ripme/ripper/rippers/RedditRipper.java#L130
It might write more items to your hard drive, as it seems to be capable of also saving comments as text files, but I'm unsure about the rest.
JDownloader too.
Quote:
Reddit definitely doesn't seem to hide anything here because of adult content. Where do I see the total post count of a user on the reddit website?
#7
Please collect some links to posts with media (image/video) from the profile you linked in your first post which:
- JD does not find, and/or
- the other application does find
#8
This is from the log window of ripme when using the first link (48 files):
**External links are only visible to Support Staff**
When copying these links to paste them here, JD captured them from the clipboard but without the reddit file names. I cannot get the individual URLs of the 6 files detected by JD.
#9
Quote:
Yes, but in ripme you don't need all kinds of tricks to access profiles where you need to be logged in.
Maybe the reason is the same as above (cookies or some other way of getting logged in).
I have no idea (maybe someone else knows).
I think ripme actually crawls line by line, like when you scroll down a page on reddit. But again, I am not a programmer, so I don't know where to look.
#10
Oh, here's a weird thing. The six images JD downloads are NOT in the scan from ripme (i.e. JD finds some images ripme doesn't)...
(Try to download first via the single reddit link, then via the files from the ripme log window, i.e. the links I provided.) You will see the files are different... Odd.
#11
By the way, our forum allows you to edit existing posts.
I recommend doing so instead of sending multiple new posts in a row within a short time...
Quote:
A lot of those links are offline/invalid, e.g.: preview.redd.it/48xmqgeo36h81.jpg?width=3024&format=pjpg&auto=webp&v=enabled&d6c5955c
Quote:
Which links do you mean exactly?
Quote:
An explanation of what an API is: en.wikipedia.org/wiki/API
Quote:
Quote:
Yes, you do need to add a twitter account (instructions) to JD to be able to crawl adult content, but that's all. I have no idea which other "all kinds of tricks" you're talking about. Adding an account to JD takes less than 10 seconds...
Quote:
Quote:
I'm still waiting for this:
--> I'm talking about links in the following form and not direct links: reddit.com/user/username/comments/commentID/slug/
Last edited by pspzockerscene; 25.01.2023 at 17:31. Reason: Fixed some typos
#12
I got it fixed!
There was a stupid mistake in our crawler which made it skip all items of the current internal page if a single item contained a gallery. This affected all reddit items with pagination, which also includes the subreddit crawler. After fixing it, I got 53 results for your initial link (52 online, 1 offline).
Please wait for the next CORE-Update!
Are you waiting for recently announced changes to get released? Updates do not necessarily get released immediately! Please read our Update FAQ!
-psp-
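The actual crawler code is not shown in this thread, so the following is only a rough sketch of how a bug of the kind described in this post could look and how a per-item fix differs. The function names and the item fields (`is_gallery`, `gallery_urls`, `url`) are hypothetical and chosen purely for illustration.

```python
def crawl_page_buggy(page_items):
    """Hypothetical bug: a single gallery item aborts the whole page."""
    results = []
    for item in page_items:
        if item.get("is_gallery"):
            return []  # discards everything collected from this page
        results.append(item["url"])
    return results


def crawl_page_fixed(page_items):
    """Handle galleries per item and keep processing the rest of the page."""
    results = []
    for item in page_items:
        if item.get("is_gallery"):
            results.extend(item["gallery_urls"])  # one entry per gallery image
        else:
            results.append(item["url"])
    return results


page = [
    {"url": "a.jpg"},
    {"is_gallery": True, "gallery_urls": ["g1.jpg", "g2.jpg"]},
    {"url": "b.jpg"},
]
print(crawl_page_buggy(page))  # []
print(crawl_page_fixed(page))  # ['a.jpg', 'g1.jpg', 'g2.jpg', 'b.jpg']
```

With pages of up to 100 items each, one gallery per page would be enough to lose almost an entire profile in the buggy variant, which would fit the small result counts reported at the start of the thread.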
Last edited by pspzockerscene; 26.01.2023 at 15:16.
#13
Oh, great! Thanks. I was just collecting the requested links, but now I see that you have fixed it already.
I will check when it is released (I am confident that you have done it, though). I am always amazed how clever you are at fixing all the tricks the hosters pull :-)
Thanks again for the best crawler/ripper/downloader out there!
PS: When I post something that seems stupid to you, I don't mean to insult you. I am simply not always clever enough to answer what you ask or to understand what you want me to answer. Please do not doubt that I have the deepest respect for you!
PPS: Regarding twitter, this was what I referred to: https://support.jdownloader.org/Know...n-instructions
Last edited by cstern; 29.01.2023 at 16:09.
#14
Quote:
Quote:
For most websites, all you need to do to add an account is to enter your username and password in JDownloader. Also, while those instructions contain a lot of text and might look complicated at first glance, that text is supposed to be dummy-proof. The sentence "Install that addon to import your cookies into JDownloader" would include the same information, but no one would understand it. Also, most heavy JDownloader users will have that addon installed anyway, so it won't be complicated for them. Still, I get your point!
#15
Now I got that to work as well! Thanks!!!
#16
No problem.
Any hints on how we can improve the design of our UI/error messages are welcome.
#17
CORE-Updates have been released!
All announced bugfixes and features are live! Please update your JDownloader and report any issues you find asap. If this thread gets marked as "[Solved]" by our forum staff, you can still post in it and we will read and reply to it!
-psp-