#1
twitter crawler by date not working with old date pinned tweet
log : 08.07.23 00.38.25 <--> 08.07.23 01.12.11 jdlog://8036311370661/
When crawling Twitter by date with ?max_date=yyyy-mm-dd, the crawl returns nothing if the user has a pinned tweet dated older than the "?max_date" value, even though there are newer tweets. Example: I use ?max_date=2023-07-01, and although there are tweets dated 2023-07-03, the crawl stops with no result if there is a pinned tweet dated 2022-12-01. If there is no pinned tweet with an older date, it works fine. I guess it detects the pinned tweet as the latest date and stops as soon as it starts.
#2
Hi,
I've updated our Twitter crawler to ignore pinned tweets in that "tweet date handling".
Are you waiting for recently announced changes to get released? Updates do not necessarily get released immediately! Please read our Update FAQ! -psp-
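To illustrate the idea, here is a minimal sketch of a newest-first crawl with a date limit that does not let a pinned tweet end the crawl. It is not the actual plugin code: `Tweet`, `crawl_until`, and the `pinned` flag are hypothetical names, and it assumes a pinned tweet older than the limit is simply skipped.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Tweet:
    # Hypothetical, simplified tweet record used only for this sketch.
    tweet_id: str
    posted: date
    pinned: bool = False


def crawl_until(timeline, max_date):
    """Collect tweets from a newest-first timeline down to max_date.

    A pinned tweet sits at the top regardless of its age, so an old
    pinned tweet must not be treated as "we reached the limit".
    """
    results = []
    for tweet in timeline:
        if tweet.posted < max_date:
            if tweet.pinned:
                # Assumption: skip the old pinned tweet and keep crawling.
                continue
            # A regular tweet older than the limit ends the crawl.
            break
        results.append(tweet)
    return results


timeline = [
    Tweet("pinned", date(2022, 12, 1), pinned=True),
    Tweet("a", date(2023, 7, 3)),
    Tweet("b", date(2023, 6, 20)),
]
print([t.tweet_id for t in crawl_until(timeline, date(2023, 7, 1))])
```

Without the `pinned` check, the loop would hit the 2022-12-01 tweet first and stop immediately, which matches the empty result described in the first post.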
__________________
JD Supporter, Plugin Dev. & Community Manager
Erste Schritte & Tutorials || JDownloader 2 Setup Download |
#3
As of the latest update today.
Still not working. Is it because some recent tweets are shown as posted "12 hours ago" instead of with a date, so the crawl stops?
13.07.23 14.15.13 <--> 13.07.23 14.32.28 jdlog://0346311370661/
Last edited by JTK; 13.07.2023 at 10:45.
#4
Works fine here.
Please provide example URLs with your preferred "max_date" value inside them.
EDIT: The "12 hours ago" display on the website is only a representation. Internally, all dates are available as full dates, so that is not the cause of your problem.
Last edited by pspzockerscene; 13.07.2023 at 14:07. Reason: EDIT
#5
NSFW
They posted some videos on 2023-07-12.
**External links are only visible to Support Staff****External links are only visible to Support Staff** **External links are only visible to Support Staff****External links are only visible to Support Staff**
After I crawled and downloaded with "maxitems", I deleted the link and file and crawled again with "maxitems" and "max_date" separately. Only the "maxitems" crawl shows a result; the "max_date" one shows nothing. Does the "max_date" value only check the tweets on the site, or also items that were already crawled into my download list?
PS: I just got another update, but the result is still the same.
Last edited by JTK; 13.07.2023 at 16:43.
#6
maxitems simply returns only the first X items, so it's no wonder that it works fine.
I get two results from one tweet with that value. The Twitter profile you've linked only contains one item before max_date=2023-07-10. The next item was posted on 2023-07-12, which is higher than your given limit, thus only one tweet gets crawled. To me it looks like you're either experiencing a strange/new bug or you didn't understand how "max_date" works.
EDIT: Please provide a debug log and post your log ID here. If your bug report is about a specific website which JD supports via plugin, please also provide example URLs which can be used to reproduce the issue you are having. -psp-
Last edited by pspzockerscene; 14.07.2023 at 10:07. Reason: EDIT
#7
debug-log:
14.07.23 15.13.26 <--> 14.07.23 15.15.22 jdlog://5546311370661/
14.07.23 15.37.56 <--> 14.07.23 15.50.04 jdlog://6546311370661/
There are 4 videos on 2023-07-12 and 100+ more if I do a full crawl on **External links are only visible to Support Staff****External links are only visible to Support Staff**. If you got only 2 results, I think there's a problem. These are the steps I took:
- Deleted 1 already crawled link named "2023-07-12..".
- Crawled with max_date=2023-07-14, max_date=2023-07-01, max_date=2023-06-30, max_date=2023-06-20: all return no result.
- Crawled with max_date=2023-06-15, max_date=2023-06-01: returns the link that I deleted.
- Crawled with maxitems=20: returns the link that I deleted.
It seems that if I set max_date too recent, it returns no result, even though the date is older than the tweets. I tried the steps above in case I'm wrong about how "max_date" works. This is what I understand:
Present----Max_date----When that user was made
If I use max_date, the crawl starts from the newest tweet at present and goes back to the max_date. That would explain why I set it to 2023-06-01 and got a result from 2023-07-12.
By the way, I'm the one who requested this function from you, to put less load on the Twitter servers when crawling for updates, but I only used maxitems before.
Last edited by JTK; 14.07.2023 at 12:20.
#8
Quote:
That is correct. Twitter sorts those items from newest to oldest by default, and this is the order in which they're crawled. I still don't understand your problem. For example, crawling your profile with "max_date=2023-06-01" gets me 21 results. I can't see a mistake here.
I remember.
#9
If I set max_date to an older date (around a month old, I think), there is no problem.
The problem is that if I set max_date to 2023-07-10, I should get all tweets from the present back to that day, including the 4 videos on 2023-07-12. But some or all of them don't show up: as you found above, it only shows 2 results instead of 4+ results. If I want "ALL" tweets dated 2023-07-12, I have to set max_date way back to 2023-06-01 or use "maxitems" instead.
#10
I think now I understand what you mean.
I'll look deeper into that topic once I find the time.
#11
I found the [simple] cause of this:
The Twitter API delivers the tweets of each pagination page unsorted, but the handling in our plugin expects them to be sorted. Once one item is "older than allowed", the current handling jumps out, which is how items can go missing. I'll let you know once there is a fix available.
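A small sketch may make the effect easier to see. It is only an illustration, not the plugin's code or the eventual fix: `filter_sorted_assumption` and `filter_unsorted_safe` are hypothetical names, and a page is modelled as a plain list of tweet dates.

```python
from datetime import date


def filter_sorted_assumption(page, max_date):
    """Buggy variant: assumes the page is sorted newest to oldest and
    jumps out at the first item that is older than allowed."""
    kept = []
    for posted in page:
        if posted < max_date:
            break  # everything after this item is silently dropped
        kept.append(posted)
    return kept


def filter_unsorted_safe(page, max_date):
    """Order-independent variant: checks every item on the page.

    Returns the allowed items plus a flag telling the caller whether
    any item was older than allowed. Assumption: the caller would use
    that flag to stop requesting further pages.
    """
    kept = [posted for posted in page if posted >= max_date]
    reached_limit = len(kept) < len(page)
    return kept, reached_limit


# One unsorted page: an old item appears before newer ones.
page = [date(2023, 7, 12), date(2023, 6, 1), date(2023, 7, 12), date(2023, 7, 11)]
limit = date(2023, 7, 10)

print(len(filter_sorted_assumption(page, limit)))
print(len(filter_unsorted_safe(page, limit)[0]))
```

With the sorted assumption, the 2023-06-01 item in second position ends the loop and the two later items from 2023-07-12 and 2023-07-11 are lost. Checking the whole page before deciding to stop avoids that, at the cost of always processing a full page.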
Last edited by raztoki; 16.07.2023 at 01:51.
#12
Thanks for the update. Weird that they deliver them unsorted.
#13
CORE-Updates have been released!
All announced bugfixes and features are live! Please update your JDownloader and report any issues you find asap. If this thread gets marked as "[Solved]" by our forum staff, you can still post in it and we will read and reply to it! -psp-