Thread: [User feedback required] Twitter - Scrape date/time limit?
  #10  
Old 14.04.2020, 08:09
Akasen

Hi, I thought I'd chime in here since this thread is still alive and I essentially got the answer to this problem from PSP in another thread:

Quote:
Originally Posted by pspzockerscene View Post
Hi again Akasen,

I've investigated this.

It seems like Twitter has a maximum number of tweets you can go back and see.
Code:
641|twitter.com_jd.plugins.decrypter.TwitterCom 31.03.20 15:36:59 - INFO [ jd.plugins.decrypter.TwitterCom(crawlUserViaAPI) ] -> Numberof tweets on current page: 0 of expected max 20
641|twitter.com_jd.plugins.decrypter.TwitterCom 31.03.20 15:37:01 - INFO [ jd.plugins.decrypter.TwitterCom(crawlUserViaAPI) ] -> Numberof total tweets crawled: 829 of expected total 2748
By default, Twitter returns 20 tweets per page, and every tweet may contain a different amount of downloadable media.

Basically we noticed this in the past as well.
You can even check this in a browser: you can go back as far as 41 "pages", which means reloading 40 times by scrolling down.

Another issue is that a lot of websites let you e.g. "start at position 500".
Twitter, however, uses so-called "cursors": to access the next page, you need a token that is only available on the previous page. So even if I wanted to, I would not be able to give you an option to e.g. start at position 800 in this case.

For your other URL, it finds 200 objects, which should be all of them as it only contains 162 tweets.

I could now e.g. experiment and request more objects per page, but although this may return some more objects, we would run into similar issues with URLs containing even more objects.

I recommend that you:
- Test via browser and see how far you can get and if you can e.g. get more than JD does
- Search the Internet for other Twitter downloader tools --> If you find one that does a better job than our crawler, let me know and I'll look into it again

-psp-
My experience so far in reporting these issues to PSP in another thread has been one of "steps forward, some steps back" with regard to Twitter. The API is just weird, and Twitter is likely constantly working against the efforts of tools like JDownloader.
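
To illustrate the cursor point PSP makes in the quote above, here is a rough Python sketch. To be clear, this is not Twitter's API and not JDownloader's code: FakeTimeline, fetch, crawl and the MAX_PAGES cap are all made up for the example, and the 20-per-page and 41-page numbers are simply borrowed from the quote. It only shows why a crawler can't jump to "position 800" and why it stops early once the server starts returning empty pages.

```python
import secrets
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 20   # tweets per page, as in PSP's log
MAX_PAGES = 41   # ASSUMED history cap, borrowed from the "41 pages" remark


class FakeTimeline:
    """Made-up stand-in for a cursor-paginated timeline (not Twitter's real API)."""

    def __init__(self, total_tweets: int) -> None:
        self._tweets = list(range(total_tweets))
        self._cursors: Dict[str, int] = {}  # opaque token -> page index

    def fetch(self, cursor: Optional[str] = None) -> Tuple[List[int], Optional[str]]:
        """Return (tweets, next_cursor); next_cursor is None when nothing is left."""
        # A token is only valid if an earlier fetch handed it out,
        # so a guessed "start at position 800" token raises KeyError.
        page = 0 if cursor is None else self._cursors[cursor]
        if page >= MAX_PAGES:
            return [], None  # cap reached: empty page, like "0 of expected max 20"
        start = page * PAGE_SIZE
        items = self._tweets[start:start + PAGE_SIZE]
        if not items:
            return [], None
        token = secrets.token_hex(8)
        self._cursors[token] = page + 1
        return items, token


def crawl(timeline: FakeTimeline) -> List[int]:
    """Walk the timeline page by page until an empty page or a missing cursor."""
    collected: List[int] = []
    cursor: Optional[str] = None
    while True:
        items, cursor = timeline.fetch(cursor)
        collected.extend(items)
        if not items or cursor is None:
            break
    return collected


if __name__ == "__main__":
    print(len(crawl(FakeTimeline(2748))))  # 820 = 41 pages x 20, short of 2748
    print(len(crawl(FakeTimeline(162))))   # 162, small timelines come back whole
```

In this toy model the big timeline stops at 820, which is in the same range as the 829 in PSP's log, so the real numbers obviously don't line up exactly with such a simplified model. The small timeline comes back complete, which matches what PSP saw with my other URL.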

Quote:
Originally Posted by pspzockerscene View Post
Hm, as I said, I get 200 items when I add that one of your two URLs.
Now I even get 201.

According to the GitHub tickets of this other software, the Twitter API is kinda random.

Unfortunately, I do not have the time to do big experiments on it, and it seems to be working fine for most, if not all, of our current users, so I do not want to add experimental code.

According to the tickets, changing the "filter" values and also the User-Agent may bring more results.

We are open source so if you want you can grab our code and play around with it:
**External links are only visible to Support Staff**...

-psp-
Probably the best thing that can be done at this point is for a larger group of people who are interested in maintaining the Twitter plugin, and in gaining insight into it, to take up the code and start documenting and experimenting with it. I have the JDownloader code downloaded myself, but I'm not able to focus entirely on figuring out and experimenting with JDownloader and Twitter.

Last edited by Akasen; 14.04.2020 at 08:14.