JDownloader Community - Appwork GmbH
 

  #1  
Old 04.01.2018, 18:36
famuex
Guest
 
Posts: n/a
Twitter crawler misses some tweets

Hello!

Recently I noticed that the Twitter crawler missed some videos and images for certain users.

After reviewing the source code and running some tests, I believe the problem is caused by the following regular expression (decrypter/TwitterCom.java, line 228):

Code:
"li class=\"js\\-stream\\-item stream\\-item stream\\-item([^の]+?)ProfileTweet\\-actionCount"
This expression is used by the Twitter decrypter plugin to extract tweet entries. The "([^の]+?)" part seems intended to match and capture all of the HTML in each tweet. But since "の" is a very common Japanese character, any tweet containing "の" in its text fails to match and is skipped.
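To illustrate the miss outside the plugin, here is a minimal sketch using plain java.util.regex. The class name, the tweet() helper, and the sample markup are hypothetical stand-ins, not the real Twitter HTML or the plugin's own code; only the pattern string is taken from the post above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TweetRegexRepro {
    // Pattern quoted above from decrypter/TwitterCom.java.
    static final String ORIGINAL =
        "li class=\"js\\-stream\\-item stream\\-item stream\\-item([^の]+?)ProfileTweet\\-actionCount";

    // Counts how many tweet entries the pattern finds in the given HTML.
    static int countMatches(String regex, String html) {
        Matcher m = Pattern.compile(regex).matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Hypothetical, heavily simplified tweet markup (assumption for this demo).
    static String tweet(String text) {
        return "<li class=\"js-stream-item stream-item stream-item\"><p>" + text + "</p>"
            + "<span class=\"ProfileTweet-actionCount\"></span></li>";
    }

    public static void main(String[] args) {
        String html = tweet("hello world") + tweet("猫の写真");
        // Two tweets are present, but the one containing "の" cannot match.
        System.out.println(countMatches(ORIGINAL, html)); // prints 1
    }
}
```

The second tweet is skipped because the negated character class stops at "の" before the pattern can reach "ProfileTweet-actionCount".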

This problem can be reproduced on most Japanese Twitter accounts,
e.g. **External links are only visible to Support Staff** **External links are only visible to Support Staff**

I tried fixing this by changing "([^の]+?)" to "(.+?)", and so far it seems to work as expected.
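One caveat when trying this change with plain java.util.regex: unlike a negated character class, "." does not match line terminators unless Pattern.DOTALL is set. The sketch below assumes multi-line tweet markup and compares both modes; the class name, the tweet() helper, and the sample HTML are hypothetical, and whether the plugin's own regex helper enables DOTALL is not something this example verifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TweetRegexFix {
    // The pattern with the proposed change: "([^の]+?)" replaced by "(.+?)".
    static final String FIXED =
        "li class=\"js\\-stream\\-item stream\\-item stream\\-item(.+?)ProfileTweet\\-actionCount";

    // Counts matches of the regex compiled with the given flags.
    static int countMatches(String regex, int flags, String html) {
        Matcher m = Pattern.compile(regex, flags).matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Hypothetical tweet markup spread over several lines (assumption for this demo).
    static String tweet(String text) {
        return "<li class=\"js-stream-item stream-item stream-item\">\n"
            + "<p>" + text + "</p>\n"
            + "<span class=\"ProfileTweet-actionCount\"></span>\n"
            + "</li>\n";
    }

    public static void main(String[] args) {
        String html = tweet("hello world") + tweet("猫の写真");
        // With DOTALL, "." also matches newlines, so both tweets are found.
        System.out.println(countMatches(FIXED, Pattern.DOTALL, html)); // prints 2
        // Without DOTALL, "." stops at the first newline and nothing matches.
        System.out.println(countMatches(FIXED, 0, html)); // prints 0
    }
}
```

The reluctant quantifier "+?" keeps each match from running past the first "ProfileTweet-actionCount" that follows, so tweets are still captured one at a time.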

Sorry for my poor English, and thanks for reading and for your help.
  #2  
Old 24.01.2018, 14:28
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 79,532

Thanks for the bug report. I've updated the plugin according to your suggestion.
If you would like write access to the source repository, please let us know at support@jdownloader.org
__________________
JD-Dev & Server-Admin
All times are GMT +2.