JDownloader Community - Appwork GmbH
 

 
 
  #1  
Old 01.12.2009, 07:33
rafi
Guest
 
Posts: n/a
Improve identification of similar mirrored download parts in the Linkgrabber

**External links are only visible to Support Staff****External links are only visible to Support Staff**
***************

Look at the example link. In many cases, non-identical mirrored part links are packaged together, or sorted badly inside packages.
I suggest giving the user an option to specify the criteria for matching links:
1. name (link/file)
2. parsed part#
3. size

plus AND/OR combinations of the above.

At the very least, mark the package status as partially matched.

Last edited by rafi; 03.12.2009 at 07:03. Reason: syntax errors & clarifications
  #2  
Old 01.12.2009, 10:45
remi
Guest
 
Posts: n/a
Cool

I think the Linkgrabber already does a good job in trying to package links that seem to belong together.

I guess you want a sort of link grouping filter?
  #3  
Old 01.12.2009, 12:40
rafi
Guest
 
Posts: n/a
Default

I opened the link for you, so you can try it yourself. I'm referring to the 12G package, which (I think) needs these new criteria (file sizes, part notation, etc.).

I don't mind this manual management for a single case, but when I think about this in relation to RSS in the future, it might need more automation and those parameters.

Last edited by rafi; 01.12.2009 at 12:45.
  #4  
Old 01.12.2009, 13:11
remi
Guest
 
Posts: n/a
Cool

You're very forward looking with your RSS feed.

I see a 13.17 GB package in my version of jD. In that package there are offline links as well. The files all have the same name except for the partN(N) and rNN strings. I think that if the structure of the link name is different, links should be separated into different packages.
  #5  
Old 01.12.2009, 14:38
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,991
Default

I checked and it is working correctly. There is no way to distinguish between:

- part01 and part1, because both have a part10
- the part ones and the rar ones

The autopackager is for packing all files that belong together into one package, and JD does that job well and correctly in this case.

Using the part number is no good because:
the 01 and 1 numbering schemes both contain a 10

Using size is not possible because:
every site can report different file sizes, and some don't report any at all
the real file size is only available via API or once the download has begun
__________________
JD-Dev & Server-Admin
  #6  
Old 01.12.2009, 18:32
rafi
Guest
 
Posts: n/a
Default

I think we are looking at different things, since I'm on the stable version, 0.9.597. Here is what I see. If you see something else (because you fixed stuff), I'll be very glad to see that too (I'm trying not to update...).
Sizes seem to be reported quite nicely here. Also, I am not sure I understood what you meant by "01,1 have both 10". Part 10? "My" package is a complete mix and undownloadable... I hope it's because I have an earlier JD release...

And yes, remi, I'm hoping for an RSS downloader one day... I expect that page links like that will be the most JD can get, so only if JD does a good job parsing them can it also auto-continue with them...

http://img37.imageshack.us/i/24823382.gif/
continued here:
http://img37.imageshack.us/i/67887712.gif/
  #7  
Old 01.12.2009, 18:39
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,991
Default

JD cannot distinguish between the archives because
xy.part01.rar
and
xy.part1.rar

BOTH have xy.part10.rar as part 10.

As for sizes: each website reports file sizes differently (some in bytes, some in MB, some in GB, some not at all).
Only values in bytes are exact (often given by APIs).

Any other file size is not accurate enough to be used for anything useful,
because 0.1GB and 1000MB are not the same!
__________________
JD-Dev & Server-Admin
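
The precision argument above can be illustrated with a minimal sketch. It is not JDownloader code: the function names `size_range` and `may_match` are hypothetical, and it assumes binary units (1 MB = 1024 * 1024 bytes) and that a site rounds to the last digit it displays. Each displayed size is turned into the range of byte counts it could stand for, and two sizes are treated as possibly equal only if their ranges overlap.

```python
import re

# Assumption: binary units. Real sites may use decimal units instead.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*$", re.IGNORECASE)


def size_range(text):
    """Return the (low, high) byte range a displayed size could stand for."""
    m = SIZE_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = m.group(1), m.group(2).upper()
    scale = UNITS[unit]
    if scale == 1:
        # A value given in bytes is treated as exact.
        exact = int(float(number))
        return exact, exact
    decimals = len(number.split(".")[1]) if "." in number else 0
    half_step = 0.5 * 10 ** -decimals  # rounding error of the last shown digit
    value = float(number)
    low = max(0, int((value - half_step) * scale))
    high = int((value + half_step) * scale)
    return low, high


def may_match(a, b):
    """True if the two displayed sizes could describe the same file."""
    low_a, high_a = size_range(a)
    low_b, high_b = size_range(b)
    return low_a <= high_b and low_b <= high_a


print(may_match("0.1 GB", "1000 MB"))  # ranges do not overlap
print(may_match("0.2 GB", "200 MB"))   # ranges overlap
print(size_range("0.2 GB"))            # shows how wide a one-decimal GB value is
```

The third call shows the core of the objection: a value shown as "0.2 GB" covers a range of roughly a hundred megabytes, so it can only rule mirrors out, never confirm that two files are identical.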
  #8  
Old 01.12.2009, 20:25
rafi
Guest
 
Posts: n/a
Default

DLC: **External links are only visible to Support Staff****External links are only visible to Support Staff** just to be in sync...


For some strange reason, this does not bother me... They both have a part10, so what? Or am I missing something... :(

I'll try to see how I can make the software "choose" only one to continue with...

I understand you merge by file name, right?

OK, I understand you like to be exact on size, but in life you compromise... So if you are willing to do that, here is my "brain-soft" algorithm...

In order to get continuable packages, I would also go by these rules:
1. Do not merge groups of links with a different TOTAL size.
2. Do not merge groups with a different # of parts.
3. Do not merge groups with a parsed part # greater than the actual # of parts (parsing or other error).
4. Do not merge where a single part size is different (even if the total is the same).
(Note: all those "rules" can be user options, as well as WHETHER the user wants to auto-continue.)

So in this example, if we try to merge only the 6-part groups:
- links with part # "720" are out
- the 12-part group is out
- 2x 4-part groups (*.rxx) are out
- 1x 3-part group (400+400+319M) is out

We are left with 10x 6-part groups...
**External links are only visible to Support Staff****External links are only visible to Support Staff**


Per rule 4, they are divided into:
- 7 groups with 200*5+165 => package 1
- 2 groups with 190*5+118 => package 2

Now, those I can work/continue with...
So if the user puts in a filter, say for links named *720P*, the scan will give 2 more packages.

And now we can say: we continue with the one with the largest # of files/mirrors...

The BIG question is: can you try to make the software do this?

Last edited by coalado; 01.12.2009 at 23:58.
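
The four rules in this post can be sketched in a few lines. This is only an illustration of the proposal, not anything JDownloader does: the names `group_signature` and `build_packages`, the mirror labels and the input shape (each group as a list of `(parsed part number, size in MB)` pairs) are all assumptions, and sizes are compared for exact equality although real reported sizes would need a tolerance.

```python
from collections import defaultdict


def group_signature(group):
    """group: list of (parsed_part_number, size) pairs for one mirror.

    Returns a hashable signature, or None if the group looks broken.
    """
    parts = sorted(group)
    # Rule 3: a parsed part number larger than the number of parts
    # points to a parsing error or a mixed group.
    if not parts or max(number for number, _ in parts) > len(parts):
        return None
    # Rules 1, 2 and 4 in one key: identical per-part sizes imply
    # the same part count and the same total size.
    return tuple(size for _, size in parts)


def build_packages(groups):
    """Merge only groups whose signatures are identical."""
    packages = defaultdict(list)
    rejected = []
    for name, group in groups.items():
        signature = group_signature(group)
        if signature is None:
            rejected.append(name)
        else:
            packages[signature].append(name)
    return dict(packages), rejected


# Hypothetical input modelled on the sizes quoted in the post.
groups = {
    "mirror-a": [(n, 200) for n in range(1, 6)] + [(6, 165)],
    "mirror-b": [(n, 200) for n in range(1, 6)] + [(6, 165)],
    "mirror-c": [(n, 190) for n in range(1, 6)] + [(6, 118)],
    "mirror-d": [(1, 400), (2, 400), (720, 319)],  # "720" parsed as a part number
}

packages, rejected = build_packages(groups)
# Continue with the package that has the most mirrors.
best = max(packages, key=lambda signature: len(packages[signature]))
print(packages[best], rejected)
```

Using the tuple of part sizes as the dictionary key is a design shortcut: it makes rule 4 the only comparison needed, because rules 1 and 2 follow from it.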
  #9  
Old 01.12.2009, 20:33
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,991
Default

By the part numbering I mean:

test.part1.rar
test.part2.rar
test.part10.rar

test.part01.rar
test.part02.rar
test.part10.rar

You see, there is no way to separate these, because from part 10 on you have the same filename/part.

Your rules 2 and 3 come down to this.
__________________
JD-Dev & Server-Admin

Last edited by Jiaz; 01.12.2009 at 20:36.
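
The collision described in this post is easy to reproduce. The following is a minimal sketch, with a hypothetical helper `part_key` and a pattern that only covers the `.partN.rar` naming used in the example: both numbering schemes parse to the same part numbers, and the part-10 filename appears in both sets, so the name alone cannot say which set it belongs to.

```python
import re

# Assumption: only the ".partN.rar" naming from the example is handled.
PART_RE = re.compile(r"\.part(\d+)\.rar$", re.IGNORECASE)


def part_key(filename):
    """Return (base name, part number), or None if there is no partN.rar suffix."""
    m = PART_RE.search(filename)
    if m is None:
        return None
    return filename[:m.start()], int(m.group(1))


unpadded = ["test.part1.rar", "test.part2.rar", "test.part10.rar"]
padded = ["test.part01.rar", "test.part02.rar", "test.part10.rar"]

# Filenames present in both sets cannot be assigned to one set by name alone.
ambiguous = sorted(set(unpadded) & set(padded))
print(ambiguous)
print(part_key("test.part1.rar") == part_key("test.part01.rar"))
```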
  #10  
Old 01.12.2009, 20:34
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,991
Default

You speak of merging groups; that's what the autopackager does! A package is a group. There is nothing in between.
__________________
JD-Dev & Server-Admin
  #11  
Old 01.12.2009, 20:39
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,991
Default

By the way, the changes you want are more complex than you think.
For example:

link.part01.rar
link.part01.r00

or
link.rar.001
link.rar.001

or
link.rar.001
link.r01.001

or
link.part01.rar.001
link.part01.rar.002

and so on.
As there is no standard rule for how to name files, it's not easy to autopackage them.

At the moment we simply remove any part information, archive ending and other extensions, and then do a best-match search.
__________________
JD-Dev & Server-Admin
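
The "remove part information and extensions" step described in this post can be sketched as follows. This is not the actual autopackager code: the function name `package_key` and the suffix patterns are assumptions chosen to cover the filenames listed in the post, and the best-match search is replaced here by a plain comparison of the stripped names.

```python
import re

# Hypothetical suffix patterns, stripped repeatedly from the end of the name.
_SUFFIXES = [
    re.compile(r"\.\d{3}$"),                         # split counter: .001, .002
    re.compile(r"\.(rar|zip|7z)$", re.IGNORECASE),   # archive extension
    re.compile(r"\.r\d{2}$", re.IGNORECASE),         # old-style volume: .r00, .r01
    re.compile(r"\.part\d+$", re.IGNORECASE),        # partN / part0N marker
]


def package_key(filename):
    """Strip part and archive suffixes until nothing more can be removed."""
    name = filename
    changed = True
    while changed:
        changed = False
        for pattern in _SUFFIXES:
            stripped = pattern.sub("", name)
            if stripped != name:
                name, changed = stripped, True
    return name


names = [
    "link.part01.rar",
    "link.part01.r00",
    "link.rar.001",
    "link.r01.001",
    "link.part01.rar.001",
    "link.part01.rar.002",
]
print({package_key(n) for n in names})
```

All six names collapse to the same key, which shows why this approach puts every mirror of an archive into one package and cannot, by itself, tell two differently split copies apart.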
  #12  
Old 01.12.2009, 20:58
rafi
Guest
 
Posts: n/a
Default

I didn't say it was easy, just doable... But I'm trying to figure out what made it "easier" for me and more difficult for you...

The first thing that comes to mind is that you might be losing relevant information when you start processing the web page. The web page (in this general (!) case) includes important 'hints': what I called the original "groups" of links (= links not separated by any character except EOL). Do you actually save any info on those internally? My rules referred to those groups.
  #13  
Old 01.12.2009, 21:04
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,991
Default

There is no way to parse any rules, because there is no standard for how links are posted on pages. That's not possible in HTTP work because you have crypted links, shortened URLs, redirected links, forums, posts, comments, web pages, lists, text, and many, many more.
__________________
JD-Dev & Server-Admin
  #14  
Old 01.12.2009, 21:22
rafi
Guest
 
Posts: n/a
Default

I see. So, is it possible to at least preserve the order (sequence #) in which the links appear on the page? And maybe the "distance" (= # of characters) to the next link?

If you can just save those 2 numbers for each link (**External links are only visible to Support Staff**) you find, we might be able to think of how to benefit from them later on...

Last edited by rafi; 01.12.2009 at 21:52.
  #15  
Old 01.12.2009, 23:52
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,991
Default

Don't get me wrong, but there is no advantage in doing this.
1.) It would take a lot of changes: clipboard watching, linkgrabber, link parser and so on.
2.) Each website does this in a completely different way. That's what I'm trying to say here: it's not possible to get it working perfectly, because each website posts links in a different way and each uploader names their links differently. I can easily name over 30 ways to post links.
3.) It's not really a big advantage at all.

I'm sorry, but I don't see any reason to put any time into such a code/idea hole.
__________________
JD-Dev & Server-Admin
  #16  
Old 02.12.2009, 07:41
rafi
Guest
 
Posts: n/a
Default

I see. Well, the question is:
1. Do you think that JD will at some point be able to do an automatic grab + continue that ends with a single package downloading? I think this option should be available in JD now as well, for a quick and simple manual copy + paste + download of links. But I'm thinking more of a whole page in the future. The user would then have to "assist" it by giving a proper filter. (I'm thinking of an RSS downloader grabbing a whole page.)

And a small suggestion/request that is surely quick to implement now:
2. Can you just save one number for every link you parse, its position (= the number of characters counted from the start) in the page or in the selected area the user copied, and make a column of those numbers in the linkgrabber (maybe in the download page as well)? It can be invisible by default. I believe the user can then sort a package by this column and reconstruct the order of the links in the source page at any time.

Just to make sure my previous post was understood: my idea was that this 'position number' can greatly assist the JD logic in properly grouping links inside packages. I understand that you think it's a big IF, but why not consider it a "debug" column and just try it out and see? ...:)

Last edited by rafi; 02.12.2009 at 07:48.
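
What the requested "position column" would record can be shown with a small sketch. It makes strong simplifying assumptions: the page is plain text, links are found with a naive regular expression, and the name `grab_with_positions` and the example hosts are hypothetical. It does not reflect how the Linkgrabber actually scans pages.

```python
import re

# Naive link pattern, for illustration only.
LINK_RE = re.compile(r"https?://[^\s\"'<>]+")


def grab_with_positions(page_text):
    """Return (sequence number, character offset, url) for every link found."""
    return [
        (seq, match.start(), match.group(0))
        for seq, match in enumerate(LINK_RE.finditer(page_text), start=1)
    ]


page = (
    "intro text\n"
    "http://host.example/A.part01.rar\n"
    "http://host.example/A.part02.rar\n"
    "<!-- ad -->\n"
    "http://mirror.example/A.part01.rar\n"
)

rows = grab_with_positions(page)
for seq, offset, url in rows:
    print(seq, offset, url)
```

Sorting the rows by offset reproduces the page order, so the sequence number is just the rank of the offset; the offset additionally keeps the gap to the next link.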
  #17  
Old 02.12.2009, 11:24
remi
Guest
 
Posts: n/a
Cool

When I was reading this thread, I came to the same conclusion. Most links that belong together usually are listed one after the other.

The Linkgrabber should assign a sequence number to any link grabbed. The customer should then be able to sort on this sequence number and repackage the links more intelligently in cases where the Linkgrabber can't do it well.
  #18  
Old 02.12.2009, 13:16
Jiaz
JD Manager
 
Join Date: Mar 2009
Location: Germany
Posts: 80,991
Default

No, such a change will not come, because:
1.) It needs a lot of changes in many parts of JD.
2.) It is not really an advantage, because:
2.1.) The distances are not equal. I mean:
2.2.) Between part1.rar and part2.rar there can be only 2 lines of HTML, but between part2 and part3 100 lines. There can be CSS, comments, HTML tags, ads, and many, many more.
2.3.) What then? Then you have link1 beginning at 100, link2 at 230, link3 at 412, link4 at 420... That is not unusual, but in the browser itself all links can be in one row.

There is no sense in changing so many parts for a *debug, invisible* column that says next to nothing, because every HTML page looks different.
__________________
JD-Dev & Server-Admin
  #19  
Old 02.12.2009, 18:50
rafi
Guest
 
Posts: n/a
Default

I (and maybe remi too) am not aware of the related workload for you guys.
We are just proposing something that may have the potential to be very useful. To me/us, it seems easy: while you scan and grab a link from a page link, you just count and keep its position (# of chars from the start of the page) with it. Same thing if it's from the scratch-pad: count from the start of the scratch-pad.

And when I said "hidden", I meant as the default in the first release, so it can be tested by fewer people first, and if it is useful you set the default to "visible".

As I tried to explain, the benefit is the relative position of the links. With this, you can ALWAYS sort by this column, and JD might be better at identifying the relevant "groups" inside a package. What is located in between the groups/links does not affect the logic in any way... It might just help later on to separate mirrored groups.

Me, I still hope that in many cases a single group will be evenly spaced... so those spaces can also be used later on to identify the groups' boundaries...

Last edited by rafi; 03.12.2009 at 07:58.
  #20  
Old 03.12.2009, 07:55
rafi
Guest
 
Posts: n/a
Default

Not to forget the already suggested #-of-parts and 'size' (when possible) as "group" matching criteria too...

And a final note: I am no big expert in HTML, but I do not remember many occasions when a single user post (and definitely a user-posted "group" of links) was not continuous (or at least did not look continuous...).

Last edited by rafi; 03.12.2009 at 08:01.
  #21  
Old 03.12.2009, 10:12
remi
Guest
 
Posts: n/a
Cool

I accept that the current design of the Linkgrabber makes it nearly impossible to add this feature.

What I suggested was just adding a sequence number (not a distance metric) so that :-

A.part01.rar
A.part02.rar
A.part03.rar

a.part01.rar
a.part02.rar
a.part03.rar

on a web page becomes :-

001; A.part01.rar
002; A.part02.rar
003; A.part03.rar
004; a.part01.rar
005; a.part02.rar
006; a.part03.rar

In the Linkgrabber.

I've never seen a website where the above links would be posted like :-

A.part01.rar
a.part01.rar

A.part02.rar
a.part02.rar

A.part03.rar
a.part03.rar

because in this case, the feature would be useless.

Let's hope the feature will be included when the Linkgrabber is redesigned in 2010 or 2011.
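
The sequence-number idea from this post can be sketched as follows. Everything here is an assumption made for illustration: the function name `split_by_sequence`, the `.partN.rar` pattern, and the rule that a new group starts whenever the part number fails to increase. It is one possible way to use the page order, not a description of the Linkgrabber.

```python
import re

# Assumption: only ".partN.rar" names are recognised.
PART_RE = re.compile(r"\.part(\d+)\.rar$", re.IGNORECASE)


def split_by_sequence(filenames):
    """Number links in page order; start a new group when the part number restarts."""
    groups = []
    last_part = None
    for seq, name in enumerate(filenames, start=1):
        m = PART_RE.search(name)
        part = int(m.group(1)) if m else None
        restart = part is None or last_part is None or part <= last_part
        if restart:
            groups.append([])
        groups[-1].append(f"{seq:03d}; {name}")
        last_part = part
    return groups


contiguous = [
    "A.part01.rar", "A.part02.rar", "A.part03.rar",
    "a.part01.rar", "a.part02.rar", "a.part03.rar",
]
interleaved = [
    "A.part01.rar", "a.part01.rar",
    "A.part02.rar", "a.part02.rar",
    "A.part03.rar", "a.part03.rar",
]

print(split_by_sequence(contiguous))
print([len(g) for g in split_by_sequence(interleaved)])
```

With the contiguous listing the two mirrors come out as two clean groups; with the interleaved listing the split is meaningless, which matches the caveat in the post that the feature would be useless in that case.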
  #22  
Old 03.12.2009, 10:27
rafi
Guest
 
Posts: n/a
Default

Eh, don't be such a pessimist...
1. I think sequence #s and position #s have about the same implementation difficulty.
2. It's feasible in the current design (just not liked by the devs...).
3. The advantage of the position # is that it's the complete info. Sequence numbers are DERIVED from it. You can think of using those for more features in the future...

And one example of future use: I speculate that we'll sometimes find it easy to also separate groups by the "distance" between the links...

Last edited by rafi; 03.12.2009 at 10:31.
 

All times are GMT +2.