#1
improve ID of similar mirrored-downloads-parts in linkgrabber
**External links are only visible to Support Staff**
Look at the example link. In many cases, non-identical mirrored part links are packaged together, or sorted badly inside packages. I suggest giving the user an option to specify the criteria for matching links:
1. name (link/file)
2. parsed part #
3. size
and and/or operations on the above.
At least mark the package status as partially matched.
Last edited by rafi; 03.12.2009 at 07:03. Reason: syntax errors & clarifications
#2
I think the Linkgrabber already does a good job in trying to package links that seem to belong together.
I guess you want a sort of link grouping filter?
#3
I opened the link for you, so you can try it yourself. I'm referring to the 12G package, which (I think) needs these new criteria (file sizes, parts notation etc).
I don't mind this manual management for single cases, but when I think about this in relation to RSS in the future, it might need more automation and those parameters.
Last edited by rafi; 01.12.2009 at 12:45.
#4
You're very forward looking with your RSS feed.
I see a 13.17 GB package in my version of jD. In that package there are off-line links as well. The files all have the same name except for the partN(N) and rNN strings. I think that if the structure of the link name is different, links should be separated in different packages.
#5
I checked and it's working correctly. There is no way to distinguish between:
- part01 and part1, because both have part10
- the part ones and the rar ones
The autopackager is for packing all files that belong together into one package, and JD does that job well and correctly in our case.
Using the part number is no good because 01 and 1 both have 10.
Using size is not possible because every site can report different file sizes, and some don't report any at all. The real file size is only available via API or once the download has begun.
__________________
JD-Dev & Server-Admin
#6
I think we are looking at different things, since I'm on the stable version, 0.9.597. Here is what I see. If you see something else (since you fixed stuff), I'll be very glad to see that too (I'm trying not to update...).
Sizes seem to be reported quite nicely here. Also, I am not sure I understood what you meant by "01,1 have both 10". Part 10? "My" package is a complete mix and undownloadable... I hope it's because I have an earlier JD release...
And yes, remi, I'm hoping for an RSS downloader one day in the future... I expect that page links like that will be the maximum it can get for JD, so only if JD does a good job parsing it can it also auto-continue with it...
http://img37.imageshack.us/i/24823382.gif/
continued here: http://img37.imageshack.us/i/67887712.gif/
#7
JD cannot distinguish between archives because xy.part01.rar and xy.part1.rar BOTH have xy.part10.rar as part 10.
As for sizes: each website reports file sizes differently (some in bytes, some in MB, some in GB, some not at all). Only values in bytes are exact (often given by APIs). Any other file size is not accurate enough to use for something useful, because 0.1GB and 1000MB are not the same!
__________________
JD-Dev & Server-Admin
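The precision problem described above can be made concrete. The sketch below is not JD code; it is a minimal Python illustration that assumes decimal units and assumes a site rounds to the last digit it displays. The names `size_range` and `may_be_same_size` are made up for this example. A rounded size is treated as a byte interval rather than an exact number, so it can only ever rule a match out, never confirm one.

```python
import re
from decimal import Decimal

# Assumption: decimal multipliers. Some sites mean 1024-based units instead.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9}
SIZE = re.compile(r"\s*(\d+(?:\.(\d+))?)\s*([KMG]?B)\s*", re.IGNORECASE)


def size_range(text):
    """Return the (low, high) byte interval a displayed size could stand for."""
    m = SIZE.fullmatch(text)
    if m is None:
        return None  # the site reported no usable size
    value = Decimal(m.group(1))
    unit = UNITS[m.group(3).upper()]
    if unit == 1:
        exact = int(value)  # byte values are taken as exact
        return exact, exact
    decimals = len(m.group(2) or "")
    half_step = Decimal(1).scaleb(-decimals) / 2  # half of the last shown digit
    low = int((value - half_step) * unit)
    high = int((value + half_step) * unit)
    return max(low, 0), high


def may_be_same_size(a, b):
    """False only when the two displayed sizes cannot describe the same file."""
    ra, rb = size_range(a), size_range(b)
    if ra is None or rb is None:
        return True  # an unknown size cannot rule anything out
    return ra[0] <= rb[1] and rb[0] <= ra[1]
```

With these assumptions, "1.0 GB" and "1000 MB" are compatible, while "0.1GB" and "1000MB" are not. A link with no reported size is compatible with everything, which is exactly why size alone cannot drive packaging.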
#8
DLC: **External links are only visible to Support Staff** just to be in sync...
For some strange reason, this does not bother me... they both have part10, so what?... Or am I missing something... :( I'll try to see how I can make the software "choose" only one to continue with... I understand you merge by file name, right?
OK, I understand you like to be exact on size, but in life you compromise... So if you are willing to do that, here is my "brain-soft" algorithm... In order to achieve continuable packages, I will also go by these rules:
1. do not merge groups of links with different TOTAL size
2. do not merge groups with a different # of parts
3. do not merge groups with a parsed part # greater than the actual # of parts (parsing or other error)
4. do not merge where a single part size is different (even if the total is the same)
(note: all those "rules" can be options for the user to set, as well as IF he wants to auto-continue)
So in this example, if we try to merge only the 6-part groups:
- links with part # "720" are out
- the 12-part group is out
- 2x 4-part groups (*.rxx) are out
- 1x 3-part group (400+400+319M) is out
We are left with 10x 6-part groups... **External links are only visible to Support Staff**
Per rule 4, they are divided into:
- 7 groups with 200*5+165 => package 1
- 2 groups with 190*5+118 => package 2
Now, those I can work/continue with... So if the user put in a filter, say for *720P* named links, the scan would give 2 more packages. And now we can say: we continue with the one with the largest # of files/mirrors...
The BIG question is: can you try to make the software do this?
Last edited by coalado; 01.12.2009 at 23:58.
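The four rules in this post can be sketched in a few lines of Python. This is only an illustration of the proposal, not anything JD does. It assumes each group is a list of `(part_number, size_in_mb)` pairs and that displayed sizes can be compared for equality, which is the very point under dispute in this thread. The function names are hypothetical.

```python
from collections import defaultdict


def is_consistent(group):
    """Rule 3: no parsed part number may exceed the number of parts present."""
    return all(1 <= part <= len(group) for part, _ in group)


def merge_key(group):
    """Groups get the same key only if rules 1, 2 and 4 allow merging them."""
    # The tuple length covers rule 2, the per-part sizes cover rule 4,
    # and equal per-part sizes imply an equal total (rule 1).
    return tuple(size for _, size in sorted(group))


def build_packages(groups):
    """Bucket consistent groups by key; return (packages, rejected groups)."""
    packages = defaultdict(list)
    rejected = []
    for group in groups:
        if is_consistent(group):
            packages[merge_key(group)].append(group)
        else:
            rejected.append(group)
    return dict(packages), rejected
```

Fed with seven groups of 200*5+165 and two groups of 190*5+118, this produces two separate packages of mirrors, and a group containing a stray part number such as 720 is rejected by rule 3.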
#9
With the part I mean:
test.part1.rar
test.part2.rar
test.part10.rar
test.part01.rar
test.part02.rar
test.part10.rar
You see, there is no way to separate these, because after part 10 you have the same filename/part. Your rules 2 and 3 concern exactly this case.
__________________
JD-Dev & Server-Admin
Last edited by Jiaz; 01.12.2009 at 20:36.
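The ambiguity in the file list above can be checked mechanically. The following Python sketch uses a hypothetical helper, `padding_styles`, and assumes the `.partN.rar` naming used in the example. It reports which zero-padding widths a file name is compatible with.

```python
import re

PART = re.compile(r"\.part(\d+)\.rar$", re.IGNORECASE)


def padding_styles(filename):
    """Return the set of digit widths this part name is compatible with."""
    m = PART.search(filename)
    if m is None:
        return set()
    digits = m.group(1)
    if digits.startswith("0"):
        return {len(digits)}  # part01 only fits a series padded to width 2
    # An unpadded number fits every padding width up to its own length:
    # part10 is valid in a part1.. series and in a part01.. series alike.
    return set(range(1, len(digits) + 1))
```

`test.part1.rar` yields `{1}` and `test.part01.rar` yields `{2}`, but `test.part10.rar` yields `{1, 2}`. The name alone does not say which mirror a part 10 belongs to.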
#10
You speak of merging groups; that's what the autopackager does! A package is a group. There is nothing in between.
__________________
JD-Dev & Server-Admin
#11
BTW, the changes you want are more complex than you think. For example:
link.part01.rar link.part01.r00
or
link.rar.001 link.rar.001
or
link.rar.001 link.r01.001
or
link.part01.rar.001 link.part01.rar.002
and so on. As there is no default rule for how to name files, it's not easy to autopackage them. At the moment we simply remove any part information, archive ending and other extensions, and then do a best-match search.
__________________
JD-Dev & Server-Admin
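The stripping step described in this post might look roughly like the Python sketch below. It is not JD's actual code: the suffix list is an assumption that covers only the naming variants quoted above, `package_key` is a made-up name, and the best-match search that follows the stripping is left out.

```python
import re

# Hypothetical suffix patterns, removed one at a time from the end of the name.
SUFFIXES = re.compile(
    r"(\.part\d+|\.r\d+|\.\d{3}|\.rar|\.zip|\.7z)$",
    re.IGNORECASE,
)


def package_key(filename):
    """Strip part info and archive endings until nothing more comes off."""
    name = filename.strip().lower()
    while True:
        stripped = SUFFIXES.sub("", name)
        if stripped == name:
            return name
        name = stripped
```

Every variant listed in the post reduces to the same key, `link`. That shows both why files that belong together land in one package and why mirrors that differ only in their numbering scheme cannot be told apart after this step.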
#12
I didn't say it was easy, just doable... But I'm trying to figure out what made it "easier" for me, and more difficult for you...
First thing that comes to mind is that you might be losing relevant information when you start processing the web page. The web page (in this general (!) case) includes important 'hints': what I called the original "groups" of links (= links not separated by any character except EOL). Do you actually save info on those at all, internally? My rules referred to those groups.
#13
There is no way to parse any rules because there is no standard for how links are posted on pages. That's not possible in HTTP work because you have crypted links, shortened URLs, redirected links, forums, posts, comments, webpages, lists, text, and many, many more.
__________________
JD-Dev & Server-Admin
#14
I see. So, is it possible to at least preserve the order (sequence #) in which the links appear on the page? And maybe the "distance" (= # of characters) to the next link?
If you can just save those 2 numbers for each link (**External links are only visible to Support Staff**) you find, we might be able to think of how to benefit from them later on...
Last edited by rafi; 01.12.2009 at 21:52.
#15
Don't get me wrong, but there is no advantage in doing this.
1.) It would be a lot of changes: clipboard watching, linkgrabber, linkparser and so on.
2.) Each website does this in a completely different way. That's what I'm trying to say here: it's not possible to get it working perfectly, because each website posts links in a different way and each uploader names their links differently. I can easily name over 30 ways to post links.
3.) It's not really a big advantage at all.
I'm sorry, but I don't see any reason to put any time into such a code/idea hole.
__________________
JD-Dev & Server-Admin
#16
I see. Well, the question is:
1. Do you think that JD will somehow be able, at some point, to do an auto grab + continue, ending with a single package downloading? I think this option should be available in JD now too, for a quick & simple manual copy + paste + download of links. But I'm thinking more of a whole page in the future. The user will then have to "assist" it by giving a proper filter. (I'm thinking of an RSS downloader, grabbing a whole page.)
And a small suggestion/request that will surely be quick to implement now:
2. Can you just save one number for every link you parse, its position (= counted number of characters from the start) in the page or in the selected area the user copied, and make a column of those numbers in the linkgrabber (maybe in the download page as well)? It can be invisible by default. I believe the user can then sort a package by this column, and be able to reconstruct at any time the order of the links in the source page.
Just to make sure my previous post was understood: my idea was that this 'position number' can greatly assist JD's logic in properly grouping links inside packages. I understand that you think it's a big IF, but why not consider it a "debug" column and just try it out and see?... :)
Last edited by rafi; 02.12.2009 at 07:48.
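To make the "position number" request concrete, here is a small Python sketch of recording the character offset of each link while scanning text. The URL pattern is deliberately simplistic and `links_with_positions` is a hypothetical name. It says nothing about how much work the same change would be inside JD's clipboard watcher, linkgrabber and linkparser.

```python
import re

# Simplified assumption of what counts as a link in pasted text or HTML.
URL = re.compile(r"https?://[^\s\"'<>]+")


def links_with_positions(page_text):
    """Return (offset, url) pairs; offset counts characters from the start."""
    return [(m.start(), m.group(0)) for m in URL.finditer(page_text)]
```

Sorting the resulting pairs restores the order in which the links appeared in the source, no matter how a package was rearranged in between.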
#17
When I was reading this thread, I came to the same conclusion. Most links that belong together usually are listed one after the other.
The Linkgrabber should assign a sequence number to any link grabbed. The customer should then be able to sort on this sequence number and repackage the links more intelligently in cases where the Linkgrabber can't do it well.
#18
No, such a change will not come, because:
1.) It needs a lot of changes in many parts of JD.
2.) It's not really an advantage, because:
2.1.) The distances are not equal. I mean:
2.2.) Between part1.rar and part2.rar there can be only 2 lines of HTML, but between part2 and part3, 100 lines. There can be CSS, comments, HTML tags, ads, and many, many more.
2.3.) What then? Then you have link1 beginning at 100, link2 at 230, link3 at 412, link4 at 420... That is not unusual, but in the browser itself all links can be in one row.
There is no sense in changing so many parts for a *debug, invisible* column that has nearly nothing to say, because every HTML page looks different.
__________________
JD-Dev & Server-Admin
#19
I (and maybe remi too) am not aware of the related workload for you guys.
We are just proposing something that may have the potential to be very useful. To me/us, it seems easy: while you scan & grab a link from a page link, you just count and keep the position (# of chars from the start of the page) with it. Same thing if it's from the scratch-pad: from the start of the scratch-pad. And when I said "hidden", I meant as the default in the first release, so it can be tested by fewer people first, and if it's useful, you set the default for it to "visible".
As I tried to explain, the benefit is the relative position of the links. With this, you can ALWAYS sort by this column, and JD might be better at identifying the relevant "groups" inside a package. What is located in between the groups/links does not affect the logic in any way... It might just help later on to separate mirrored groups. Me, I still hope that in many cases a single group will be evenly spaced... so those spaces can also be used later on to identify the groups' boundaries...
Last edited by rafi; 03.12.2009 at 07:58.
#20
Not to forget the already suggested # of parts & 'size' (when possible) as "group" matching criteria too...
And a final note: I am no big expert in HTML, but I do not remember many occasions when a single user post (and definitely a user-posted "group" of links) was not continuous (or at least continuous-looking...).
Last edited by rafi; 03.12.2009 at 08:01.
#21
I accept that the current design of the Linkgrabber makes it nearly impossible to add this feature.
What I suggested was just adding a sequence number (not a distance metric) so that:
A.part01.rar
A.part02.rar
A.part03.rar
a.part01.rar
a.part02.rar
a.part03.rar
on a web page becomes:
001; A.part01.rar
002; A.part02.rar
003; A.part03.rar
004; a.part01.rar
005; a.part02.rar
006; a.part03.rar
in the Linkgrabber. I've never seen a website where the above links would be posted like:
A.part01.rar
a.part01.rar
A.part02.rar
a.part02.rar
A.part03.rar
a.part03.rar
because in this case, the feature would be useless. Let's hope the feature will be included when the Linkgrabber is redesigned in 2010 or 2011.
#22
Eh, don't be such a pessimist...
1. I think sequence #s or position #s have about the same implementation difficulty.
2. It's feasible in the current design (just not liked by the devs...).
3. The advantage of the position # is that it's the complete info. Sequence numbers are DERIVED from it. You can think of utilizing those for more features in the future...
And one example of future use: I speculate that we'll sometimes find it easy to also separate groups by the "distance" between the links...
Last edited by rafi; 03.12.2009 at 10:31.
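The speculation about separating groups by "distance" could be tried out with something like the Python sketch below. It is a heuristic under a stated assumption: links within one group sit roughly evenly spaced, and a gap much larger than the median gap marks a group boundary. The `factor` threshold and the name `split_by_gap` are invented for illustration.

```python
from statistics import median


def split_by_gap(positions, factor=3.0):
    """Split offsets into groups wherever a gap exceeds factor * median gap."""
    if len(positions) < 2:
        return [list(positions)]
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    limit = factor * median(gaps)
    groups = [[ordered[0]]]
    for pos, gap in zip(ordered[1:], gaps):
        if gap > limit:
            groups.append([])  # unusually large gap: assume a new group
        groups[-1].append(pos)
    return groups
```

Evenly spaced offsets such as 100, 140, 180 followed by 900, 940, 980 split cleanly into two groups. The offsets from the developer's counter-example (100, 230, 412, 420) stay in a single group, so on irregular HTML the heuristic adds nothing.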