[In progress] Suggested New Download control & UI - Page 2 - JDownloader Community

#41 06.12.2009, 16:29

I didn't have time to read all the remarks, but I have a feeling it's getting too complex. We should focus on the user perspective/states/commands related to the SEPARATE tasks control (as mentioned), and less on JD internal states.
As for the tool - for me the internal "word" graphics were good enough too...

As I said, I think a list of use-cases (that I neglected to give) in my spec would have helped a bit too ...

#42 07.12.2009, 14:00

It is complex. What we try to do is to clarify issues that aren't clear.

The customer must have the feeling that the software is easy to use. Therefore (s)he needs to understand at any time what's happening with the downloads. That's why we need to decide which process states (translated to statuses for the customer) are important for the customer.

We don't need to concentrate on detailed design issues, because that's the developers' responsibility.

#43 07.12.2009, 15:39

I do not know if I have understood it right.
Does the user have the choice to organize his downloads by sorting them, as you can in µTorrent?
Can this wish be compared with mine? (http://board.jdownloader.org/showthread.php?t=10999)
There I asked for sorting all disabled downloads out into another window.

Regards,

BW Z06

#44 07.12.2009, 15:48

Well, your wish is similar in that it asks to categorize (filter) the main view. Yes, the disabled ones can fall into one of the categories when we refine it; you will (hopefully) be able to see everything without them, or only them, if we make them a possible "category". I don't think you can see "stopped" ones in uTorrent though...

so, please vote "yes" for #3...

#45 07.12.2009, 15:59

I already did it;)
Thanks and good luck. Some new categories are welcome;)

Regards,

BW Z06

drbits · #46 08.12.2009, 07:26

See Green comments inserted into your post.

Quote:

Originally Posted by remi

Thanks for reviewing the diagram and making recommendations.

My state diagram only shows some of the important process states (called Status in jD) and state transitions (commands or menu items in jD) from the customer's perspective.

1) Good proposition. jD would look like a dashboard of an Airbus.

I meant in the properties, not in the tree. Besides, I like being in control, so I need lots of flashing lights, switches, and dials.

2) The distinction between waiting in the queue and waiting for a connection is IMO not that important for the customer. The customer already sees a countdown for the link. These states are important for the internal design of jD, though.

All links in the waiting state are waiting for a host. Connecting is not that important for the customer, because it only takes a few seconds. If it takes longer then you have a connection problem, hence a temporary error. That's why I never see "connecting" unless I have connection problems. Note that my jD window is minimised when I don't interact with jD.

Waiting for a new IP address is another waiting state that shouldn't take long. I've no experience with it, but I can imagine that it can take long enough to make a customer worried. I can try adding it to the diagram. It is a transitory state between waiting and downloading. It can also lead to a reconnection problem which is a special case of customer input or jD bug, depending on the technical proficiency of the customer.

The enqueued state might be a complex state containing waiting, disabled, downloading, paused and resumable states. I don't see the added value from the customer's perspective.
I agree with you on most points.
However, waiting for a new IP address can take hours. The problem is that JD cannot change the IP address if there are non-resumable links downloading. It has to wait until they complete (or the user kills them off). In fact, this kind of waiting is mostly a sleight of hand to speed up the downloading (avoid waits and quotas) and is not part of JD's central functionality. Just leave it as part of waiting.

A connection usually takes about 2-5 seconds, but Captcha solution can take longer. Also, by IETF rules, a server can wait over a minute before responding to an HTTP connection request.

3) I don't fully understand this remark.
If one has over 100 packages, it is easier to work with JD if the packages that have started downloading are at the top of the list. The currently downloading packages will be up there. All of the packages with errors will tend to be up there as well.

I also tend to leave JD minimized most of the time. So, when I want to check on it, I don't want to have to scroll through all my packages. My comment was that if we have categories, we can change the default display to ignore the packages that have not started downloading yet.

4) I don't fully understand what you're aiming at here.
This is just a description of my (admittedly unusual) JD use. As a Link Hog, I have almost 2000 packages, 25000 links, and I have 2.1 TB of data waiting to download. The other extreme is people who only have one or two packages at a time in JD.

5) & 6) 'removed' is in the state diagram because it might belong to the future link history database of jD. When this feature is implemented it will become visible for the customer. The removed state represents deleted files.

The dequeued transitions represent the deletion of links. My assumption is that the customer isn't interested in keeping them in a history.

Categories, or what I understand from the concept, are not related to process state. I don't see why or how I would add that to a state diagram.
Think of deleting a link as putting it in the trash can, not removing it from the system. You could remove it from the database (what you are calling dequeue) by removing it from the trash can (or emptying the trash). In order to separate the status and the fact that the link is in the trash can, I suggested using a category -- it is really just a flag.

The original reason for keeping removed links around is so that they are available for the duplicate detection. If we ever add the referrer to each link, we would then have a history of where we have obtained downloads. This would be as useful as the history in a browser. It would also allow determination of sources for bad links.

For example, we may need to distinguish between links that have been downloaded and the file is there, links that have been downloaded and the file is not where we left it, links in the process of being downloaded (with their statistics), and the various link states you have identified. Whether the link is in the trash, is a separate item - it is not a state.

7) I want to represent the different states of the downloading of one download unit, i.e., a link, a package or the entire queue. Some states might be called slightly different when packages or the entire queue (the state of jD) are concerned. See also point 9)
I agree. Much of what I said is not important to what you are doing.

I guess what I was trying to get at is that there are stages that a link or package goes through: preprocessing, waiting to download, downloading, and postprocessing. Except for errors, there is only one link out of each of these. These are your big pieces, and you might see if this can help you reduce the complexity of your state diagram.

Some details: Preprocessing (prepare, acquisition, filtering, information retrieval, collection into larger units, preprocessing completion), Waiting to download, Downloading (a lot of steps here), Postprocessing (join, extract, validate, postprocess completion), Finished.

The user doesn't have to know about the postprocessing steps individually (no major distinction between Join and Extract for a user, or the fact that postprocessing contains at least two validation steps, updates the database, and updates the display). That can just be a "Postprocessing" link (or box). The same is true of some other areas.
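A minimal sketch of the "big pieces" idea above, assuming a strictly linear chain in which every stage has exactly one way out when nothing goes wrong. The names `Stage`, `NEXT_STAGE`, and `advance` are hypothetical and are not taken from JD's code.

```python
from enum import Enum


class Stage(Enum):
    PREPROCESSING = 1
    WAITING_TO_DOWNLOAD = 2
    DOWNLOADING = 3
    POSTPROCESSING = 4
    FINISHED = 5


# Except for errors, there is only one link out of each stage.
NEXT_STAGE = {
    Stage.PREPROCESSING: Stage.WAITING_TO_DOWNLOAD,
    Stage.WAITING_TO_DOWNLOAD: Stage.DOWNLOADING,
    Stage.DOWNLOADING: Stage.POSTPROCESSING,
    Stage.POSTPROCESSING: Stage.FINISHED,
}


def advance(stage):
    """Return the single successor of a stage; FINISHED stays FINISHED."""
    return NEXT_STAGE.get(stage, Stage.FINISHED)


def walk(start=Stage.PREPROCESSING):
    """List the stages a link passes through when no error occurs."""
    path = [start]
    while path[-1] is not Stage.FINISHED:
        path.append(advance(path[-1]))
    return path
```

The sub-steps (join, extract, validate, and so on) would live inside a stage and stay out of this top-level chain.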

I'll add the pre-queued states that represent those covered by the Linkgrabber, because they are necessary for our full understanding of jD's entire download process. I'll probably call the first state 'created'. 'verified' and 'decrypted' will probably be added as well. I'll think about your other suggestions as I'm not sure they are that important from the customer's viewpoint.

I won't add sub-states that aren't important for the customer. That would be the responsibility for the developers who perform the detailed design of jD. The 'complex' states I added just group states for clarification purposes. They aren't real states IMO.
I would just concentrate on downloading. Represent the preprocessing and postprocessing as just single items you can deal with later.

8) I don't see why Reset would be a complex command for the customer. It simply deletes the file(s) concerning the download unit and puts the unit in the waiting state.
I agree. I was not viewing things correctly. Again, this is one link, no need to elaborate on the steps in the diagram.

9) You're right. The join and extract actions are triggered by the downloaded event of the last link belonging to a download set. I regret the concept of "download set" or whatever you call it, has not been provided in jD. The package concept is rather fuzzy and has nothing to do with download sets. A customer will soon learn that as he/she can add or remove any link in a package without jD warning about the consequences of these actions. Worse, the customer can change the "save path" of these packages without any warning from jD. These actions break the extract and join features because they depend on the concept of indivisible "download set".
This is something to add as a requirement and needs to go into the bugtracker. The packages are more like categories, they are user convenience. A download set represents pieces needed to create a final file or collection. For example, some restrictions should be enforced based on whether any part of the download set has been downloaded (total Loaded > 0).

10) The validate action is implied by the "Download exception ()", "Joining error" and "Extraction error" transitions.

11) Your finished state is represented by the final target (big dot in a circle) state. I don't understand the rest of this point, but that's probably caused by my lack of familiarity with these add-ons. I don't use them because these add-ons have too many limitations. I've implemented my own processes for these steps.
I was just trying to distinguish between download completion and completion of all processing on a unit (link, download set, package).

12) I think the data perspective of jD isn't that complex. I'm not sure this needs a lot of clarification. I'll think deeper about it but you could probably give me hints as well. What exactly do you mean by dataflow?

In a state diagram (or action diagram) the links are the actions (or limitations) and the nodes are the results (and waiting places).
In a dataflow diagram, things are reversed. The nodes represent actions and the links are labeled with the data that is required for that action.
Drawing both diagrams helps prevent missed requirements. You are only working on a high-level specification. In a detailed specification, you would need the subdiagrams, lists of what is in a data object, and a timing diagram (a Petri net is best, but simplified diagrams are often used).

I've just taken the first simple and stupid tool that I found in order to draw this state diagram, and it's sufficient for what I need. UMLet can be run as an Eclipse plug-in as well. If there is any ambiguity in the diagram, please report it. I might make it somewhat bigger horizontally, but vertically it already covers my screen and I don't want to scroll.

Thanks again for your valuable input.

One more item (this is the professor talking). At the more detailed level, each action is broken up into (set-up, validate input, decide process, process, validate output, cleanup). Not all of these appear in all cases, but you can simplify your higher level diagram by assuming this is happening at a lower level.

#47 08.12.2009, 13:01

Quote:

Originally Posted by drbits

A connection usually takes about 2-5 seconds, but Captcha solution can take longer.

I agree that this is an important state if the step takes that long.

Quote:

Originally Posted by drbits

...links that have been downloaded and the file is not where we left it...

I don't understand this one.

Quote:

Originally Posted by drbits

The user doesn't have to know about the postprocessing steps individually (no major distinction between Join and Extract for a user

I think Jiaz said that as well. I don't understand it because a downloaded set could extract properly and get stuck on the joining step, because of lack of disk space. A download set could get downloaded but not extract at all because of a missing password or a CRC error in one of the archives. The customer would not be able to distinguish between all these states of the process. (S)he would be lost.

Quote:

Originally Posted by drbits

The packages are more like categories, they are user convenience.

Good observation. I have been using packages as categories for a long time.

Quote:

Originally Posted by drbits

a Petri net is best, but simplified diagrams are often used

I would prefer specifying processes with Pi-calculus, but then we can forget programming in the old imperative style altogether, because these specifications are executable.

I'll think about the data perspective.

drbits · #48 09.12.2009, 06:19

Note: I know this is long, complex, and preachy. I spent several hours and several drafts on some sections to try to simplify things. I really was a computer science professor, so when I mention the Professor, I mean that part of me that is preachy about complex things.

I would leave out the whole concept of Captcha at the user level, except when we fail and have to ask the user for help. If the JAC works, it is just another tick in the progress.

Quote:

...links that have been downloaded and the file is not where we left it...

This is your Removed case. Once we have completed a download, we don't usually care what happens to the downloaded file (in fact, it is often deleted by unrar). These are completely finished, but the link is not deleted (not just history).

Quote:

I would prefer specifying processes with Pi-calculus,

Professor responding:
It is impossible for a mathematical system that includes the equivalent of arithmetic to be complete. This theorem is proven using the same approaches as Turing's work, but using set theory.

Turing proved that there is no way to know if there is a solution to some problems. The theorem above says that sometimes you cannot even state the question.

That is why we try to divide problems up into manageable independent parts. When specifying a part, we also have to be aware that the human brain can only hold a few ideas in short term memory (between 3 and 9). In order to check ideas for consistency, they both have to be "loaded" into short term memory.

In addition, the parts in a significant problem are not independent. Taken together, the limitations of the human brain and the interconnectedness of real problems mean that a single formal view of a problem cannot be a full specification.

Since any real program can be converted into a finite state diagram, people have been trying for over 40 years to find ways to specify practical problems that can be automatically validated and/or converted into executable code. Every attempt has failed. To do this with a real problem would take an average of O(n^n) time.

If you haven't read Abbott's "Flatland", you should. Basically, it talks about dimensions and that you have to have enough dimensions to see an object, otherwise all you see is an outline of part of the object.

This is why I always recommend:
1) Limit the complexity of any single diagram to around 7 (e.g., 7 states in a diagram). Possibly 3 to 9.
1a) This forces us to break the problem up into "independent" pieces and specify them separately. For example, subdiagrams for actions or class diagrams for complex data. When we first do this, we ignore errors, etc. Those are interaction problems.
2) Identify what is missing and state it in a different way.
2a) For example, what is missing in a state diagram includes data, constraints, concurrency, and time. Without understanding the data, you cannot understand the constraints. Without both constraints and concurrency, you cannot calculate time. Worse, you cannot determine whether deadlocks or starvation are possible.
3) Develop an arsenal of approaches.
3a) Usually, given two approaches, there are aspects that allow you to specify things you could not specify with just one. Also, there are aspects that allow you to validate (formally or informally) parts of the problem.
4) Errors are handled differently in different representations, but they always represent interactions.
4a) One way to deal with interactions is to represent them separately. Again, one finds ways to divide them into manageable pieces.
4b) For example, in a state diagram, it doesn't matter how many different error messages there are, that is data. However, you are interested in how you can proceed and create a classification for them (e.g., transient, automatically retry-able, out of retries, user assistance needed to retry, and unsolvable at run-time, impossible). Some of these can represent a state. The first two are arrows and require counting. Since you cannot count in a state diagram, you have to add a note or use another technique. You might combine these categories to reduce complexity (out of retries with user assistance needed to retry) and (unsolvable at run-time with impossible). So, the errors actually only require 2 states.
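A minimal sketch of the classification in 4b), assuming the six error classes listed there and a hypothetical retry limit. The first two classes are treated as arrows that need a counter, and the remaining four collapse into two states. All names (`ErrorClass`, `next_step`, `max_retries`) are illustrative and are not JD's API.

```python
from enum import Enum


class ErrorClass(Enum):
    TRANSIENT = "transient"
    AUTO_RETRYABLE = "automatically retry-able"
    OUT_OF_RETRIES = "out of retries"
    USER_ASSIST = "user assistance needed to retry"
    UNSOLVABLE = "unsolvable at run-time"
    IMPOSSIBLE = "impossible"


# These two are arrows (transitions), not states; they require counting.
RETRY_ARROWS = {ErrorClass.TRANSIENT, ErrorClass.AUTO_RETRYABLE}

# The other four are merged pairwise, so only two error states remain.
ERROR_STATE = {
    ErrorClass.OUT_OF_RETRIES: "needs_user_help",
    ErrorClass.USER_ASSIST: "needs_user_help",
    ErrorClass.UNSOLVABLE: "permanent_error",
    ErrorClass.IMPOSSIBLE: "permanent_error",
}


def next_step(error_class, retries_so_far, max_retries=3):
    """Return "retry" or the name of the error state to enter.

    The retry counter is the part a plain state diagram cannot express;
    max_retries=3 is an assumed default.
    """
    if error_class in RETRY_ARROWS:
        if retries_so_far < max_retries:
            return "retry"
        error_class = ErrorClass.OUT_OF_RETRIES
    return ERROR_STATE[error_class]
```

The number of distinct error messages does not appear anywhere in this sketch; as noted above, that is data.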

The professor is back: Think about how this affects what you are trying to do.
----------------------
For Postprocessing, if everything works the way it should and the user has not monkeyed with the settings (the way I always do), what will be left in the specified directory is the file or files that the user wanted to get.

If there is an error, we record all of the data we can in the log and classify the error. The user only needs to know minimal information, but with an ability to get more information or help.

By the way, the Join operation should never run out of disk space. You can always determine the total space necessary to do it the easy way (allocate a file the correct size and copy the contents of each envelope into the output file). If you are very careful, you can do it without any additional disk space.

The expand operation also gives you a maximum amount of space you will need when you look at the index (at most one disk cluster per file for OS overhead plus the sum of the sizes of the files, each rounded up to a multiple of the cluster size). It is when you combine operations that things get sticky. If an archive has to be joined, you would have a rough time figuring out the total space until after the join.
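A minimal sketch of the two space estimates described above, assuming the part sizes and the archive index are already known. The function names and the 4096-byte default cluster size are assumptions for illustration.

```python
def join_space(part_sizes):
    """Space needed for the joined file: the sum of the part sizes."""
    return sum(part_sizes)


def max_extract_space(file_sizes, cluster_size=4096):
    """Upper bound on disk space needed to expand an archive.

    Each file is rounded up to a multiple of the cluster size, plus at
    most one extra cluster per file for OS overhead.
    """
    total = 0
    for size in file_sizes:
        clusters = -(-size // cluster_size)  # ceiling division
        total += clusters * cluster_size + cluster_size
    return total
```

For example, files of 1, 4096, and 4097 bytes on 4096-byte clusters round up to 4096, 4096, and 8192 bytes, and the three overhead clusters add 12288 bytes, so the bound is 28672 bytes.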
-----------------------------------------------------
EXAMPLE:

In real life, it is not unusual to have nested operations. In this example, the files are archived (RARed) into a multi-part archive, the multi-part archive is re-archived (RARed again) into an archive with bigger parts, and the bigger parts are later split.

The postprocessing is repeated steps of join and extract. Here is an example:
CrUd rips a movie from a BR disk (say 10GB) and reprocesses it into an MKV (say 1.4GB). They add an NFO (or DIZ) file that describes the movie, describes the processing, quality measures, and information about the group. They might also include some screen shots as JPG files and a sample clip (AVI or MKV) from the reprocessed file.

Then they use a RAR program to create an archive of all of these files. They have the RAR program limit each part to no bigger than 50 000 000 bytes. They also add a signature file (SFC) that contains a hash of each part file (a total of 32 files). This is the "scene" format for a DVD or BR disk distribution (15 000 000 bytes for a CD). The information for downloading this is distributed and the file is copied many times. Torrents are often used to communicate the download information in a single .torrent file.

For Rapidshare, 32 small files is inconvenient, so somebody makes a new RAR with 200MB RAR parts that contains the 31 RAR files and the SFC file. They also add some spam files inside the RAR. Modern RAR programs allow addition of validation and repair information, but most RS posters don't know how these work, so there are no hashes stored for these 8 files (one file is less than 200MB).

Now, somebody decides to post the files on ugotfile. Ugotfile is set up so that you can just move files over from hosts like RS without downloading them first. However, ugotfile limits each file to 100MB and automatically breaks the 200MB RS files into 100MB UG files (moviename.part1.rar.aa, moviename.part1.rar.ab, moviename.part2.rar.aa, etc.). The result is 15 files.

JD's post-processing goal is to obtain the original MKV file, sample file, and JPG files (and also the spam):
1) One or more JD threads download the 15 files in the download set (the threads all end). JD then starts a new thread to process the download set.
1a) At each step, JD has to keep a list of the files it has already processed to distinguish them from new files.
2) JD recognizes the .aa extension and joins each aa file to the corresponding ab file. This results in 8 .rar files (the RS files). Depending on the settings, JD might delete the aa and ab files or might leave them (but they are now on the ignore list).
3) JD recognizes that it now has a part1.rar file and unrars it. Unrar does the equivalent of joining the 8 files together and then extracting the files contained in the archive.
3a) This results in a directory containing the spam files, plus the 31 rar files and the sfc file (and possibly the 8 rar files that are already processed and are on the ignore list).
4) JD finds the original part1.rar file (from CrUd) and unrars it. The unrar process also uses the sfc file to validate the rar archives and the extracted files.
4a) This results in the files that CrUd archived: the MKV file, the JPG snapshot files, the sample video file, and the NFO file.
5) There are no files that haven't been processed that match the post-processor's list of regular expressions, so it has no more to do with the files.
6) JD now updates the database to list the package (or processing set) as complete and puts a green light in the UI. The postprocessing thread can end.
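The loop in steps 1a) to 5) can be sketched as follows. This is a toy simulation on file names only, assuming each rule is a regular expression paired with an action that reports which files it consumed and which it produced. `join_split`, `extract`, and `ARCHIVE_CONTENTS` are hypothetical stand-ins for the real join and unrar steps and do not reflect JD's implementation.

```python
import re


def postprocess(files, rules):
    """Repeat join/extract steps until no unprocessed file matches a rule.

    Returns (all file names now present, ignore list of processed names).
    """
    files = set(files)
    processed = set()  # step 1a: files already handled are ignored
    while True:
        todo = [(name, action)
                for name in sorted(files - processed)
                for pattern, action in rules
                if pattern.search(name)]
        if not todo:  # step 5: nothing left that matches
            return files, processed
        name, action = todo[0]
        consumed, produced = action(name, files)
        processed |= consumed | {name}
        files |= produced


# Toy archive index standing in for what unrar would find.
ARCHIVE_CONTENTS = {"movie.rar": {"movie.mkv", "movie.nfo"}}


def join_split(name, files):
    """Join x.aa, x.ab, ... into x (simulated on names only)."""
    base = name[:-len(".aa")]
    pieces = {f for f in files
              if re.fullmatch(re.escape(base) + r"\.[a-z]{2}", f)}
    return pieces, {base}


def extract(name, files):
    """Extract an archive (simulated via ARCHIVE_CONTENTS)."""
    return {name}, set(ARCHIVE_CONTENTS.get(name, set()))


RULES = [(re.compile(r"\.aa$"), join_split),
         (re.compile(r"\.rar$"), extract)]

final_files, ignore_list = postprocess({"movie.rar.aa", "movie.rar.ab"}, RULES)
```

In this run the two split pieces are joined, the resulting archive is extracted, and the loop stops because the MKV and NFO names match no rule.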

Ideally, the user starts with a list of 15 links on the screen and copies them to the clipboard. JD processes those links and the user presses the Continue button. At the end of the day, the user comes back and finds the movie ready to watch. In this ideal world, there is a window of finished downloads (looks like a web page or explorer or finder), and the user double clicks on one of the icons to watch the movie. In fact, ideally if there are no errors, the user doesn't have to press the Continue button.

The user doesn't have to know anything about the three merges and two extractions in the post-processing. The status should indicate something like "downloaded and being processed" (or "Processing" and a green circle to show it is downloaded).

#49 09.12.2009, 18:44

Quote:

Originally Posted by drbits

... In fact, ideally if there are no errors, the user doesn't have to press the Continue button.

I like that too. I would aim for this - in a future release.

And in a super-ideal world, the user will use an RSS auto-downloader to "catch" favorite shows and have them ready at home without pressing any button... like in uTorrent, Vuze etc.

An improved linkgrabber for complete web pages (one that keeps and uses each link's position on the web page), aided by a good user filter, is a must for that to happen.

drbits · #50 09.12.2009, 19:29

I would also like a change so that continuing with a link does not change to Downloading.
It is especially irksome when I only approve one package instead of all.

#51 09.12.2009, 19:58

hmmm... it does NOT change to auto-downloading for me here (if there are no active downloads)...
But yes, if this thread's topic is implemented, all new tasks should have a configurable starting state (stopped/running).

#52 10.12.2009, 15:40

Quote:

Originally Posted by drbits

It is impossible for a mathematical system that includes the equivalent of arithmetic to be complete. This theorem is proven using the same approaches as Turing's work, but using set theory.

Turing proved that there is no way to know if there is a solution to some problems. The theorem above says that sometimes you cannot even state the question.

In Hofstadter's best seller "Gödel, Escher, Bach: An Eternal Golden Braid" I read about Kurt Gödel's incompleteness theorem. Gödel used his own numbering system to prove this theorem of theorems.

I don't know what this has to do with pi-calculus' use in BPM systems for instance.

I don't think that using data flow diagrams is such a good idea. Data flow diagrams are based on the knowledge of how a system is designed, i.e. decomposed into sub-systems or components. This is not a customer perspective, because the customer does not necessarily know or understand the technical details of a system (s)he is interacting with. This is the responsibility of the technical architect or designer.

I think a data perspective at the level of a customer is better presented with a class/object diagram. I'll tell more about it in another session.

Quote:

Originally Posted by drbits

If you haven't read Abbott's "Flatland", you should. Basically, it talks about dimensions and that you have to have enough dimensions to see an object, otherwise all you see is an outline of part of the object.

This is indeed an interesting book. SF author Isaac Asimov wrote an introduction for one of its editions. I'll read it.

Quote:

Originally Posted by drbits

1) Limit the complexity of any single diagram to around 7 (e.g., 7 states in a diagram). Possibly 3 to 9.

These rules mainly exist for presentation purposes. I think that people who already know the domain aren't bothered by the complexity of a diagram. How would an architect present a building to his/her customer when he/she can only draw 7 objects? This shows that complex drawings can be conveyed when the audience is familiar with the subject matter.

I think the download process is complex and cannot easily be presented within the cognitive short term limit of 7 chunks. When you level the diagram, you lose the view of how the states are connected to each other. In my first diagram, the "in error" state was abstract. It didn't convey any information about how jD should present the different error states to the customer.

Quote:

Originally Posted by drbits

4) Errors are handled differently in different representations, but they always represent interactions.
...
So, the errors actually only require 2 states.
...
The user doesn't have to know anything about the three merges and two extractions in the post-processing. The status should indicate something like "downloaded and being processed" (or "Processing" and a green circle to show it is downloaded).

Since you like referring to books I'll follow that example.

In Winograd & Flores' groundbreaking book "Understanding Computers and Cognition - A New Foundation for Design" some of the principles discussed are the anticipation of breakdown and the transparency of interaction between humans and computers.
After reading many months' worth of posts in this forum I'm starting to know where things can go wrong in the downloading process. As long as jD supports extraction and joining of downloaded files, and because these sub-processes can break down (look at the recent .xtm formatted files and at your detailed description of a complicated extraction/joining process), it should also integrate these notions in its interface/interactions with the customer.

I'm still not convinced of putting all post-processing in a black box.

drbits · #53 14.12.2009, 04:28

A dataflow diagram does not describe the data and actions in detail, unless it is being used in detailed design. At the user level, it has data like URLs, packages, passwords, download keys, etc. The actions are also abstracted.

A class diagram doesn't communicate much to the uneducated reader. There are too many different symbols (fewer in UML than in Booch) and it only defines data. It is an excellent orthogonal view, but it does not specify anything (except data structures) on its own. UML has a diagram for the user view that is basically a data flow diagram in which the user is one of the nodes.

The reason for making most things black boxes at the top level is the functionality count. Each control that a user can touch in the program has to be represented somewhere. In truth, when you restrict the diagram to the download page, you are making it a black box in a higher level diagram.

One kind of postprocessing that is in JD and I just remembered is DLC expansion. When a DLC file (or equivalent) is downloaded, it is decrypted, and the list of links it contains is added to the download queue (the actual steps include LinkGrabber functionality that we don't want to discuss right now). There is even a useless dialog box that tells you it is doing this, but doesn't allow you to control it.

It doesn't have to be called postprocessing, it could be reassembly (or several other terms).

The output from postprocessing should also be abstracted: Processing error, Downloaded, or Downloaded and Checked are all that are required for users.

In my opinion, in addition to the written information in the status field, a link has one of the following as its status after it is transferred to the downloading page (ignoring LinkGrabber):
1) Queued - Waiting to be useful - No icon
2) Waiting - Waiting for a specific time or event - Blue pause Icon
3) Disabled - User disabled the link - Two vertical red lines (like pause) icon.
4) Connecting - Small blue play arrow icon
5) Connection_help - Connecting, but user help is needed (such as Captcha) - Small Orange play-arrow icon.
6) Connection_failed_retryable - 16 pixel blue circle with 12 pixel white circle inside.
6a) Downloading_failed_retryable (same icon as 6, must retry connection).
7) Connection_failed_Need_help - 16 pixel orange circle with 12 pixel white circle inside.
7a) Connection_failed_permanent - 14 pixel orange square (like stop on recorder). - Permanent error, unless user intervenes. This is like automatically disabled in current system.
8) Downloading - Green play arrow
9) Disconnected_Resumable - Blue circular arrow. - automatically resumable (as now).
10) Disconnected_Error - 14 pixel orange square (like stop on recorder). - Permanent error, unless user intervenes. This is like automatically disabled in current system. All downloaded data is deleted.
11) Downloaded_assembling - All non-disabled links have been downloaded, currently in post-processing. Icon is 16 pixel green circle with 12 pixel white circle inside.
12) Downloaded_finished - 16 pixel Green Check Mark, 3 Pixel width (either no assembly or assembly success).
13) Assembly_error - 16 pixel orange circle (downloaded) with 12 pixel blue circle inside (error need help)
14) File already exists - Blue glass, 2/3 full.
14b) Partial file already exists - Orange glass, 1/3 full.
14c) Blue glass half empty (we don't believe in that around here).

15) (question) Do we need an icon for completed download, but we have no CRC or other validation available? (blue checkmark)

I haven't had time to draw the icons yet. The three colors avoid the two most common types of colorblindness. The recorder/cd Play, Pause, Stop, and Check are self evident to most people. The donuts are less obvious and may need real icons, such as:
Assembly error: Tumbled blocks or broken wine glass.
Connecting: Blue telephone (old style).
Connection_failed_retryable (not resumable): Blue telephone in Orange box.
I think the green donut is OK.

Permanent failure is the same, regardless of when it happens. Orange stop symbol. No data is kept on disk for these. Reset will return to Queued. Should Enable?

Green means OK (proceeding normally or completed).
Orange means Needs User Help or Might not be able to complete.
Blue means OK, but not proceeding at the moment.

#54 16.12.2009, 14:10

@drbits:

I don't know what you mean with "download keys" nor with "The actions are also abstracted".

Here is my review of your proposition. My approach tries to keep the number of states to a minimum while explaining the details of a state in a status/error message. I think it is in the interest of the customer to keep the number of states as small as possible, but not too small. I want to keep it simple but not too simple, because jD is not a simple tool. I think we'll all agree on that.

1) As I said before I don't see what a very abstract state like "queued" brings to the customer. All items that are in a queue are "queued" by definition.

4) "Connecting" can be added, but I doubt many customers will ever see it. In my opinion it is a less important sub-state of downloading. When customer interaction is needed, i.e., when a fully automated download process isn't possible, the download unit goes into the "customer input" state with error messages like "firewall might be blocking jD", "password needed", "captcha needed", etc.

5) "Connection_help" is a sub-state of the "customer input" state. The fact that jD needs a captcha to be typed will be obvious when the customer is in front of his/her screen. When the customer is not available, the link might be retried automatically (temporary error with "captcha needed" as an explanatory status message.)

6) "Connection_failed_retryable" and "Downloading_failed_retryable" are sub-states of "temporary error" state.

7) "Connection_failed_Need_help" is a sub-state of "customer input" and "Connection_failed_permanent" is the same as or a sub-state of "permanent error".

9) "Disconnected_Resumable" is the same as "resumable".

10) According to your description of "Disconnected_Error", it is either a "permanent error" or a "customer input" state, depending on whether the download unit can not be recovered, whatever the means, or whether customer input is needed.

11) "Downloaded_assembling" is not a state but an ongoing activity. If you meant "Downloaded_assembl[ed]" then it would be the same as the "joined" state.

12) "Downloaded_finished" is either the "downloaded", the "extracted" or the "joined" state, depending on whether the download unit contains one file, one archive or several part (not to be taken literally, as there are several naming schemes) files respectively.

13) "Assembly_error" could be a sub-state of "customer input". For instance, a password might be needed.

14) "File already exists" is the same as "downloaded" but with a clarifying status message. What do you mean exactly with "Partial file already exists"? If the file is resumable, then it is in the "resumable" state, otherwise it is in the "customer input" state (Reset action) or "temporary error" state (Retry).

15) The fact whether a download unit can be validated or not is a property of the download unit. When validation is done and the result is OK then the state for the download unit will be "downloaded", otherwise the download unit will be in one of the "in error" states.

Why wouldn't jD keep data on download units that have a "permanent error"?
Reset returns a download unit to the "waiting" state, as a queued state doesn't make sense in a queue. "Enable" leads to the same "waiting" state.

That's my way of seeing the customer's "dashboard". Designers and developers can do what they want internally, but that's not the customer's business.

drbits · #55 17.12.2009, 06:34

I am flexible about all of this. I am just trying to help. So far, I have only dealt with Link status. How that is translated to packages is not in here.

"Download key" is the name some hosts use for the Download Password. I thought it would be better to keep the word Password for file decryption keys.

"Actions are abstracted" means that details are left out either to simplify the discussion or the presentation to the user. This is normal procedure in specification.

I would like to recommend the following terms (glossary up front!):
a) Reassembly (verb) (combines all of the processing to return part files to the files the user was trying to download).
b) An Assembly (noun) is what we have been calling a processing set.
c) Check (CRC, MDx, SHA-x, and other validation techniques). CRC is a specific calculation (usually CRC8, CRC16, or CRC32) that is used to validate strings of data (such as disk sectors) and can sometimes be used to correct the data. MDx and SHA-x are cryptographic one-way hashes (should not be usable to correct the data). Another related technique is ECC (error-correcting codes). Internally, both ZIP and RAR files use CRC32 for integrity checks; RAR uses AES for encryption, not for validation.

1) Inactive is a better term. I was using Queued (or waiting its turn) as the default state for a link on the download page. It just means that nothing is happening. Thus, I recommend no icon. Waiting is when this link has a specific amount of time to wait - a very different status.

4) Agreed, except that a connection can take several minutes if there is a problem (looping on Captcha, etc.). Example: the current problem with File-Rack (gets wrong Captcha, but never gives user a chance). Regardless of whether the user sees it, there is a state in which the link is active, but the download has not started. We can just include it in Downloading if you prefer.

5) Sorry. You are correct. "Customer Input" (I prefer "User Input Needed") is what I was trying to say. This needs a different icon from "Temporarily unavailable". One fixes itself, while the other needs the user to click "resume" (or we can call it "retry"). This belongs to the set of Recoverable problems. "Temporarily unavailable" belongs to the set of automatically retryable problems. (not defining terms here, just making a distinction).

6) You are correct, "Customer input", "Connection error" and "Download error" are the best examples of Temporary Error. This is not to be confused with "Temporarily unavailable" (which is not an error, but a type of wait). If your comment in (4) is followed and we do not display a "Connecting" state, then that gets rid of "Connection error".

All Temporary Errors need Resume permission from the user. In addition to Customer input, Temporary Errors would include server errors (500).

Where do "Download incomplete" and Check errors go? JD's current response to "Download incomplete" is somewhat host specific. If Resume is enabled, Download incomplete needs to automatically retry (retry failure would invariably lead to a connection or download error). If it is Resumable, it could just return to the Inactive state. If the team implements a retry limiter, that would help (#retries or time limit on a retry).

Download incomplete (not resumable) errors are probably "Temporary errors". They still need Resume permission.

Check errors at the link level could be retryable (if the team implements a retry limiter). A Check error at the Assembly level is more difficult (we don't know which part to retry). Where does this go?

7) I agree.

9) See "Download incomplete" in (6) of this post.

10) I agree.

11) I was just discussing specific links. I did not have time to get to "Assemblies" (which are what are joined, extracted, sometimes Checked and to which RAR passwords belong) or packages. Only an Assembly should go into Reassembly. A link is in Reassembly if it is part of an Assembly that is being Reassembled. The fact that other links in the Package are concurrently downloading does not affect this.

Although the two of us have agreed that "Processing sets" (Assemblies) exist and are currently identified in LinkGrabber (and maybe again later), I don't have a good way to mark them on the download page, other than assigning them a name. So a link could be in Assembly "Asian Wallpapers" with a part number of 3 within the Assembly.

If there is a Check error on download, that applies to the link. It is retryable, but will usually fail on retry (TCP eliminates almost all of these errors that are due to communication). We need the retry limiter here.

Check errors in Reassembly are more complicated. We currently don't know which part of the Assembly is bad (we don't get that information back from unrar or hjsplit in silent mode). If we redirect the output of these tools, JD could use a regex to extract the part number to retry.
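A minimal sketch of that regex idea, assuming the redirected output mentions the failing part by its file name. The sample lines in the usage note are invented for illustration and do not reproduce real unrar or hjsplit output; the `error` marker and the `.partNN.rar` naming scheme are assumptions too.

```python
import re

# Assumption: part files follow the common "name.partNN.rar" naming scheme.
PART_RE = re.compile(r"\.part(\d+)\.rar\b", re.IGNORECASE)


def parts_to_retry(output_lines):
    """Return the sorted part numbers named on lines that report an error.

    Assumption: a failing line contains the word "error" (hypothetical marker).
    """
    parts = set()
    for line in output_lines:
        if "error" not in line.lower():
            continue  # ignore progress and success lines
        match = PART_RE.search(line)
        if match:
            parts.add(int(match.group(1)))
    return sorted(parts)
```

With the invented lines `["wallpapers.part01.rar OK", "wallpapers.part03.rar ERROR: checksum mismatch"]`, only part 3 would be queued for a retry.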

12) I disagree strongly. The user should not have to know anything about joining vs extraction vs checking. The user normally sees a list of links for something to download (and might be given a password), but the steps of reassembly are not part of the user's input and not part of the knowledge of a noob to downloading.

The split archives are either to make retries easier (because FTP is generally not resumable), or because the host restricts the size of files.

Imagine that you are a normal Windows 6.0 (Vista) or 6.1 (Se7en) user. You might have seen Zip files, but unless you have entered the P2P or DDL worlds, you would not see RAR, R00, SIT, ARJ, A00, TAR, Z, GZ, TGZ, aa, ab, etc. files. These are just part of the file names and the better JD can hide the distinctions, the better it is for a normal user.

13) You are correct that a "Password problem" is a User Input situation and need not be distinguished to the user, just because it comes late in the process.

14) I agree, except for a couple of details. The multiple mirror processing in JD is a mess. If a file already exists and is complete, then it should be in the Downloaded state with a note that it already existed. If a file download is incomplete (a *.part file exists), it doesn't matter which link began the download.

The one trouble point (JD doesn't handle this in any reasonable way right now) is when two links use the same absolute file path, but are not the same file. For example, two apparent mirrors could have different sizes or Check data. This should be a permanent error "Overwrites file" with a recommended resolution (in the FAQ) of putting the file into a different package.

This isn't at user level: Right now, JD pre-checks for mirrors. This leads to bugs. If a link wants to download to a *.part that file is locked, then JD should put the link into "Temporarily unavailable". This requires opening the part file for Exclusive-Write before attempting connection. If the *.part file is writable, then JD can use the current link to continue downloading the file (the database will tell JD which ranges of bytes have been downloaded).

If a file exists (not a *.part file), then JD should not write to it, except to delete it due to a Reset or Delete command or as part of a retry. If the file length matches the correct file length in the database, then the link status should be Completed (already exists).
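The file checks described in (14) could be sketched roughly like this. It is only an illustration: the function name, the returned status strings and the `path + ".part"` naming are assumptions, and the Exclusive-Write locking step is left out because it is platform specific.

```python
import os


def status_for_existing_file(path, expected_length):
    """Decide a link status from what is already on disk.

    path            -- absolute target path of the finished file
    expected_length -- correct file length according to the database
    """
    if os.path.isfile(path):
        # A finished file exists: never write to it.
        if os.path.getsize(path) == expected_length:
            return "Complete (already exists)"
        # Same path but not the same file: needs the user to sort it out.
        return "Permanent error (Overwrites file)"
    if os.path.isfile(path + ".part"):  # assumed naming of the *.part file
        # It does not matter which link began the download.
        return "Continue from part file"
    return "Inactive"
```

The order of the checks matters here: a finished file wins over a leftover *.part file, so JD never touches a file that is already complete.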

15a) "as a queued state doesn't make sense in a queue". This explains some of our communication confusion. All links that are not completed or disabled are in the queue. We distinguish other states, because this communicates something to the user. However, since every link must be in some state, those without a specific state are queued. I suggest calling it Inactive. We cannot call it the Waiting state, because that has a specific meaning on the hosts (and the user will see that if they connect their browser to a host).

15b) I did not mean to say that links with permanent errors should automatically be deleted. However, I thought only useful data should be kept in a file or *.part file. I now realize that I was wrong. Partially downloaded data should be left in the *.part file for debugging reasons and because the user might be able to use what is there (a partial archive might contain several useful files).
_______________________

I think we are very close to agreement. There are two places where I don't think we are on the same page:
12) The user does not need to know anything about reassembly, and
15a) There is a difference between waiting because the server said to wait and waiting because it is not time to process that link (Inactive).

I have recommended some language changes to simplify and generalize things, but they are not really important.

So, from my bloated set of states we have a much smaller collection of Link Status. The bloat was partly my failure in distancing myself from the implementation considerations during specification.

1) Inactive (or Not_started or Queued) - Waiting to be used - No icon. Not a term used with the user.
2) Waiting - Waiting for a specific time or event based on feedback from the server - Icon could be a blue clock face.
3) Disabled - User disabled the link - The icon could be an orange octagon (like the traffic stop sign). Or it could be some kind of strike through or background change (as it is now). However, any text should still be readable.
4) User_help_needed (Consumer_Help) - Anytime a password, Captcha, or other information is required to proceed. Icon could be an orange hand with the index finger pointing.
5) Temporary_problem (Temporary_error) - Any problem that can automatically be retried. Icon could be a blue pause symbol. I like Problem better than Error, because it isn't permanent.
6) Permanent_error - Icon could be a 14 pixel orange square (like stop on recorder). - Some user intervention might restart this, but this is like the current automatically disabled state. Enable, Resume (or Retry), or Reset will return this to Inactive.
7) Downloading - Icon could be a green play arrow
8) Reassembling - All non-disabled links for an Assembly have been downloaded and the link is currently in post-processing. Icon could be a green wrench.
9) Complete - 16 pixel Green Check Mark, 3 Pixel width (either no Reassembly or Reassembly success).
9a) Complete (file already exists). Same Icon, just has a message.
9b) Complete (no way to verify). Same Icon, just has a message.
?-10) Paused: The link has been manually paused. This is equivalent to Disabled (it aborts any ongoing transfer), but will return to Inactive when the program is restarted. -- Or is this not what this state was intended for? Icon could be a red pause symbol.
?-11) Graveyard - The link has been deleted. This keeps the link available to avoid re-downloading the file. It also allows for undelete. Not normally displayed. Need way to display (could be with categories) and to purge. Icon could be a gray tombstone (Graveyard predates the Mac Wastebasket).
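To make the reduced list easier to review, here is a rough sketch of it as a table of statuses with their icon colors. The identifiers and the `apply_command` helper are made up for illustration; the tentative Paused and Graveyard entries are left out because they are still marked with a question mark.

```python
from enum import Enum

# Color meanings as given earlier in the thread.
COLOR_MEANING = {
    "green": "OK (proceeding normally or completed)",
    "orange": "needs user help or might not be able to complete",
    "blue": "OK, but not proceeding at the moment",
}


class LinkStatus(Enum):
    # value = (icon color or None, suggested icon)
    INACTIVE = (None, "no icon")
    WAITING = ("blue", "clock face")
    DISABLED = ("orange", "octagon")
    USER_HELP_NEEDED = ("orange", "pointing hand")
    TEMPORARY_PROBLEM = ("blue", "pause symbol")
    PERMANENT_ERROR = ("orange", "square")
    DOWNLOADING = ("green", "play arrow")
    REASSEMBLING = ("green", "wrench")
    COMPLETE = ("green", "check mark")

    @property
    def color(self):
        return self.value[0]


def apply_command(status, command):
    """Item 6: Enable, Resume (or Retry) or Reset return a permanent error to Inactive."""
    if status is LinkStatus.PERMANENT_ERROR and command in {"enable", "resume", "retry", "reset"}:
        return LinkStatus.INACTIVE
    return status  # other transitions are not modeled in this sketch
```

Only the one transition spelled out in item 6 is modeled; everything else would come from the state diagram.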

What have I left out? What have I messed up?

The attachment is just a "Top-level-diagram". The idea is to model how the user sees JDownloader. There are no details, just the big concepts. Each line, circle, or square can be defined in more detail in a different diagram. Again, I am open for correction.

#56 19.12.2009, 14:16

Thanks for all the comments, because it allows me to better explain what I meant with the state diagram. I'll also change some smaller things to my state diagram.

My abstract "in error" state is meant to be a pure partition, i.e., a) the 4 states in it do not overlap and b) all error states are in one of those 4 states. This means that "customer input (needed)" and "temporary error" need to be completely distinct. Reading your answers I now suggest to call the latter state the "retry-able" state. It would be named after its only outgoing transition, "Retry". I'll adopt that term.

Quote:

Where do "Download incomplete" and Check errors go?...A Check error at the Assembly level is more difficult (we don't know which part to retry). Where does this go?

A download that is not complete will be in the :-
- "resumable" state when it can be resumed,
- "retry-able" state (former "temporary error" state) when it is in its automatic retry phase
- "customer input (needed)" state when it is no longer in its automatic retry phase
- "jD bug" state when it is caused by a host plug-in or other bug
- "permanent error" state when all attempts to complete the download have failed.
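The five cases above can be written down as a small decision function. This is a sketch only: the parameter names are invented, and the order in which the conditions are tested is an assumption, since the list itself does not say which case wins when several apply.

```python
from enum import Enum


class IncompleteState(Enum):
    RESUMABLE = "resumable"
    RETRYABLE = "retry-able"
    CUSTOMER_INPUT = "customer input (needed)"
    JD_BUG = "jD bug"
    PERMANENT_ERROR = "permanent error"


def classify_incomplete(can_resume, caused_by_bug, auto_retries_left, all_attempts_failed):
    """Map an incomplete download to exactly one state.

    The precedence below (permanent, bug, resumable, retry-able, input)
    is assumed for illustration.
    """
    if all_attempts_failed:
        return IncompleteState.PERMANENT_ERROR
    if caused_by_bug:
        return IncompleteState.JD_BUG
    if can_resume:
        return IncompleteState.RESUMABLE
    if auto_retries_left > 0:
        return IncompleteState.RETRYABLE  # still in the automatic retry phase
    return IncompleteState.CUSTOMER_INPUT  # automatic retries are used up
```

Because every branch returns, a download unit always ends up in exactly one state, which is the "pure partition" property asked for above.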

A download that has a check error will be in the :-
- "(auto) retry-able" state (former "temporary error" state) when it is in its automatic retry phase; note that the number of retries should be a customer option, because it can cost considerable download traffic if the error is at the host server's side. I would also exclude automatic retries at the download set (assembly) level.
- "customer input (needed)" state when it is no longer in its automatic retry phase or when the customer has opted to deal with these errors in a manual way. Here check errors at the download set level will be presented to the customer.
- "jD bug" state when it is caused by a bug (I don't have examples for this)
- "permanent error" state when all attempts to complete the download without errors have failed.

Thanks to your comments I detected an error concerning the Resume transition. It should of course go to the "waiting" (or your "inactive") state.

IMO the distinction between "waiting" and "inactive" is not important for the customer. The only distinction is that there is a counter for your "waiting" state, while for me the counter is just an indicator that the download will start soon, if started at all.
If the scheduler would be able to schedule a download unit, a counter could be put on that download unit as well, even when it has to start within 7 hours. Other "inactive" downloads might precede your "waiting" downloads. An inactive download unit can attain downloading status without any waiting at all (from the customer's perspective).
Additionally, a counter that reaches 0 is not a guarantee that a download will start. I sometimes have 10 links with counters but sometimes none of them starts downloading and the counting simply restarts. In one word, the distinction is superfluous.

The automatic retry feature based on a number or time period is a good idea I've been promoting a long time.

Quote:

Check errors in Reassembly are more complicated. We currently don't know which part of the Assembly is bad (we don't get that information back from unrar or hjsplit in silent mode). If we redirect the output of these tools, JD could use a regex to extract the part number to retry.

These are technical limitations of the current implementation with add-ons. Their APIs might be too limited.
Isn't this something we can ask Boris Brodsky from 7-Zip? If their software is well integrated with jD, we might have more detailed information about what can go wrong in an extraction process.

12) If the customer would only see a download set and not the individual links, then you're 100% correct. I would like to be in that 'metalink' era too. The problem is that we aren't there yet. As I said before, many things can go wrong by manipulating individual links that belong to a download set.
Here is the main reason why I defend the distinction between extracted and joined. The customer usually sees and copies several links belonging to a download set. Sometimes a link of the download set is no longer available. Let's assume there are no mirrors. The customer understands that when one file is unavailable, it does no longer make sense to download the other links.
Currently, jD doesn't warn the customer when a download set isn't complete and that's because jD does not support the concept of download set. It doesn't have enough knowledge to do that. It only groups links based on their name. jD has to make this first step before we can start thinking of a black box assembly process. The same applies to mirror links.
Let's assume jD knows about download sets, mirrors, etc., and the customer does no longer see the individual links, then (s)he should still be able to disable the joining or even extraction of files. The reason might be that the joining format is not supported or that the customer wants to re-upload the part files to another host or do whatever manipulation with them. Some archiving formats allow partial extraction or even partial download within certain limits.
My main argument still is the fact that the assembly process can break and the knowledgeable customer should be able to find a workaround for the problem or implement her/his own process to do the assembly or whatever manipulation. Look at your point 15b). You give another example.

14) The situation that "two links use the same absolute file path, but are not the same file" is a "customer input (needed)" error, except if the customer has selected 'overwrite' as a global option. I doubt whether this particular setting is useful as a global setting.

15) OK, you link the term "waiting" to the hosts. I don't see these hosts when I use jD.

jD applies its own wait times. They are related to the host's wait times, but there is no guarantee that they're equal. It's perfectly possible that if you would attempt to download a "waiting" link with your browser it starts immediately. That's why for RS, for instance, jD will check jD's waiting time every 5 minutes. That's something I read in the forum yesterday.

I don't think we need to agree on everything. Only the future will tell which tools implement the proper states and acquire the most customers. It's by communicating and comparing our differences that useful information can be obtained. Agreeing on almost everything doesn't mean that jD will evolve in that direction either.

---

Quote:

1) Inactive (or Not_started or Queued) - Waiting to be used - No icon. Not a term used with the user.

You define "inactive" as "waiting to be used".

To me it's a technical difference that can be indicated as a counter. You acknowledge that "inactive" is a term not to be used with the customer. If jD would indeed use such a term the customer could become confused. (S)he will try to find an "Activate" command. The term "inactive" implies that if a download unit isn't "inactive", it is active. I wouldn't call a download unit in the "downloaded" or the "permanent error" state "active". My advice still is to keep it simple.

5) A problem is an issue or obstacle which makes it difficult to achieve a desired goal, objective or purpose (source: wikipedia). A problem can be solved, but not always.
An "error" is a deviation from accuracy or correctness (source: wikipedia). An error can be corrected, but not always.

I think the state that is most closely related to the concept of problem solving is the "jD bug" state. We could call it a "jD problem" instead. What do you think? BTW, you forgot to mention that error state in your list.

6) A download unit in the "permanent error" state cannot be corrected by jD, the customer, or the jD developers. That's by definition. The download unit is ready to be removed by the customer.

8) "Reassembling" is not a state, it is an ongoing activity. As you know, I have two possibilities ("extracted" and "joined") here.

9) The "Complete" state is either "downloaded", "extracted", "joined" or "removed", independently whether it concerns a duplicate or whether it has been checked or not.

10) The "paused" state is used in jD to keep the connections of all currently downloading files alive. There is no reason why jD wouldn't be extended to pause packages, download sets or links. A paused download unit can only be "Continued".

11) The 4 final states in point 9) constitute your "graveyard". I'll add an "Un-delete" command. This means that when links are Dequeued/Deleted, they are in the "deleted" state. An Undo function could bring them back. When the jD session terminates, the "deleted" items will be lost. The undo logic also applies to the Enqueue command and probably to other states as well, but this will lead us to a much more complex state diagram.

BTW, your diagram is too small for me. I can't read the text or does it contain a secret message?

#57 04.04.2011, 14:08

Quote:

Originally Posted by danutz

Restarting is not in itself the problem. Anything that can't be automated is a problem -- having to press the "update" button, having to reset a link etc. Fatal errors kill automation. The only solution is to make plugin out-of-date errors non-fatal.

In my opinion the entire approach for handling "unexpected responses" is weak. You can't hope to keep up with every possible variation of error page that hosters throw at you. It is a losing battle. One should only strive to handle "normal" workflows.

I would propose the following logic for JD:

1. replace "Plugin out-of-date" errors with "unexpected response" errors; make these non-fatal: retry them 5 times in a row, then once every 60 minutes only.

2. monitor the available JD updates constantly

3. whenever JD sees an update for hoster H, it should disable all links from H that are waiting to be retried due to "unexpected responses" (as per rule 1)

4. whenever no downloads are active AND updates are queued up, JD should restart itself, with no intervention required.

5. after the restart, re-enable the links that were disabled by rule (3) above

Rule (1) balances the chance that an "unexpected response" might just be a transient incident, with the danger of hammering a hoster that really has changed its interface.

Rules (2) and (3) ensure that un-upgraded JD's don't continually retry links on hosters that are known to have changed.

Rules (4) and (5) ensure that JD can be trusted to do its job even if the owner goes away for a 2-week trip.
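Rules (1), (3) and (4) are simple enough to sketch in a few lines. The function names, the dictionary keys and the status string are invented for illustration; none of this is existing JD code.

```python
FAST_RETRIES = 5            # rule 1: retry 5 times in a row
SLOW_INTERVAL_MINUTES = 60  # rule 1: then once every 60 minutes


def retry_delay_minutes(retries_done):
    """Minutes to wait before the next retry of an "unexpected response"."""
    if retries_done < 0:
        raise ValueError("retries_done must be >= 0")
    return 0 if retries_done < FAST_RETRIES else SLOW_INTERVAL_MINUTES


def links_to_disable(links, updated_hoster):
    """Rule 3: links of the updated hoster that wait on an unexpected response.

    links -- list of dicts with the hypothetical keys "hoster" and "status"
    """
    return [
        link for link in links
        if link["hoster"] == updated_hoster
        and link["status"] == "unexpected response"
    ]


def may_restart(active_downloads, queued_updates):
    """Rule 4: restart only when nothing is downloading and updates are queued."""
    return active_downloads == 0 and queued_updates > 0
```

For the first five retries the delay is 0 minutes, and from the sixth retry on it is 60 minutes, which keeps a changed hoster from being hammered.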

You're right that the exception handling of jD is weak and I think that's the main reason why jD is being redeveloped.

The non-fatal errors are all errors that aren't permanent ("perman.error" in my diagram).

"jD bug" is the "in error" state for links for which the host plug-in is outdated. No disabling of links is necessary as this should be a state set by the customer.

I don't think updates of other functions than plug-ins should be totally automated. This should be decided by the customer.

#58 04.04.2011, 14:39

Quote:

Originally Posted by remi

I don't think updates of other functions than plug-ins should be totally automated. This should be decided by the customer.

That's fine -- add an option for it. I would guess most people would want JD to auto-update and restart overnight, and those who don't should be able to opt out.

The important thing for me is that JD be capable of running unattended for as long as possible, *if* the owner so desires.

#59 04.04.2011, 16:05

Quote:

Originally Posted by danutz

That's fine -- add an option for it. I would guess most people would want JD to auto-update and restart overnight, and those who don't should be able to opt out.

The important thing for me is that JD be capable of running unattended for as long as possible, *if* the owner so desires.

+1

View Poll Results: Do you like to have control over individual downloads ? (& your GUI preferences)
Yes, I need such control over my downloads	16	72.73%
No, the current global control for ALL downloads is enough	2	9.09%
Check here if you like to have categories' selection (on the left) for filtering the main view	8	36.36%
Check here if you like to have bottom tabs to view details, log, speed etc.	7	31.82%
Check here if you like the current JD user interface with no change	3	13.64%
Multiple Choice Poll. Voters: 22. You may not vote on this poll
