#1
Hi, there is a growing project to digitize the collections of many libraries and make them available online as PDF and DJVU files. Many libraries have joined the project by now. They all run on the same framework, named DLIBRA, and they list each newspaper's structure by year, month, and day (if it's a daily), etc. The newspapers can be read online or downloaded, for free of course.
The one problem: if you want to download, say, a full month of a daily newspaper's issues, you need to go to the newspaper's page, open the year, then the month, then open every daily issue and click on the PDF or DJVU version to start the download. DLIBRA doesn't seem to hide the links, so I think the entire newspaper's structure could be crawled easily.
Here is an example from one library:
**External links are only visible to Support Staff** A newspaper's structure.
**External links are only visible to Support Staff** An example issue with options to download (left interface).
Another type of library:
**External links are only visible to Support Staff** A newspaper's structure.
**External links are only visible to Support Staff** An example issue with options to download.
#2
Hi,
at the moment I do not see broad demand for this. Also, the website structure looks quite simple, so you should be able to automate some of the crawling using LinkCrawler rules.
Important notice: Our ticket system & knowledgebase are currently under maintenance. If this is still the case while you are reading my posts and you can't access help articles linked by our staff, use the Internet Archive/Wayback Machine to view those articles: archive.org/web/
Enter the URL -> Click on "Browse history" -> Select one of the latest dates available.
If the date you selected does not lead you to the support article, try the next oldest one.
__________________
JD Supporter, Plugin Dev. & Community Manager
Getting Started & Tutorials || JDownloader 2 Setup Download
#3
@Hazzard: I took a quick peek and it should be possible with LinkCrawler rules, as explained by pspzockerscene. One rule processes the index page (the one with many items on it) and another rule handles the actual download. In case you need help or have questions, please just ask.
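To make the two-rule idea a bit more concrete, here is a rough sketch in Python that builds the two rules as JSON for pasting into the LinkCrawler rules setting. Everything about the URLs is an assumption: the host `dlibra.example.org`, the `/dlibra/publication/<id>` index path, and the `/Content/<id>/<file>` download path are made-up placeholders, since the real links are not visible in this thread. The helper name `build_rules` is hypothetical too. The patterns would need to be adapted to each library and checked against the LinkCrawler rules help article.

```python
import json

# ASSUMPTION: hypothetical host; replace with the library's real host (regex-escaped).
HOST = r"dlibra\.example\.org"


def build_rules(host_regex=HOST):
    """Return two LinkCrawler rules: one for index pages, one for the files."""
    # Rule 1: scan index pages (year/month/issue listings) for download links.
    # ASSUMPTION: index pages live under /dlibra/publication/<id>.
    index_rule = {
        "enabled": True,
        "name": "dLibra index pages (sketch)",
        "rule": "DEEPDECRYPT",
        "pattern": rf"https?://{host_regex}/dlibra/publication/\d+(?:/edition/\d+)?.*",
        # Regex run against the page HTML; group 1 is the link to pick up.
        # ASSUMPTION: files are linked as absolute /Content/<id>/<name>.pdf|djvu URLs.
        "deepPattern": rf'(https?://{host_regex}/Content/\d+/[^"<>\s]+\.(?:pdf|djvu))',
        "maxDecryptDepth": 1,
    }
    # Rule 2: treat the file URLs themselves as direct downloads.
    download_rule = {
        "enabled": True,
        "name": "dLibra PDF/DJVU files (sketch)",
        "rule": "DIRECTHTTP",
        "pattern": rf"https?://{host_regex}/Content/\d+/.+\.(?:pdf|djvu)",
    }
    return [index_rule, download_rule]


if __name__ == "__main__":
    # Print the JSON array to copy into the LinkCrawler rules field.
    print(json.dumps(build_rules(), indent=2))
```

If index pages only link to further index pages (year -> month -> issue), a higher `maxDecryptDepth` or a broader `deepPattern` would probably be needed; that depends on the real page structure, which I could only guess at here.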
__________________
JD-Dev & Server-Admin