#55, Amiganer, 02.07.2020, 10:50
My link collector script

I am using Python 3.8:

Code:
def extract_links(frompathfilename, savepathfilename, pattern=PATTERN, pretext=PRETEXT):
    # Read the source file line by line and write every matched link
    # (named group "link") into the target file, separated by spaces.
    with open(frompathfilename, "rt", encoding="utf-8") as readfp, \
         open(savepathfilename, "wt", encoding="utf-8") as writefp:

        # Write the crawljob header first, if one was given.
        if pretext is not None:
            writefp.write(pretext)

        for line in readfp:
            for muster in pattern:
                ergebnis = muster.search(line)
                while ergebnis:
                    string = ergebnis.groupdict()["link"] + " "
                    writefp.write(string)
                    print(string)
                    # Remove the match just found so the next search
                    # picks up the following link in the same line.
                    line = muster.sub("", line, count=1)
                    ergebnis = muster.search(line)
PATTERN: is a list of compiled regexes. Each one must contain a named group "link", since the function reads the match via ergebnis.groupdict()["link"]. Example (see the hypothetical sketch after this block):
Code:
import re

PATTERN = [re.compile('"**External links are only visible to Support Staff**', re.I),
           re.compile('"**External links are only visible to Support Staff**', re.I)]
PRETEXT: is the header template for the crawljob, example:
Code:
PRETEXT = "\n".join([
    "chunks=0",
    "autoConfirm=true",
    "autoStart=true",
    "deepAnalyseEnabled=true",
    "enabled=true",
    "extractAfterDownload=false",
    "forcedStart=true",
    "#priority=HIGHEST",
    "#",
    "# Links",
    "#",
    "packageName=Downloads",
]) + "\ntext="
Bye, Christian