The Advanced tab contains settings intended for experienced users.
The first three settings control post-processing, meaning they take effect after a download session finishes. The first of the three, Compress to Zip Archive, instructs SiteCrawler to create an archive of the downloaded files and place it in the download folder.
Run AppleScript lets you choose an AppleScript script to run when the session finishes. The script must contain a handler called session_finished that accepts one parameter: a list of aliases to the downloaded folders.
For example, this script moves the downloaded domain folders into a folder named "Stuff" on the desktop. The "Stuff" folder must already exist:
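    on session_finished(rootFolderList)
        -- rootFolderList holds one alias per downloaded folder
        repeat with thisAlias in rootFolderList
            -- Ask the Finder to move this folder into the "Stuff" folder on the desktop
            tell application "Finder" to move thisAlias to folder "Stuff" of desktop
        end repeat
    end session_finished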
The next setting, Run Shell Command, works like Run AppleScript but runs a shell command instead of an AppleScript. You can run standard Unix command-line tools or your own scripts. The paths of all downloaded folders are passed in the SITECRAWLER_ROOT environment variable, separated by colons.
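Here is a sketch of what such a script might look like. The following Python program reads SITECRAWLER_ROOT, splits it on colons, and prints how many files each downloaded folder contains. The function names and the file-counting step are illustrative only. The sketch also assumes that no individual folder path contains a colon itself.

```python
import os
import sys


def downloaded_folders(environ=os.environ):
    """Return the downloaded folder paths listed in SITECRAWLER_ROOT.

    Assumes the paths are separated by colons and that no single path
    contains a colon itself.
    """
    raw = environ.get("SITECRAWLER_ROOT", "")
    # Skip empty entries, e.g. when the variable is unset or ends with ":".
    return [path for path in raw.split(":") if path]


def main():
    folders = downloaded_folders()
    if not folders:
        print("SITECRAWLER_ROOT is empty or unset", file=sys.stderr)
        return
    for folder in folders:
        # Illustrative post-processing step: count the files in each folder tree.
        count = sum(len(files) for _, _, files in os.walk(folder))
        print(f"{folder}: {count} files")


if __name__ == "__main__":
    main()
```

To try it, you might save the script under a name of your choice, such as count_files.py, and enter python3 /path/to/count_files.py as the shell command. This assumes a python3 interpreter is installed on your Mac.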
Below the post-processing settings is the User-Agent field. The user agent is the identification text a web browser sends to the web server with every request. Some web servers refuse to serve content to browsers they consider incompatible. If you run into such a server, use this field to make SiteCrawler identify itself as a regular web browser.
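The user agent is just a string attached to each request. This means changing it changes how a client identifies itself, and the server has nothing else to go on. The following Python sketch shows this with the standard urllib.request module. The browser string is an example value only, not SiteCrawler's default. The build_request helper is hypothetical.

```python
import urllib.request

# Example browser identification string (an assumption, not SiteCrawler's default).
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build (but do not send) a GET request that identifies itself as user_agent."""
    # urllib stores header names capitalized, so this is later looked up as "User-agent".
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://example.com/")
print(req.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen would send it to the server with that identification text.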
With HTTP Headers, you can insert your own header fields into all HTTP requests. Click the + button to add an entry to the list, then double-click the new line to fill it in. For example, if you enter X-Field as the Field Name and Foo as the Value, the line X-Field: Foo is added to every request. If a custom header field conflicts with a standard request field sent by SiteCrawler, the custom field takes priority.
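The priority rule can be sketched in Python as a merge in which custom fields replace default fields of the same name. HTTP field names are case-insensitive, so the sketch compares names without regard to case. The default values and function names below are hypothetical stand-ins. They are not the fields SiteCrawler actually sends.

```python
def merge_headers(defaults, custom):
    """Merge default and custom header fields; custom fields win on conflict.

    Field names are compared case-insensitively, as HTTP requires.
    """
    merged = {}
    for name, value in list(defaults.items()) + list(custom.items()):
        # Drop any earlier field with the same name (ignoring case) so the
        # later entry, i.e. the custom one, takes priority.
        for existing in [key for key in merged if key.lower() == name.lower()]:
            del merged[existing]
        merged[name] = value
    return merged


def header_lines(headers):
    """Format each field the way it appears in a request: "Name: Value"."""
    return [f"{name}: {value}" for name, value in headers.items()]


# Hypothetical defaults standing in for the standard request fields.
DEFAULTS = {"User-Agent": "ExampleCrawler/1.0", "Accept": "*/*"}
CUSTOM = {"X-Field": "Foo", "user-agent": "MyAgent/2.0"}

for line in header_lines(merge_headers(DEFAULTS, CUSTOM)):
    print(line)
```

This prints Accept: */*, X-Field: Foo and user-agent: MyAgent/2.0. The custom user-agent entry has replaced the default User-Agent field even though the capitalization differs.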
Finally, there's the Fill in Referer setting. When it is enabled, each request includes a Referer field that tells the server which page the link was followed from. The misspelling of "referrer" is intentional: Referer is the name the HTTP specification uses for this field.
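As an illustration, here is a Python sketch of the request data a crawler might prepare when it follows a link with this setting enabled. The follow_link function and its parameters are hypothetical. The sketch only shows where the Referer value comes from, not how SiteCrawler implements it. Following the HTTP specification, it drops any fragment ("#...") from the Referer value.

```python
from urllib.parse import urldefrag, urljoin


def follow_link(page_url, href, fill_in_referer=True):
    """Return (target_url, headers) for following href from page_url.

    With fill_in_referer enabled, the headers include a Referer field naming
    the page the link was found on. The fragment is stripped from that value
    because HTTP does not allow it in the Referer field.
    """
    target_url = urljoin(page_url, href)  # resolve relative links against the page
    headers = {}
    if fill_in_referer:
        headers["Referer"] = urldefrag(page_url).url
    return target_url, headers


url, headers = follow_link("https://example.com/docs/index.html#top", "intro.html")
print(url, headers)
```

Here the crawler would request https://example.com/docs/intro.html and send Referer: https://example.com/docs/index.html.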