Wednesday, July 3, 2013

Downloading source code from SVN/Git repository over HTTP

    Sites hosting open source projects provide an online viewer for browsing source code without checking it out with an SVN or Git client. Checking out an entire repository can take a long time. With Git there is also no straightforward way to check out only a particular directory: cloning downloads the entire repository to the local machine, and even a sparse checkout still downloads everything. When bandwidth is a concern, checking out the entire repository is not practical.

    One can use GNU Wget to recursively download files from online code repositories. For Windows, a build can be downloaded from http://users.ugent.be/~bpuype/wget/

    The command below downloads a directory, its child directories, and all files in them recursively, excluding index.html. It does not download parent directories or files from external sites.

wget --cut-dirs=2 --level=15 --include-directories=src/main/java --recursive --no-parent --no-host-directories --reject=index.html -e robots=off --no-clobber http://sourcesite.com/src/main/java

    Be careful with slashes: use forward slashes in the directory path and the URL. When I used backslashes, it did not work.

--cut-dirs=N Ignore the first N directory components of the URL path when creating local directories
--level=N Recurse at most N levels deep. The default is 5 levels
--include-directories=src/main/java Follow only this directory and its child directories
--recursive Download recursively
--no-parent Do not ascend to the parent directory of the given URL
--no-host-directories Do not create a top-level directory named after the server's host name, which wget does by default
--reject=index.html Do not keep index.html files (wget still fetches them to find links, then deletes them)
-e robots=off Ignore the site's robots.txt rules when crawling
--no-clobber Do not overwrite or re-download files that already exist locally
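
    The values of --cut-dirs and --include-directories both depend on the path in the URL, so they can be derived from it. Below is a rough sketch of that idea, not a tested tool: the function name build_wget_cmd is made up for illustration, and it assumes you want to keep only the last directory of the path locally (which is what --cut-dirs=2 does for src/main/java). It only prints the command, so you can inspect it before running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): print the wget command for a
# directory URL, deriving --include-directories and --cut-dirs from its path.
# Assumes the URL has at least one path component after the host.
build_wget_cmd() {
  url=$1
  path=${url#*://}   # strip the scheme, e.g. "http://"
  path=${path#*/}    # strip the host name, leaving e.g. "src/main/java"
  path=${path%/}     # strip a trailing slash, if any
  # Number of directory components in the path.
  depth=$(printf '%s\n' "$path" | awk -F/ '{print NF}')
  # Assumption: cut every component except the last one.
  cut=$((depth - 1))
  printf 'wget --cut-dirs=%s --level=15 --include-directories=%s --recursive --no-parent --no-host-directories --reject=index.html -e robots=off --no-clobber %s\n' \
    "$cut" "$path" "$url"
}
```

    For example, build_wget_cmd http://sourcesite.com/src/main/java prints the same command as the one shown above.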


    Here are a few source code repository addresses.

http://svn.apache.org/repos/asf/
http://selenium.googlecode.com/git

    For Google Code sites, use projectname.googlecode.com/git for a Git repository or projectname.googlecode.com/svn for an SVN repository.
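
    The Google Code naming pattern above is easy to wrap in a small shell function. This is only an illustrative sketch: the name googlecode_repo_url is invented here, and it just builds the address from the pattern described in this post without checking that the project exists.

```shell
#!/bin/sh
# Hypothetical helper: build the repository address for a Google Code
# project from its name and repository type ("git" or "svn").
googlecode_repo_url() {
  project=$1
  vcs=$2
  case "$vcs" in
    git|svn)
      printf 'http://%s.googlecode.com/%s\n' "$project" "$vcs"
      ;;
    *)
      # Anything other than git or svn does not fit the pattern.
      echo "unknown repository type: $vcs" >&2
      return 1
      ;;
  esac
}
```

    Calling googlecode_repo_url selenium git gives the Selenium address listed above.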

