Wednesday, July 3, 2013

Downloading source code from SVN/Git repository over HTTP

    Sites hosting open source projects provide an online viewer for browsing source code without checking it out with an SVN/Git client. Checking out an entire repository can take a long time. With Git there is also no straightforward way to check out only a particular directory: cloning downloads the entire repository to the local machine, and even with sparse checkout Git still downloads the entire repository. When bandwidth is a concern, checking out the entire repository may not be an option.

    One can use GNU Wget to recursively download files from these online code repositories over HTTP. For Windows, it can be downloaded from

    The command below recursively downloads a directory, its subdirectories, and all files in them, excluding index.html. It does not download parent directories or files from external sites.

wget --cut-dirs=2 --level=15 --include-directories=src/main/java --recursive --no-parent --no-host-directories --reject=index.html -e robots=off --no-clobber <directory-URL>

    Be careful with slashes: use forward slashes (/) in paths such as src/main/java. When I used backslashes, the command did not work.
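    If you build the path in a bash-like shell (for example Git Bash or Cygwin on Windows), a small helper can normalize a Windows-style path before handing it to Wget. This is a minimal sketch; to_wget_path is a hypothetical name, not part of Wget.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Wget): replace every backslash in a
# Windows-style path with a forward slash, e.g. for --include-directories.
to_wget_path() {
  local p="$1"
  printf '%s\n' "${p//\\//}"
}
```

    For example, to_wget_path 'src\main\java' prints src/main/java, which can then be passed as --include-directories="$(to_wget_path 'src\main\java')".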

--cut-dirs=N Ignore the first N directory components of the URL path when creating local directories
--level=N Maximum recursion depth (how many levels of subdirectories to follow). The default is 5
--include-directories=src/main/java Follow only this directory and its subdirectories
--recursive Download recursively
--no-parent Never ascend to the parent directory of the given URL
--no-host-directories Do not create a top-level directory named after the server's host name, which Wget creates by default
--reject=index.html Do not keep index.html files (Wget still fetches them to find links, then deletes them)
-e robots=off Ignore robots.txt exclusions while crawling the site
--no-clobber Do not overwrite or re-download files that already exist locally
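    Choosing the right --cut-dirs value means counting the directory components in the URL path. The sketch below assumes bash and uses hypothetical names (cut_dirs_for, fetch_dir, and a DRY_RUN variable) that are not part of Wget. It cuts every directory of the starting URL, so the files land directly in the current directory, and it leaves out --include-directories because --no-parent already keeps the crawl under the starting directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the directory components in a URL's path,
# so that --cut-dirs strips all of them.
cut_dirs_for() {
  local url="$1" rest path
  rest="${url#*://}"            # drop the scheme, e.g. "http://"
  case "$rest" in
    */*) path="${rest#*/}" ;;   # keep everything after the host
    *)   path="" ;;             # no path at all, e.g. "http://example.com"
  esac
  path="${path%%[?#]*}"         # drop any query string or fragment
  path="${path%/}"              # drop the trailing slash of a directory URL
  if [ -z "$path" ]; then
    echo 0
    return
  fi
  local IFS='/'                 # split the path on slashes
  local -a parts
  read -r -a parts <<< "$path"
  echo "${#parts[@]}"
}

# Hypothetical wrapper: print the wget command when DRY_RUN=1, otherwise run it.
fetch_dir() {
  local url="$1" cut
  cut="$(cut_dirs_for "$url")"
  local -a cmd=(wget --recursive --no-parent --no-host-directories
                "--cut-dirs=$cut" --level=15 --reject=index.html
                -e robots=off --no-clobber "$url")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

    For example, DRY_RUN=1 fetch_dir 'http://example.com/svn/trunk/src/main/java/' (a placeholder URL) prints the command with --cut-dirs=5 instead of running it. Pass the directory URL with a trailing slash, since Wget treats a final path component without one as a file name.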

    Here are a few source code repository addresses.

    For Google Code projects, use for a Git repository, or if it is an SVN repository.
