# GitHub Crawler
# About
GitHub Crawler is an application developed within the context of the SDK4ED project, as part of the Dependability Toolbox, to enable the automatic collection and static analysis of a large number of open-source software repositories from GitHub.

The GitHub Crawler was used in the SDK4ED project (i) to construct the Benchmark Repository used for calibrating the Software Security Assessment Model (SAM) (link), and (ii) to construct the software metrics-based Vulnerability Prediction Models (VPMs) (link) that are part of the Dependability Toolbox (link). The Dependability Toolbox also uses it to construct security benchmarks.
## Overview
A high-level overview of the GitHub Crawler application is depicted in the figure below:

![githubcrawler](uploads/639e312cfd344437bda3feff5d7deb34/GitHubCrawler-High-Level-Overview.png)

As shown in the figure above, the user first provides a set of input parameters. These parameters configure the tool and are used to construct the search query for crawling GitHub. The GitHub Downloader component receives these parameters from the user, constructs the query, and submits it to the GitHub API. The query contains (at least) the following parameters: (i) the programming language of the desired software repositories, (ii) the order in which the retrieved repositories should be sorted (e.g., by stars or by forks), and (iii) optional textual information for narrowing down the search. The software repositories that satisfy the query are downloaded locally. Next, the Compiler component compiles the downloaded repositories. Finally, the repositories that compiled successfully are passed to the Software Analysis component. This component executes the CKJM Extended and CCCC static code analyzers to compute popular software metrics, including complexity, coupling, and cohesion metrics. The results are collected in a report.
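The query-construction step can be illustrated with GitHub's public repository search endpoint (`GET /search/repositories`). The Python sketch below is an assumption and not the GitHub Downloader's actual code. The helper names `build_search_url` and `clone_urls` are hypothetical. The sketch only shows how the three query inputs (language, sort order, optional keywords) could map onto a search request, and how clone URLs could be read from the decoded JSON response before downloading.

```python
from urllib.parse import urlencode

# GitHub REST API endpoint for repository search.
SEARCH_URL = "https://api.github.com/search/repositories"

# Sort keys accepted by the repository search endpoint; omitting "sort"
# falls back to GitHub's default "best match" ranking.
VALID_SORTS = {"stars", "forks", "help-wanted-issues", "updated"}


def build_search_url(language, sort="stars", keywords=None, per_page=30):
    """Hypothetical helper: turn the crawler's query inputs into a search URL."""
    if sort not in VALID_SORTS:
        raise ValueError(f"unsupported sort key: {sort!r}")
    if not 1 <= per_page <= 100:  # the search API returns at most 100 items per page
        raise ValueError("per_page must be between 1 and 100")
    terms = [keywords] if keywords else []  # optional free-text narrowing
    terms.append(f"language:{language}")    # qualifier restricting the language
    params = {
        "q": " ".join(terms),
        "sort": sort,
        "order": "desc",  # e.g. most-starred repositories first
        "per_page": per_page,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def clone_urls(search_response):
    """Hypothetical helper: extract clone URLs from a decoded search response."""
    return [item["clone_url"] for item in search_response.get("items", [])]


if __name__ == "__main__":
    print(build_search_url("java", sort="stars", keywords="security"))
    # Minimal stand-in for a decoded JSON response (not real search results).
    sample = {"items": [{"full_name": "octo/demo",
                         "clone_url": "https://github.com/octo/demo.git"}]}
    print(clone_urls(sample))
```

Each returned URL could then be fetched locally, for example with `git clone`. Unauthenticated search requests are rate-limited, so a real crawler would typically authenticate and page through the results.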
The steps are executed sequentially, and the compilation and analysis steps are optional. At the start of the analysis, the user determines which steps are executed by setting a specific parameter. The user can choose to:
* Simply download the software repositories that satisfy their query (skipping the compilation and analysis steps)
* Download the software repositories and compile them (skipping the analysis step)
* Download, compile, and analyze the software repositories (i.e., perform a complete analysis)
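Each of the three options above runs a prefix of a single sequential pipeline. The Python sketch below assumes a hypothetical `Mode` value for the step-selection parameter. It takes the download, compile, and analysis steps as plain callables, which stand in for the GitHub Downloader, Compiler, and Software Analysis components. The names `Mode` and `run_pipeline` are illustrative only. As in the overview, only repositories that compile successfully reach the analysis step.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical values for the parameter that selects the executed steps."""
    DOWNLOAD = "download"  # download only
    COMPILE = "compile"    # download + compile
    FULL = "full"          # download + compile + analyze


def run_pipeline(repos, mode, download, compile_repo, analyze):
    """Run the steps sequentially, stopping after the ones selected by `mode`.

    download(repo) -> local path; compile_repo(path) -> bool;
    analyze(path) -> metrics report. All three are caller-supplied stand-ins.
    """
    result = {"downloaded": [download(repo) for repo in repos]}
    if mode is Mode.DOWNLOAD:
        return result
    # Keep only the repositories that compiled successfully.
    result["compiled"] = [path for path in result["downloaded"] if compile_repo(path)]
    if mode is Mode.COMPILE:
        return result
    # Complete analysis: run static analysis on the compiled repositories only.
    result["reports"] = {path: analyze(path) for path in result["compiled"]}
    return result


if __name__ == "__main__":
    # Toy stand-ins: pretend repository "b" fails to compile.
    out = run_pipeline(
        ["a", "b", "c"],
        Mode.FULL,
        download=lambda repo: f"repos/{repo}",
        compile_repo=lambda path: not path.endswith("b"),
        analyze=lambda path: {"analyzed": True},
    )
    print(out)
```

Passing the steps in as callables keeps the mode-selection logic independent of how each component is implemented.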
## Utilization of the GitHub Crawler
The GitHub Crawler can be used (indirectly) through the API that is provided by the Dependability Toolbox. For more information on how to use the Dependability Toolbox, please check its dedicated wiki page (link).