Kolovos, Dimitris orcid.org/0000-0002-1724-6563, Neubauer, Patrick orcid.org/0000-0002-9811-4772, Barmpis, Konstantinos et al. (2 more authors) (2019) Crossflow: A framework for distributed mining of software repositories. In: Proceedings - 2019 IEEE/ACM 16th International Conference on Mining Software Repositories, MSR 2019. 16th IEEE/ACM International Conference on Mining Software Repositories, MSR 2019, 26-27 May 2019. IEEE International Working Conference on Mining Software Repositories. IEEE Computer Society, CAN, pp. 155-159.
Abstract
Large-scale software repository mining typically requires substantial storage and computational resources, and often involves a large number of calls to (rate-limited) APIs such as those of GitHub and StackOverflow. This creates a growing need for distributed execution of repository mining programs, to which remote collaborators can contribute computational and storage resources as well as API quotas (ideally without sharing API access tokens or credentials). In this paper we introduce Crossflow, a novel framework for building distributed repository mining programs. We demonstrate how Crossflow can delegate mining jobs to remote workers and cache their results, and how workers can implement advanced behaviour such as load balancing and rejecting jobs they cannot perform (e.g. due to lack of space or of credentials for a specific API).
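The delegation, caching, and job-rejection behaviour described in the abstract can be illustrated with a minimal in-process sketch. This is not Crossflow's actual API: the names `Job`, `Worker`, `Master`, `accepts`, and `submit` are hypothetical, the "least jobs handled so far" rule is only one possible load-balancing policy, and real workers would run remotely rather than in the same process.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Job:
    """A unit of mining work (hypothetical structure, not Crossflow's)."""
    api: str    # API the job needs, e.g. "github"
    query: str  # what to fetch or analyse


@dataclass
class Worker:
    """A worker that only accepts jobs for APIs it holds credentials for."""
    name: str
    apis: frozenset
    handled: List[Job] = field(default_factory=list)

    def accepts(self, job: Job) -> bool:
        # Reject jobs the worker cannot perform (here: missing credentials).
        return job.api in self.apis

    def run(self, job: Job) -> str:
        self.handled.append(job)
        # Stand-in for real mining work; credentials never leave the worker.
        return f"{self.name}:{job.api}:{job.query}"


class Master:
    """Delegates jobs to workers and caches their results."""

    def __init__(self, workers: List[Worker]):
        self.workers = list(workers)
        self.cache: Dict[Job, str] = {}

    def submit(self, job: Job) -> Optional[str]:
        # Serve repeated jobs from the cache instead of re-running them.
        if job in self.cache:
            return self.cache[job]
        # Only workers that do not reject the job are candidates.
        candidates = [w for w in self.workers if w.accepts(job)]
        if not candidates:
            return None  # every worker rejected the job
        # Simple load balancing: pick the candidate with the fewest jobs so far.
        worker = min(candidates, key=lambda w: len(w.handled))
        result = worker.run(job)
        self.cache[job] = result
        return result
```

In this sketch, submitting the same `Job` twice runs it on a worker only once, and a job for an API that no worker has credentials for yields `None` rather than being forced onto a worker.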
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | |
Keywords: | Client-server systems, Computer aided software engineering, Data analysis, Data collection, Data flow computing, Data integration, Distributed processing, Modeling, Open source software, Pipeline processing, Public domain software, Scalability, Software engineering |
Dates: | |
Institution: | The University of York |
Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York) |
Depositing User: | Pure (York) |
Date Deposited: | 25 Aug 2020 13:30 |
Last Modified: | 25 Jan 2025 00:03 |
Published Version: | https://doi.org/10.1109/MSR.2019.00032 |
Status: | Published |
Publisher: | IEEE Computer Society |
Series Name: | IEEE International Working Conference on Mining Software Repositories |
Identification Number: | 10.1109/MSR.2019.00032 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:164779 |