A probabilistic approach to source code authorship identification
Please use this identifier to cite or link to this item:
http://hdl.handle.net/1860/2575
| Title: | A probabilistic approach to source code authorship identification |
| Authors: | Kothari, Jay; Shevertalov, Maxim; Stehle, Edward; Mancoridis, Spiros |
| Issue Date: | 2-Apr-2007 |
| Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
| Citation: | Paper presented at 4th International Conference on Information Technology-New Generations, ITNG 2007, Las Vegas, Nevada. |
| Abstract: | There exists a need for tools to help identify the authorship of source code. This includes situations in which the ownership of code is questionable, such as in plagiarism or intellectual property infringement disputes. Authorship identification can also be used to assist in the apprehension of the creators of malware. In this paper we present an approach to identifying the authors of source code. We begin by computing a set of metrics to build profiles for a population of known authors using code samples that are verified to be authentic. We then compute metrics on unidentified source code to determine the closest matching profile. We demonstrate our approach on a case study that involves two kinds of software: one based on open source developers working on various projects, and another based on students working on assignments with the same requirements. In our case study we are able to determine authorship with greater than 70% accuracy in choosing the single nearest match and greater than 90% accuracy in choosing the top three ordered nearest matches. |
| URI: | http://dx.doi.org/10.1109/ITNG.2007.17 http://hdl.handle.net/1860/2575 |
| Appears in Collections: | Faculty Research and Publications (Comp Sci) |
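
The abstract describes a two-step procedure: build a profile per known author from verified samples, then rank those profiles by closeness to an unidentified sample. A minimal sketch of that general idea follows. The abstract does not say which metrics or which distance are used, so everything specific here is an assumption made for illustration: character 4-gram frequencies stand in for the metrics, an L1 distance stands in for "closest matching", and the names `ngram_profile`, `distance`, and `rank_authors` are hypothetical. It should not be read as the authors' method.

```python
from collections import Counter


def ngram_profile(samples, n=4):
    """Relative frequency of character n-grams over a list of code samples.

    Assumption: n-gram frequencies are a stand-in metric, not the paper's.
    """
    counts = Counter()
    for text in samples:
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def distance(p, q):
    """L1 distance between two profiles (0 for identical, 2 for disjoint)."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def rank_authors(known, unknown_code, n=4):
    """Return author names ordered from nearest to farthest profile."""
    profiles = {author: ngram_profile(samples, n) for author, samples in known.items()}
    target = ngram_profile([unknown_code], n)
    return sorted(profiles, key=lambda author: distance(profiles[author], target))


if __name__ == "__main__":
    # Toy, made-up samples; real profiles would need far more code per author.
    known = {
        "alice": ["for (int i = 0; i < n; i++) { sum += i; }"],
        "bob": ["for(int i=0;i<n;i++){sum+=i;}"],
    }
    ranked = rank_authors(known, "for (int j = 0; j < m; j++) { total += j; }")
    print("nearest match:", ranked[0])
    print("top three:", ranked[:3])
```

Taking `ranked[0]` corresponds to the single nearest match, and `ranked[:3]` to the top three ordered nearest matches mentioned in the abstract.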
Items in iDEA are protected by copyright, with all rights reserved, unless otherwise indicated.