
Please use this identifier to cite or link to this item: http://hdl.handle.net/1860/2575

Title: A probabilistic approach to source code authorship identification
Authors: Kothari, Jay
Shevertalov, Maxim
Stehle, Edward
Mancoridis, Spiros
Issue Date: 2-Apr-2007
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Citation: Paper presented at 4th International Conference on Information Technology-New Generations, ITNG 2007, Las Vegas, Nevada.
Abstract: There exists a need for tools to help identify the authorship of source code. This includes situations in which the ownership of code is questionable, such as in plagiarism or intellectual property infringement disputes. Authorship identification can also be used to assist in the apprehension of the creators of malware. In this paper we present an approach to identifying the authors of source code. We begin by computing a set of metrics to build profiles for a population of known authors using code samples that are verified to be authentic. We then compute metrics on unidentified source code to determine the closest matching profile. We demonstrate our approach on a case study that involves two kinds of software: one based on open source developers working on various projects, and another based on students working on assignments with the same requirements. In our case study we are able to determine authorship with greater than 70% accuracy in choosing the single nearest match and greater than 90% accuracy in choosing the top three ordered nearest matches.
URI: http://dx.doi.org/10.1109/ITNG.2007.17
Appears in Collections: Faculty Research and Publications (Comp Sci)
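
The two-step approach summarized in the abstract (build a profile per known author from verified samples, then match unidentified code to the closest profile) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the metric (a histogram of bucketed line lengths), the Euclidean distance, and the function names `metrics`, `build_profile`, `distance`, and `rank_authors` are all hypothetical. The record above does not state which metrics or classifier the authors used.

```python
from collections import Counter
import math


def metrics(source: str) -> dict:
    """Hypothetical style metric: relative frequency of line lengths,
    bucketed in steps of 10 characters (bucket 9 holds 90+)."""
    lines = source.splitlines() or [""]
    counts = Counter(min(len(line) // 10, 9) for line in lines)
    return {bucket: n / len(lines) for bucket, n in counts.items()}


def build_profile(samples: list) -> dict:
    """Average the metric vectors of one author's verified samples."""
    profile = Counter()
    for sample in samples:
        for bucket, freq in metrics(sample).items():
            profile[bucket] += freq / len(samples)
    return dict(profile)


def distance(a: dict, b: dict) -> float:
    """Euclidean distance between two sparse metric vectors (an assumption)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def rank_authors(profiles: dict, unknown: str, top: int = 3) -> list:
    """Return up to `top` author names, ordered from nearest profile to farthest."""
    observed = metrics(unknown)
    ranked = sorted(profiles, key=lambda name: distance(profiles[name], observed))
    return ranked[:top]


# Toy data: two made-up authors with visibly different line lengths.
profiles = {
    "alice": build_profile(["x = 1\ny = 2\n"]),
    "bob": build_profile(["result = compute_something(argument_one, argument_two)\n"]),
}
ranking = rank_authors(profiles, "a = 3\nb = 4")
```

Returning an ordered list rather than a single name mirrors the two accuracy figures quoted in the abstract, one for the single nearest match and one for the top three ordered matches.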

Files in This Item:

File: 2007005106.pdf
Size: 297.91 kB
Format: Adobe PDF

Items in iDEA are protected by copyright, with all rights reserved, unless otherwise indicated.