NDLI: Choosing an NLP library for analyzing software documentation: a systematic literature review and a series of experiments

Half-century of Unix: history, preservation, and lessons learned

An empirical study on Android-related vulnerabilities

Analyzing program dependencies in Java EE applications

Predicting likelihood of requirement implementation within the planned iteration: an empirical study at IBM

Choosing an NLP library for analyzing software documentation: a systematic literature review and a series of experiments

Mining change histories for unknown systematic edits

Do not trust build results at face value: an empirical study of 30 million CPAN builds

An exploratory study on assessing the impact of environment variations on the results of load tests

TravisTorrent: synthesizing Travis CI and GitHub for full-stack research on continuous integration

Software evolution and quality data from controlled, multiple, industrial case studies

Understanding the origins of mobile app vulnerabilities: a large-scale measurement study of free and paid apps

Mining social web service repositories for social relationships to aid service discovery

The impact of using regression models to build defect classifiers

Bootstrapping a lexicon for emotional arousal in software engineering

Source file set search for clone-and-own reuse analysis

An empirical analysis of the Docker container ecosystem on GitHub

A large-scale study on the usage of testing patterns that address maintainability attributes: patterns for ease of modification, diagnoses, and comprehension

On the differences between unit and integration testing in the TravisTorrent dataset

A dataset of Scratch programs: scraped, shaped and scored

Developer mistakes in writing Android manifests: an empirical study of configuration errors

Who you gonna call?: analyzing web requests in Android applications

A large-scale study of the impact of feature selection techniques on defect classification models

Leveraging automated sentiment analysis in software engineering

RefDiff: detecting refactorings in version histories

How open source projects use static code analysis tools in continuous integration pipelines

To mock or not to mock?: an empirical study on mocking practices

Cost-effective build outcome prediction using cascaded classifiers

Continuous defect prediction: the idea and a related dataset

How do apps evolve in their permission requests?: a preliminary study

Extracting code segments and their descriptions from research articles

SpreadCluster: recovering versioned spreadsheets through similarity-based clustering

Predicting usefulness of code review comments using textual features and developer experience

Stack Overflow in GitHub: any snippets there?

An empirical analysis of build failures in the continuous integration workflows of Java-based open-source software

Bug characteristics in blockchain systems: a large-scale empirical study

Sentiment analysis of Travis CI builds

An extensive dataset of UML models in GitHub

A study on the energy consumption of Android app development approaches

Structure and evolution of package dependency networks

Who will leave the company?: a large-scale industry study of developer turnover by mining monthly work report

Classifying code comments in Java open-source software systems

Some from here, some from there: cross-project code reuse in GitHub

Oops, my tests broke the build: an explorative analysis of Travis CI with GitHub

Euphony: harmonious unification of cacophonous anti-virus vendor labels for Android malware

A time series analysis of TravisTorrent builds: to everything there is a season

A dataset for dynamic discovery of semantic changes in version controlled software histories

Candoia: a platform for building and sharing mining software repositories tools as apps

Spencer: interactive heap analysis for the masses

Concept-based classification of software defect reports

Using Q&A websites as a method for assessing systematic reviews

Exception evolution in long-lived Java systems

Extracting build changes with BuildDiff

Rationale in development chat messages: an exploratory study

Insights into continuous integration build failures

Rediscovery datasets: connecting duplicate reports

Abnormal working hours: effect of rapid releases and implications to work content

An empirical study of the personnel overhead of continuous integration

A data set of OCL expressions on GitHub

How does contributors' involvement influence the build status of an open-source software project?

On the interplay between non-functional requirements and builds on continuous integration

Analyzing the impact of social attributes on commit integration success

Built to last or built too fast?: evaluating prediction models for build times

The impact of the adoption of continuous integration on developer attraction and retention

An empirical study of activity, popularity, size, testing, and stability in continuous integration

Impact of continuous integration on code reviews

Prevalence of botched code integrations

Choosing an NLP library for analyzing software documentation: a systematic literature review and a series of experiments

Content Provider	ACM Digital Library
Author	Treude, Christoph; Omran, Fouad Nasser A Al
Abstract	To uncover interesting and actionable information from natural language documents authored by software developers, many researchers rely on "out-of-the-box" NLP libraries. However, software artifacts written in natural language are different from other textual documents due to the technical language used. In this paper, we first analyze the state of the art through a systematic literature review in which we find that only a small minority of papers justify their choice of an NLP library. We then report on a series of experiments in which we applied four state-of-the-art NLP libraries to publicly available software artifacts from three different sources. Our results show low agreement between different libraries (only between 60% and 71% of tokens were assigned the same part-of-speech tag by all four libraries) as well as differences in accuracy depending on source: For example, spaCy achieved the best accuracy on Stack Overflow data with nearly 90% of tokens tagged correctly, while it was clearly outperformed by Google's SyntaxNet when parsing GitHub ReadMe files. Our work implies that researchers should make an informed decision about the particular NLP library they choose and that customizations to libraries might be necessary to achieve good results when analyzing software artifacts written in natural language.
Starting Page	187
Ending Page	197
Page Count	11
File Format	PDF
ISBN	9781538615447
DOI	10.1109/MSR.2017.42
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2017-05-20
Access Restriction	Subscribed
Subject Keyword	NLP libraries; Software documentation; Part-of-speech tagging; Natural language processing
Content Type	Text
Resource Type	Article
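
The agreement figure quoted in the abstract (the share of tokens given the same part-of-speech tag by all four libraries) can be illustrated with a minimal sketch. It is not the authors' code: the function name `full_agreement_rate`, the four "taggers", and the tags themselves are made up for illustration, and the sketch assumes every library's output has already been aligned to one shared tokenization and tag set, which real libraries do not guarantee.

```python
from typing import Sequence


def full_agreement_rate(taggings: Sequence[Sequence[str]]) -> float:
    """Return the share of tokens that receive the same tag from every tagger.

    `taggings` holds one tag sequence per library. All sequences are assumed
    to be aligned to the same tokens (an assumption of this sketch).
    """
    if not taggings:
        raise ValueError("need at least one tag sequence")
    length = len(taggings[0])
    if any(len(tags) != length for tags in taggings):
        raise ValueError("tag sequences must be aligned to the same tokens")
    if length == 0:
        return 0.0
    # zip(*taggings) yields, per token, the tuple of tags from all taggers.
    agreed = sum(1 for column in zip(*taggings) if len(set(column)) == 1)
    return agreed / length


# Made-up tags for the five tokens "Call", "the", "parse", "method", "."
example = [
    ["VB", "DT", "NN", "NN", "."],  # hypothetical tagger A
    ["VB", "DT", "VB", "NN", "."],  # hypothetical tagger B
    ["NN", "DT", "NN", "NN", "."],  # hypothetical tagger C
    ["VB", "DT", "NN", "NN", "."],  # hypothetical tagger D
]
rate = full_agreement_rate(example)
print(f"{rate:.0%} of tokens tagged identically by all taggers")
```

In this toy input the taggers disagree on "Call" and "parse", the kind of noun/verb ambiguity that technical text tends to produce, so three of the five tokens count as agreed.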