NDLI: Cross-supervised synthesis of web-crawlers

PRADA: prioritizing android devices for apps by mining large-scale usage data

Generating performance distributions via probabilistic symbolic execution

The emerging role of data scientists on software development teams

On the techniques we create, the tools we build, and their misalignments: a study of KLEE

An empirical comparison of compiler testing techniques

Energy profiles of Java collections classes

Overcoming open source project entry barriers with a portal for newcomers

Automatically learning semantic features for defect prediction

Program synthesis using natural language

Augmenting API documentation with insights from stack overflow

On the "naturalness" of buggy code

Disseminating architectural knowledge on open-source projects: a case study of the book "architecture of open-source applications"

On the limits of mutation reduction strategies

Reducing combinatorics in GUI testing of android applications

Missing data imputation based on low-rank recovery and semi-supervised regression for software effort estimation

A comparison of 10 sampling algorithms for configurable systems

Angelix: scalable multiline program patch synthesis via symbolic analysis

Exploring language support for immutability

BigDebug: debugging primitives for interactive big data processing in spark

Are "non-functional" requirements really non-functional?: an investigation of non-functional requirements in practice

Behavioral log analysis with statistical guarantees

Automated partitioning of android applications for trusted execution environments

Building a theory of job rotation in software engineering from an instrumental case study

Quality experience: a grounded theory of successful agile projects without dedicated testers

IntEQ: recognizing benign integer overflows via equivalence checking across multiple precisions

Scalable thread sharing analysis

Improving refactoring speed by 10X

Release planning of mobile apps based on user reviews

Performance issues and optimizations in JavaScript: an empirical study

Belief & evidence in empirical software engineering

Guiding dynamic symbolic execution toward unverified program executions

Termination-checking for LLVM peephole optimizations

An empirical study of practitioners' perspectives on green software engineering

Work practices and challenges in pull-based development: the contributor's perspective

Cross-project defect prediction using a connectivity-based unsupervised classifier

SWIM: synthesizing what i mean: code search and idiomatic snippet synthesis

From word embeddings to document similarities for improved information retrieval in software engineering

Code anomalies flock together: exploring code anomaly agglomerations for locating design problems

Identifying and quantifying architectural debt

Comparing white-box and black-box test prioritization

MobiPlay: a remote execution based record-and-replay tool for mobile applications

Multi-objective software effort estimation

Featured model-based mutation analysis

An analysis of the search spaces for generate and validate patch generation systems

The evolution of C programming practices: a study of the Unix operating system 1973--2015

Debugging for reactive programming

Probing for requirements knowledge to stimulate architectural thinking

Efficient large-scale trace checking using mapreduce

Jumping through hoops: why do Java developers struggle with cryptography APIs?

The challenges of staying together while moving fast: an exploratory study

Code review quality: how developers see it

Nomen est omen: exploring and exploiting similarities between argument and parameter names

Fixing deadlocks via lock pre-acquisitions

SourcererCC: scaling code clone detection to big-code

Toward a framework for detecting privacy policy violations in android application code

Reliability of Run-Time Quality-of-Service evaluation using parametric model checking

Grounded theory in software engineering research: a critical review and guidelines

Synthesizing framework models for symbolic execution

Finding and analyzing compiler warning defects

Automated energy optimization of HTTP requests for mobile applications

Automated parameter optimization of classification techniques for defect prediction models

Cross-supervised synthesis of web-crawlers

Learning API usages from bytecode: a statistical approach

Using (bio)metrics to predict code quality online

Decoupling level: a new metric for architectural maintenance complexity

How does regression test prioritization perform in real-world software evolution?

VDTest: an automated framework to support testing for virtual devices

A practical guide to select quality indicators for assessing pareto-based search algorithms in search-based software engineering

Feature-model interfaces: the highway to compositional analyses of highly-configurable systems

PAC learning-based verification and model synthesis

An empirical study on the impact of C++ lambdas and programmer experience

Revisit of automatic debugging via human focus-tracking analysis

Risk-driven revision of requirements models

Feedback-directed instrumentation for deployed JavaScript applications

Finding security bugs in web applications using a catalog of access control patterns

The sky is not the limit: multitasking across GitHub projects

Revisiting code ownership and its relationship with software quality in the scope of modern code review

Floating-point precision tuning using blame analysis

Coverage-driven test code generation for concurrent classes

Understanding asynchronous interactions in full-stack JavaScript

Mining sandboxes

Optimizing selection of competing services with probabilistic hierarchical refinement

Type-aware concolic testing of JavaScript programs

iDice: problem identification for emerging issues

Too long; didn't watch!: extracting relevant fragments from software development video tutorials

AntMiner: mining more bugs by reducing noise interference

Automatic model generation from documentation for Java API functions

CUSTODES: automatic spreadsheet cell clustering and smell detection using strong and weak features

The impact of test case summaries on bug fixing performance: an empirical investigation

Automated test suite generation for time-continuous simulink models

How does the degree of variability affect bug finding?

StubDroid: automatic inference of precise data-flow summaries for the android framework

Understanding and fixing multiple language interoperability issues: the C/Fortran case

RETracer: triaging crashes by reverse execution from partial memory dumps

Discovering "unknown known" security requirements

DoubleTake: fast and precise error detection via evidence-based dynamic analysis

Reference hijacking: patching, protecting and analyzing on unmodified and non-rooted android devices

Quantifying and mitigating turnover-induced knowledge loss: case studies of chrome and a project at avaya

Crowdsourcing program preconditions via a classification game

Locking discipline inference and checking

Shadow of a doubt: testing for divergences between software versions

Cross-supervised synthesis of web-crawlers

Content Provider	ACM Digital Library
Author	Omari, Adi; Yahav, Eran; Shoham, Sharon
Abstract	A web-crawler is a program that automatically and systematically tracks the links of a website and extracts information from its pages. Due to the different formats of websites, the crawling scheme for different sites can differ dramatically. Manually customizing a crawler for each specific site is time consuming and error-prone. Furthermore, because sites periodically change their format and presentation, crawling schemes have to be manually updated and adjusted. In this paper, we present a technique for automatic synthesis of web-crawlers from examples. The main idea is to use hand-crafted (possibly partial) crawlers for some websites as the basis for crawling other sites that contain the same kind of information. Technically, we use the data on one site to identify data on another site. We then use the identified data to learn the website structure and synthesize an appropriate extraction scheme. We iterate this process, as synthesized extraction schemes result in additional data to be used for re-learning the website structure. We implemented our approach and automatically synthesized 30 crawlers for websites from nine different categories: books, TVs, conferences, universities, cameras, phones, movies, songs, and hotels.
Starting Page	368
Ending Page	379
Page Count	12
File Format	PDF
ISBN	9781450339001
DOI	10.1145/2884781.2884842
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2016-05-14
Publisher Place	New York
Access Restriction	Subscribed
Content Type	Text
Resource Type	Article
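
The iterative idea summarized in the abstract (use data already known from one site to locate the same data on another site, generalize the matching locations into an extraction rule, then re-learn from the newly extracted data) can be illustrated with a small sketch. This is not the authors' algorithm or implementation; it is a deliberately simplified illustration under stated assumptions. The names `PathCollector`, `text_nodes`, and `synthesize`, the sample pages, and the choice of a tag/class path as the "extraction scheme" are all hypothetical, and the parser assumes well-formed HTML without void tags such as `<br>`.

```python
from collections import Counter
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Collect (path, text) pairs; a path is the chain of tag.class names."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, outermost first
        self.items = []   # (path, text) for every non-empty text node

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Assumption: input is well-formed, so the top of the stack matches.
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def text_nodes(html):
    """Return all (path, text) pairs found in one page."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.items


def synthesize(pages, known_values, max_rounds=5):
    """Learn one extraction path on a target site from values known elsewhere."""
    known = set(known_values)
    path, extracted = None, set()
    for _ in range(max_rounds):
        # 1. Use known data to identify matching nodes on the target site.
        votes = Counter(
            p for html in pages for p, t in text_nodes(html) if t in known
        )
        if not votes:
            break
        # 2. Generalize: the most frequently matched path becomes the rule.
        path = votes.most_common(1)[0][0]
        # 3. Apply the rule to every page to extract additional data.
        extracted = {
            t for html in pages for p, t in text_nodes(html) if p == path
        }
        if extracted <= known:
            break  # nothing new was learned, so stop iterating
        known |= extracted  # feed the new data into the next round
    return path, extracted


# Hypothetical target-site pages; only two of the three titles are known
# in advance (as if extracted by a hand-crafted crawler for another site).
pages = [
    '<div class="book"><h1 class="title">Dune</h1><span class="price">9.99</span></div>',
    '<div class="book"><h1 class="title">Emma</h1><span class="price">5.00</span></div>',
    '<div class="book"><h1 class="title">Ulysses</h1><span class="price">7.50</span></div>',
]
path, titles = synthesize(pages, {"Dune", "Emma"})
```

In this toy run, the two known titles are enough to select the title path, which then also yields the third, previously unknown title. A realistic system would need approximate matching, several attributes per record, and a more expressive rule language than a single fixed path.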
