Lectures will be given by the [[bsr-ws2016:speakers|invited speakers and the BSR senior researchers]].
<WRAP justify>

  * <html><font color="CC0909"><b>Wil van der Aalst</b></font> on <font color="blue"><b>Making Sense From Software Using Process Mining</b></font></html>
**__Short abstract__**: Software-related problems have an incredible impact on society, organizations, and users that increasingly rely on information technology. Since software is evolving and operates in a changing environment, one cannot anticipate all problems at design-time. We propose to use process mining to analyze software in its natural habitat. Process mining aims to bridge the gap between model-based process analysis methods such as simulation and other business process management techniques on the one hand and data-centric analysis methods such as machine learning and data mining on the other. It provides tools and techniques for automated process model discovery, conformance checking, data-driven model repair and extension, bottleneck analysis, and prediction based on event log data. Process discovery techniques can be used to capture the real behavior of software. Conformance checking techniques can be used to spot deviations. The alignment of models and real software behavior can be used to predict problems related to performance or conformance. Recent developments in process mining and the instrumentation of software make this possible. This lecture provides pointers to the state-of-the-art in process mining and its application to software.
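
As a flavour of what process discovery and conformance checking look like on event data, here is a minimal Python sketch (a toy illustration with a made-up event log, not the tooling used in the lecture): it mines a directly-follows relation from a few traces and flags transitions in a new trace that the discovered relation has never seen.
<code python>
# Toy sketch of process discovery and conformance checking on a made-up event log.
from collections import Counter

event_log = [
    ["start", "load_config", "handle_request", "write_log", "stop"],
    ["start", "load_config", "handle_request", "handle_request", "write_log", "stop"],
    ["start", "load_config", "write_log", "stop"],          # deviating trace
]

def directly_follows(log):
    """Count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

def check_conformance(trace, dfg):
    """Naive conformance check: flag transitions never seen in the discovered relation."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in dfg]

dfg = directly_follows(event_log[:2])          # "discover" a model from the first two traces
print(check_conformance(event_log[2], dfg))    # -> [('load_config', 'write_log')] : a deviation
</code>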

  * <html><font color="CC0909"><b>Jack van Wijk</b></font> on <font color="blue"><b>Introduction to Data Visualization</b></font></html>
**__Short abstract__**: Data Visualization concerns the use of interactive computer graphics to obtain insight in large amounts of data. The aim is to exploit the unique capabilities of the human visual system to detect patterns, structures, and irregularities, and to enable experts to formulate new hypotheses, confirm the expected, and to discover the unexpected. In this lecture an overview of the field is given, illustrated with examples of work from Eindhoven, covering a variety of different data and application domains. The focus is on information visualization and visual analytics. We study how large amounts of abstract data, such as tables, hierarchies, and networks, can be represented and interacted with. In many cases, combinations of such data have to be dealt with, and also, the data is often dynamic, which brings another big challenge. Typical use cases are how to understand large software systems, how to analyze thousands of medicine prescriptions, and how to see patterns in huge telecom datasets. In visual analytics, the aim is to integrate methods from statistics, machine learning, and data mining, as well as to support data types such as text and multimedia, and to support the full process from data acquisition to presentation.
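
As a small, hypothetical illustration of the node-link representations mentioned above (not the visualization systems demonstrated in the lecture; the module hierarchy is made up), the following Python sketch draws a tiny hierarchy with matplotlib:
<code python>
# Minimal node-link drawing of a made-up module hierarchy using matplotlib.
import matplotlib.pyplot as plt

edges = [("root", "ui"), ("root", "core"), ("core", "parser"), ("core", "solver")]
positions = {"root": (0, 2), "ui": (-1, 1), "core": (1, 1), "parser": (0.5, 0), "solver": (1.5, 0)}

fig, ax = plt.subplots(figsize=(4, 3))
for parent, child in edges:                       # draw links first, then nodes on top
    (x1, y1), (x2, y2) = positions[parent], positions[child]
    ax.plot([x1, x2], [y1, y2], color="grey", zorder=1)
for name, (x, y) in positions.items():
    ax.scatter([x], [y], s=400, color="steelblue", zorder=2)
    ax.annotate(name, (x, y), ha="center", va="center", color="white", fontsize=8)
ax.axis("off")
fig.savefig("hierarchy.png", dpi=150)             # writes a small node-link diagram
</code>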
  
  * <html><font color="CC0909"><b>Margaret-Anne Storey</b></font> on <font color="blue"><b>Beyond Mixed Methods: Why Big Data Needs Thick Data</b></font></html>
We will discuss these questions through specific examples and case studies.
  
  * <html><font color="CC0909"><b>Frits Vaandrager</b></font> on <font color="blue"><b>Active Learning of Automata</b></font></html>
**__Short abstract__**: Active automata learning is emerging as a highly effective technique for obtaining state machine models of software components. In this talk, I will give a survey of recent progress in the field, highlight applications, and identify some remaining research challenges.
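
The setting can be made concrete with a toy Python sketch (an illustration of the query interface only, not the algorithms surveyed in the talk; the hidden automaton is an invented example): the learner interacts with a black-box system through membership queries and a bounded equivalence check.
<code python>
# Toy active-learning setting: membership and (approximate) equivalence queries
# against a hidden DFA that accepts strings over {a, b} with an even number of a's.
from itertools import product

HIDDEN_DFA = {"start": 0, "accepting": {0},
              "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}}

def membership_query(word):
    """Learner asks: does the hidden system accept this word?"""
    state = HIDDEN_DFA["start"]
    for symbol in word:
        state = HIDDEN_DFA["delta"][(state, symbol)]
    return state in HIDDEN_DFA["accepting"]

def equivalence_query(hypothesis, alphabet="ab", max_len=6):
    """Approximate equivalence check: test all words up to a bounded length."""
    for length in range(max_len + 1):
        for word in map("".join, product(alphabet, repeat=length)):
            if hypothesis(word) != membership_query(word):
                return word          # a counterexample for the learner to refine its model
    return None

print(membership_query("abba"))                  # -> True (two a's)
print(equivalence_query(lambda word: True))      # -> 'a' (where "accept everything" is wrong)
</code>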
  
  * <html><font color="CC0909"><b>Arie van Deursen</b></font> on <font color="blue"><b>Exceptional Logging</b></font></html>
**__Short abstract__**: Incorrect error handling is a major cause of software system crashes. Luckily, the majority of these crashes lead to useful log data that can help to analyze the root cause. In this presentation we explore exception handling practices from different perspectives, with the ultimate goal of making error handling less error-prone, prioritizing error handling based on the occurrence of errors in log data, and automating the fixing of error handling as far as possible. We cover a range of research methods we used in our studies, including static analysis, repository mining, genetic algorithms, log file analytics, and qualitative analysis of surveys.
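
To make the log-based prioritization idea concrete, here is a hedged Python sketch (the log format and the regular expression are assumptions for illustration, not the pipeline used in the studies): it ranks exception types by how often they show up in a log.
<code python>
# Rank exception types by frequency in a log file (assumed, simplified log format).
import re
from collections import Counter

EXCEPTION_RE = re.compile(r"\b([A-Za-z_][\w.]*(?:Exception|Error))\b")

def rank_exceptions(lines):
    counts = Counter()
    for line in lines:
        for match in EXCEPTION_RE.findall(line):
            counts[match] += 1
    return counts.most_common()

sample_log = [
    "2016-10-14 12:00:01 ERROR NullPointerException in OrderService.place",
    "2016-10-14 12:00:07 WARN  retrying after java.net.SocketTimeoutException",
    "2016-10-14 12:01:52 ERROR NullPointerException in OrderService.place",
]
print(rank_exceptions(sample_log))
# -> [('NullPointerException', 2), ('java.net.SocketTimeoutException', 1)]
</code>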

  * <html><font color="CC0909"><b>Robert DeLine</b></font> on <font color="blue"><b>Supporting Data-Centered Software Development</b></font></html>
**__Short abstract__**: Modern software consists of both logic and data. Some use of the data is "back stage", invisible to customers. For example, teams analyze service logs to make engineering decisions, like assigning bug and feature priorities, or use dashboards to monitor service performance. Other use of data is "on stage" (that is, part of the user experience), for example, the graph algorithms of the Facebook feed, the machine learning behind web search results, or the signal processing inside fitness bracelets. Today, working with data is typically assigned to the specialized role of the data scientist, a discipline with its own tools, skills and knowledge. In the first part of the talk, I'll describe some empirical studies of the emerging role of data scientists, to convey their current work practice.

Over time, working with data is likely to change from a role to a skill. As a precedent, software testing was originally the responsibility of the role of software testers. Eventually, the prevalence of test-driven development and unit tests turned testing into a skill that many developers practice. Will the same be true of the role of data scientists? Is it possible to create tools to allow a wide range of developers (or even end users) to analyze data and to create data-centered algorithms? The second part of the talk will demo emerging tools that take initial steps toward democratizing data science.
  
  * <html><font color="CC0909"><b>Mark van den Brand</b></font> on <font color="blue"><b>Challenges in Automotive Software Development Running on Big Software</b></font></html>
driving.
  
  * <html><font color="CC0909"><b>Georgios Gousios</b></font> on <font color="blue"><b>Mining GitHub for fun and profit</b></font></html>
**__Short abstract__**: Modern organizations use telemetry and process data to make software production more efficient. Consequently, software engineering is an increasingly data-centered scientific field. With over 30 million repositories and 10 million users, GitHub is currently the largest code hosting site in the world. Software engineering researchers have been drawn to GitHub due to this popularity, as well as its integrated social features and the metadata that can be accessed through its API. To make research with GitHub data approachable, we created the GHTorrent project, a scalable, off-line mirror of all data offered through the GitHub API. In our lecture, we will discuss the GHTorrent project in detail and present insights drawn from using this dataset in various research works.
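
As a minimal illustration of the metadata the GitHub REST API exposes (a sketch using the Python `requests` package; GHTorrent itself provides an off-line mirror of this data, and the repository queried below is only an example), consider:
<code python>
# Pull basic repository metadata from the public GitHub REST API.
import requests

def repo_summary(owner, repo):
    response = requests.get(f"https://api.github.com/repos/{owner}/{repo}",
                            headers={"Accept": "application/vnd.github.v3+json"})
    response.raise_for_status()
    data = response.json()
    return {
        "full_name": data["full_name"],
        "stars": data["stargazers_count"],
        "forks": data["forks_count"],
        "open_issues": data["open_issues_count"],
    }

if __name__ == "__main__":
    print(repo_summary("gousiosg", "github-mirror"))   # example repository name
</code>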
  
  * <html><font color="CC0909"><b>Jaco van de Pol</b></font> on <font color="blue"><b>Scalable Model Analysis</b></font></html>
**__Short abstract__**: Software is a complex product. This holds already for its static structure, but even more so for its dynamic behaviour. When considering Big Software on the Run, the role of models is changing fast: instead of using them as a blueprint (like engineers) we now use models to understand running software (like biologists). More extremely, we are now using machine learning to obtain models of complex software systems automatically. However, adapting a classic motto, we should “spend more time on the analysis of models, than on collecting logs, and learning and visualising models” (1).
  
We will discuss algorithms and tools for studying models of the dynamic behaviour of systems. Since their complex behaviour is essentially modeled as a giant graph, we will review various high performance graph algorithms. In particular, we will cover LTL as a logic to specify properties of system runs, and symbolic and multi-core model checking as the scalable means to analyse large models. We will illustrate this with Petri Nets modelling software systems, and timed automata modelling biological systems.
  
(1) variation on "Spend more time working on code that analyzes the meaning of metrics, than code that collects, moves, stores and displays metrics" - Adrian Cockcroft, cited by H. Hartmann in Communications of the ACM 59(7), July 2016
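
As a deliberately tiny illustration of the explicit-state side of this (a toy safety check over a made-up transition system, far removed from the symbolic and multi-core techniques covered in the lecture), consider the following Python sketch:
<code python>
# Explicit-state model checking in miniature: BFS over a transition system,
# checking that no "bad" state is reachable, and returning an error trace if one is.
from collections import deque

def successors(state):
    """Toy model: a counter that wraps modulo 4 and a flag that may be raised at 2."""
    counter, flag = state
    yield ((counter + 1) % 4, flag)
    if counter == 2:
        yield (counter, True)

def bad(state):
    counter, flag = state
    return flag and counter == 3          # safety property: never flag at counter 3

def check_safety(initial):
    """Breadth-first exploration of the reachable states; returns a counterexample or None."""
    queue = deque([(initial, [initial])])
    visited = {initial}
    while queue:
        state, path = queue.popleft()
        if bad(state):
            return path
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None

print(check_safety((0, False)))
# -> [(0, False), (1, False), (2, False), (2, True), (3, True)] : shortest error trace
</code>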

  * <html><font color="CC0909"><b>Marieke Huisman</b></font> on <font color="blue"><b>Reliable Concurrent Software</b></font></html>
**__Short abstract__**: Concurrent software is inherently error-prone, due to the possible interactions and subtle interplays between the parallel computations. As a result, error prediction and tracing the sources of errors is often difficult. In particular, rerunning an execution with exactly the same input might not lead to the same error.
To improve this situation, we need techniques that can provide guarantees about the behaviour of a concurrent program. In this lecture, we discuss an approach based on program annotations. The program annotations describe locally what parts of the memory are affected by a thread, and what the expected behaviour of a thread is. From the local program annotations, conclusions can be drawn about the global behaviour of a concurrent application.
We discuss various techniques to verify such annotations. If a high correctness guarantee is needed, static program verification techniques can be used. However, in many cases, checking at run-time that the annotations are not violated is sufficient. We discuss both approaches, and we show in particular what the challenges are of using them in a concurrent setting.
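
The run-time checking flavour can be illustrated with a small Python sketch (hypothetical notation, not the annotation language or verifiers discussed in the lecture): pre- and postconditions are attached to a method and checked while the program runs, so a violated contract surfaces exactly where it is broken.
<code python>
# Run-time checking of simple pre/postcondition annotations via a decorator.
import functools
import threading

def contract(pre=lambda *a, **k: True, post=lambda result, *a, **k: True):
    """Attach a precondition and a postcondition to a function and check them at run time."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            assert pre(*args, **kwargs), f"precondition of {func.__name__} violated"
            result = func(*args, **kwargs)
            assert post(result, *args, **kwargs), f"postcondition of {func.__name__} violated"
            return result
        return wrapper
    return decorate

class Account:
    def __init__(self):
        self.balance = 0
        self.lock = threading.Lock()   # the contract below assumes the caller holds this lock

    @contract(pre=lambda self, amount: amount > 0,
              post=lambda result, self, amount: self.balance >= 0)
    def withdraw(self, amount):
        self.balance -= amount
        return self.balance

acc = Account()
try:
    with acc.lock:
        acc.withdraw(5)        # balance would become -5
except AssertionError as err:
    print(err)                 # -> postcondition of withdraw violated
</code>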
  
  * <html><font color="CC0909"><b>Bram Adams</b></font> on <font color="blue"><b>How NOT to analyze your release process</b></font></html>
**__Short abstract__**: The release engineering process is the process that brings high quality code changes from a developer’s workspace to the end user, encompassing code change integration, continuous integration, build system specifications, infrastructure-as-code, deployment and release. Recent practices of continuous delivery, which bring new content to the end user in days or hours rather than months or years, require companies to closely monitor the progress of their release engineering process by mining the repositories involved in each phase, such as their version control system, bug/reviewing repositories and deployment logs. This tutorial presents the six major phases of the release engineering pipeline, the main repositories that are available for analysis in each phase, and three families of mistakes that could invalidate empirical analysis of the release process. Even if you are not working on release engineering, the mistakes discussed in this tutorial can impact your research results as well!
  
For more background: http://mcis.polymtl.ca/publications/2016/fose.pdf
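
As one small, hedged example of mining the version control repository involved in the release phase (assuming only a local git checkout and standard git commands), the following Python sketch counts how many commits went into each tagged release:
<code python>
# Count commits between consecutive tags, as a crude release-size metric.
import subprocess

def git(*args, repo="."):
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout

def commits_per_release(repo="."):
    tags = git("tag", "--sort=creatordate", repo=repo).split()
    counts = {}
    for older, newer in zip(tags, tags[1:]):
        n = git("rev-list", "--count", f"{older}..{newer}", repo=repo).strip()
        counts[newer] = int(n)
    return counts

if __name__ == "__main__":
    for release, n_commits in commits_per_release().items():
        print(f"{release}: {n_commits} commits")
</code>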
  
  * <html><font color="CC0909"><b>Zekeriya Erkin</b></font> on <font color="blue"><b>Software Analysis: Anonymity and Cryptography for Privacy</b></font></html>
**__Short abstract__**: Validation in a big software system can be managed by dynamically analysing its behaviour. A software system in use occasionally reports information to developers about its status in the form of event logs. Developers use this information to detect flaws in the software and to improve its performance with the help of process mining techniques. Process mining generates process models from the collected events or checks the conformance of these events with an existing process model to identify flaws in the software. Algorithms in process mining to discover such process models rely on software behaviour through real event logs and are indeed very useful for software validation. However, the existence of some sensitive information in the collected logs may become a threat to the privacy of users, as seen in practice. In this talk, we present privacy enhancing technologies (PETs) for privacy-preserving algorithms for software modelling. We focus on different approaches, namely anonymization techniques and the deployment of advanced cryptographic tools such as homomorphic encryption for the protection of sensitive data in logs during software analysis. As a very new field of research, we introduce a number of challenges yet to be solved and discuss different aspects of the challenge in terms of the level of privacy, utility and overhead introduced by deploying PETs.
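
To give a feel for the homomorphic encryption direction, here is a toy Python sketch (textbook Paillier with deliberately tiny, insecure parameters; not the schemes or parameters advocated in the talk) in which an analyst sums encrypted per-user event counts without seeing any individual value:
<code python>
# Toy additively homomorphic encryption (textbook Paillier, insecure key size).
import math
import random

# Key generation with small primes (illustration only).
p, q = 293, 433
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow((pow(1 + n, lam, n2) - 1) // n, -1, n)   # inverse of L(g^lam mod n^2), with g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

def add_encrypted(c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % n2

# Per-user error counts, encrypted before they leave each user's machine.
counts = [3, 7, 5]
ciphertexts = [encrypt(c) for c in counts]
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(total, c)
print(decrypt(total))   # -> 15, computed without decrypting any individual count
</code>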
  
</WRAP>
----
<WRAP group>
<WRAP class center quarter column>
[[https://www.tue.nl/|{{:bsr-ws2016:tue.png?80 }}]]
</WRAP>
  
<WRAP class center quarter column>
[[http://tudelft.nl/en/|{{:bsr-ws2016:tud.png?100 }}]]
</WRAP>
  
<WRAP class center quarter column>
[[https://www.utwente.nl/en/|{{:bsr-ws2016:ut.png?110 }}]]
</WRAP>
  
<WRAP class center quarter column>
[[https://www.3tu.nl/nirict/en/|{{:bsr-ws2016:nirict.jpg?110 }}]]
</WRAP>
</WRAP>