
class: middle

# Welcome to Advanced Software Engineering

#### Global Software Engineering in Open Source and its Effect on Quality

Helge Pfeiffer, Associate Professor,<br>
[Research Center for Government IT](https://www.itu.dk/forskning/institutter/institut-for-datalogi/forskningscenter-for-offentlig-it),<br>
[IT University of Copenhagen, Denmark](https://www.itu.dk)<br>
`ropf@itu.dk`

---

## Learning Objectives

After this session the student will be able to:

* Understand how VCS histories and issue tracker data are collected for research.
* Analyze the history from Git repositories and the Jira issue tracker.
* Apply scripts and programs in various languages to clean and pre-process the exported data.
* Create scripts and programs to analyze Git VCS and Jira issue tracker data to investigate certain research questions.
* Interpret analysis results to either better understand current practices or to suggest actionable changes to current practices in software engineering.

---

# What are we doing today?

* Discussion of the papers that you read
* Brief recap of branching strategies
* Research in groups to investigate a key result of the paper

---

## You all read the following two papers:

* E. Shihab et al. ["The Effect of Branching Strategies on Software Quality"](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/shihab-esem-2012.pdf)
* M. Cataldo et al. ["Software Dependencies, Work Dependencies, and Their Impact on Failures"](https://www.researchgate.net/profile/Jeffrey_Roberts2/publication/220070820_Software_Dependencies_Work_Dependencies_and_Their_Impact_on_Failures/links/0046352651c19d5aef000000/Software-Dependencies-Work-Dependencies-and-Their-Impact-on-Failures.pdf)

---

## Discussion

What did you take away from the papers?

#### a) "The Effect of Branching Strategies on Software Quality"

* ...
  * ...

---

## Discussion

What did you take away from the papers?

#### b) "Software Dependencies, Work Dependencies, and Their Impact on Failures"

* ...
  * ...

---

## Discussion

#### What research method is used in the papers?

* ...
  * ...

<!--
  * empirical quantitative study
  * collect development data from VCS history
    - Not Git likely!
    - Type of change (development/branching)
    - on which branch am I?

* binaries for Windows Vista & 7
  * Issues reports

- meaning of change, am I a bug fix, etc.
-->

---

## Discussion

#### What are the key results?

- a) "The Effect of Branching Strategies on Software Quality"
  * ...
  * ...
- b) "Software Dependencies, Work Dependencies, and Their Impact on Failures"
  * ...
  * ...

---

## Discussion

#### Observations

* ...
    * ...

---

## Recap: Git Branching

### What is a Branch?

* Git stores its data as a series of snapshots.
* A commit object contains, among other things, a pointer to a snapshot of the staged content and zero or more pointers to the direct parents of this commit:
--

* First commit: _zero_ parents
* Normal commit: _one_ parent
* Merge commit: _multiple_ parents
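
The parent counts above can be used to classify commits programmatically. A minimal sketch, assuming the history was exported with `git log --format="%H %P"` (one line per commit: its hash followed by its parent hashes). The function names and the shortened sample hashes are hypothetical.

```python
from collections import Counter


def classify_commit(line: str) -> str:
    """Classify one line of `git log --format="%H %P"` output by parent count."""
    fields = line.split()
    n_parents = len(fields) - 1  # the first field is the commit's own hash
    if n_parents == 0:
        return "root"    # first commit: zero parents
    if n_parents == 1:
        return "normal"  # normal commit: one parent
    return "merge"       # merge commit: multiple parents


def count_commit_kinds(log_output: str) -> Counter:
    """Count root, normal, and merge commits in the exported log text."""
    lines = (l for l in log_output.splitlines() if l.strip())
    return Counter(classify_commit(l) for l in lines)


# Hypothetical, shortened hashes for illustration only.
sample_log = """\
e5f6 c3d4 d4e5
d4e5 b2c3
c3d4 b2c3
b2c3 a1b2
a1b2
"""
kinds = count_commit_kinds(sample_log)
print(kinds["root"], kinds["normal"], kinds["merge"])
```

Working on the exported text rather than calling Git directly keeps the analysis repeatable on an archived dataset.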

---

### What is a Branch?

A _branch_ in Git is simply a pointer to a commit. The default branch name is `master`. Every time you commit, the pointer moves forward automatically.

![](http://git-scm.com/figures/18333fig0303-tn.png)
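
The "branch as a pointer" idea can be illustrated with a toy model. This is only a sketch of the concept, not how Git is implemented. The class `ToyRepo` and the commit ids `C1`, `C2` are hypothetical.

```python
class ToyRepo:
    """Toy model: a branch is just a name that points to a commit id."""

    def __init__(self):
        self.parents = {}                 # commit id -> list of parent commit ids
        self.branches = {"master": None}  # branch name -> commit id it points to
        self.head = "master"              # name of the checked-out branch
        self._counter = 0

    def commit(self) -> str:
        """Create a commit and move the checked-out branch pointer forward."""
        self._counter += 1
        commit_id = f"C{self._counter}"
        tip = self.branches[self.head]
        self.parents[commit_id] = [] if tip is None else [tip]
        self.branches[self.head] = commit_id
        return commit_id

    def branch(self, name: str) -> None:
        """Create a new pointer to the commit the current branch points to."""
        self.branches[name] = self.branches[self.head]

    def checkout(self, name: str) -> None:
        """Switch the checked-out branch."""
        self.head = name


repo = ToyRepo()
repo.commit()             # master moves to C1
repo.branch("testing")    # testing also points to C1
repo.checkout("testing")
repo.commit()             # only testing moves forward, to C2
print(repo.branches)
```

Creating a branch copies nothing but a pointer, which is why branching is cheap and the cost shows up at merge time.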

---

### Recap: Branching Strategies

> “We are a team of four senior developers (by which I mean we’re all over 40 with 20+ years each of development experience) and not one of us has had a positive experience in the past with branching the mainline... The branch is easy - it’s the merge at the end that’s painful”.
  >
  >  [Phillips et al. _"Branching and Merging: An Investigation into Current Version Control Practices"_](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.278.6065&rep=rep1&type=pdf)

---

#### Long-Running Branches/Branch by Release

![](http://git-scm.com/figures/18333fig0318-tn.png)

![](http://git-scm.com/figures/18333fig0319-tn.png)

---

#### Topic Branches/Branch by Purpose

A topic branch is a short-lived branch that you create and use for a single particular feature or related work.

![](http://git-scm.com/figures/18333fig0320-tn.png)

---

#### Topic Branches/Branch by Purpose

After merging `iss91v2` and `dumbidea` and throwing away the original `iss91` branch (losing commits C5 and C6), the repository's history looks as follows:

---

#### Git-Flow

Read about it here: http://nvie.com/posts/a-successful-git-branching-model/

---

#### No Branches/Trunk-based Development

##### No Branches

> Almost all development occurs at the “head” of the repository, not on branches. This helps identify integration problems early and minimizes the amount of merging work needed. It also makes it much easier and faster to push out security fixes.
>
> Henderson [_"Software Engineering at Google"_](https://arxiv.org/pdf/1702.01715.pdf)

##### Release Branches

> A release typically starts in a fresh workspace, by syncing to the change number of the latest “green” build (i.e. the last change for which all the automatic tests passed), and making a release branch. The release engineer can select additional changes to be “cherry-picked”, i.e. merged from the main branch onto the release branch. Then the software will be rebuilt from scratch and the tests are run. If any tests fail, additional changes are made to fix the failures and those additional changes are cherry-picked onto the release branch, after which the software will be rebuilt and the tests rerun. When the tests all pass, the built executable(s) and data file(s) are packaged up. All of these steps are automated so that the release engineer need only run some simple commands, or even just select some entries on a menu-driven UI, and choose which changes (if any) to cherry pick.
  >
  > Henderson [_"Software Engineering at Google"_](https://arxiv.org/pdf/1702.01715.pdf)

<!-- ---

#### Other Branching Models

* **Github-Flow**: http://scottchacon.com/2011/08/31/github-flow.html
  * Platform Branches http://www.creativebloq.com/web-design/choose-right-git-branching-strategy-121518344
  ![](http://cdn.mos.cms.futurecdn.net/4c32e6c4a3a45756ab06b9f54b904f69-650-80.jpg)
 -->
<!--
* New Feature Version Branches: https://ustwo.com/blog/branching-strategies-with-git/
  ![](https://usweb-cdn.ustwo.com/ustwo-production/uploads/2012/03/ustwo-branching-branching1.jpg)
  * http://www.kumaranuj.com/2015/11/gi-branching-strategies.html

-->

---

## Which Branching Model does Microsoft Use?

In the paper [_"The Effect of Branching Strategies on Software Quality"_](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/shihab-esem-2012.pdf) the authors describe how Microsoft used branches when building Windows 7 and Windows Vista.

Which branching model is it?

* ...
    * ...

---

## Let's do some research!

![](https://thumbs.gfycat.com/WhirlwindGrimyAztecant-max-1mb.gif)

---

## Relation between branching and SW quality?

> We find that, indeed, branching does have an effect on software quality...
  >
  > E. Shihab et al. ["The Effect of Branching Strategies on Software Quality"](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/shihab-esem-2012.pdf)

* Is that true for open-source projects too?
  * That is, for software that is developed in a globally distributed manner, can we find an effect of branching on defects?

---

## Research Question

#### Is there a correlation between branching activity and software quality?

* _Branching activity_: Let's consider it as the number of merge commits. Merging is what Shihab et al. and Phillips et al. describe as the real issue.
* _Software quality_: Let's consider the number of reported defects as a proxy for software quality. That is, we consider tickets that are labeled as bugs in an issue tracker as defect reports.
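
One possible way to operationalize the two definitions above is sketched below. It assumes that merge-commit dates were exported with `git log --merges --format=%cI` and that the creation dates of bug tickets were taken from the issue tracker, both as ISO 8601 strings. The monthly granularity, the choice of Pearson's r, and all function names are assumptions for illustration, not the method used in the paper. The sample dates are made up.

```python
from collections import Counter
from math import sqrt


def count_per_month(iso_dates):
    """Count events per calendar month; each date string starts with 'YYYY-MM'."""
    return Counter(d[:7] for d in iso_dates)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlate_monthly(merge_dates, bug_dates):
    """Correlate monthly merge-commit counts with monthly bug-report counts."""
    merges = count_per_month(merge_dates)
    bugs = count_per_month(bug_dates)
    # Simplification: months with neither merges nor bug reports are left out.
    months = sorted(set(merges) | set(bugs))
    return pearson_r([merges[m] for m in months], [bugs[m] for m in months])


# Made-up dates for illustration only (not real project data).
merge_dates = [
    "2017-01-03T10:00:00+01:00", "2017-01-20T09:30:00+01:00",
    "2017-02-11T14:00:00+01:00",
    "2017-03-02T08:15:00+01:00", "2017-03-15T16:45:00+01:00",
    "2017-03-28T11:00:00+02:00",
]
bug_dates = [
    "2017-01-05T12:00:00.000+0000",
    "2017-02-14T12:00:00.000+0000",
    "2017-03-03T12:00:00.000+0000", "2017-03-09T12:00:00.000+0000",
    "2017-03-30T12:00:00.000+0000",
]
r = correlate_monthly(merge_dates, bug_dates)
print(f"r = {r:.2f}")
```

A correlation coefficient alone does not establish an effect of branching on defects; confounders such as project size or overall activity need separate consideration.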

---