
## Global Software Engineering in Open Source and its Effect on Quality @ [WSSE22](https://cs.hse.ru/wsse/)

Helge Pfeiffer, Assistant Professor,<br>
[Research Center for Government IT](https://www.itu.dk/forskning/institutter/institut-for-datalogi/forskningscenter-for-offentlig-it),<br>
[IT University of Copenhagen, Denmark](https://www.itu.dk)<br>
`ropf@itu.dk`

---

# Who am I?

* Dipl.-Inf. in Software Engineering from Friedrich-Schiller-Universität Jena
* PhD in Software Engineering from ITU
* Software engineer at the Danish Meteorological Institute
* Lecturer at Cphbusiness
* Since Jan. 2019, Assist. Prof. at ITU, working on Software Quality, Software Quality Metrics, and Technical Debt

---

# What are we doing today?

* Discussion of the paper that you read
* Brief recap of branching strategies
* Mini research project in groups
  * We investigate the key result of the paper

---

## Learning Objectives

After this session the student will be able to:

* Understand how VCS histories and issue tracker data are collected for research.
* Analyze the history of Git repositories and data from the Jira issue tracker.
* Apply scripts and programs in various languages to clean and pre-process the exported data.
* Create scripts and programs that analyze Git and Jira data to investigate specific research questions.
* Interpret analysis results, either to better understand current practices or to suggest actionable changes to current practices in software engineering.

---

## You all read the paper:

* E. Shihab et al. ["The Effect of Branching Strategies on Software Quality"](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/shihab-esem-2012.pdf)

---

## Discussion

#### What did you take away from the paper?

* ...
  * ...

---

## Discussion

#### Which research method is used in the paper?

* ...
  * ...

---

## Discussion

#### What are the key results?

* ...
    * ...

---

## Discussion

#### Observations

* ...
    * ...

---

## Branching in Git?

### What is a Branch?

* Git stores its data as a series of snapshots of the project, not as a list of file-based changes.

---

### What is a Branch?

* Among other things (a pointer to the snapshot, author information, and the commit message), each commit object contains pointers to zero or more direct parents:
  * Initial commit: _zero_ parents
  * Normal commit: _one_ parent
  * Merge commit: _multiple_ parents
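
The three cases can be observed directly in a repository: `git rev-list --parents --all` prints one line per commit, namely the commit's hash followed by the hashes of its parents. The Python sketch below buckets commits by their parent count. The helper names `classify_commits` and `classify_repo` are hypothetical, and the second function assumes that `git` is on the `PATH` and that `repo_path` points to a local clone.

```python
import subprocess
from collections import Counter


def classify_commits(rev_list_output: str) -> Counter:
    """Bucket commits by parent count.

    Expects the output of `git rev-list --parents --all`: one line per
    commit, consisting of the commit hash followed by its parent hashes.
    """
    counts = Counter()
    for line in rev_list_output.splitlines():
        hashes = line.split()
        if not hashes:
            continue  # skip blank lines
        n_parents = len(hashes) - 1  # the first hash is the commit itself
        if n_parents == 0:
            counts["initial"] += 1
        elif n_parents == 1:
            counts["normal"] += 1
        else:
            counts["merge"] += 1  # two or more parents (incl. octopus merges)
    return counts


def classify_repo(repo_path: str) -> Counter:
    """Run git in a local clone (hypothetical path) and classify all its commits."""
    out = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--parents", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    return classify_commits(out)


if __name__ == "__main__":
    # Hand-written sample: c3 merges c1 and c2, which both descend from c0.
    sample = "c3 c1 c2\nc2 c0\nc1 c0\nc0\n"
    print(classify_commits(sample))
```

The sample history contains one initial commit, two normal commits, and one merge commit. Counting the `merge` bucket per project is one simple way to quantify branching activity.
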

<tiny> 
See for example <a href="https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell">Chapter 3. Git Branching</a> and <a href="https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain">Chapter 10. Git Internals</a> in the <a href="https://git-scm.com/book/en/v2/">Pro Git book</a>.
</tiny>
---

### What is a Branch?

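A _branch_ in Git is simply a lightweight, movable pointer to a commit. The default branch name is `master` (configurable via `init.defaultBranch`). Every time you commit, the pointer of the currently checked-out branch moves forward to the new commit automatically.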
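
To make the pointer idea concrete, here is a deliberately simplified Python model. It is not how Git stores data internally, and the class and method names are hypothetical. Commits point to their parents, a branch is just a name mapped to a commit id, and `head` records which branch is checked out. Committing creates a new commit whose parent is the current branch tip and then moves only that branch pointer forward.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Commit:
    id: str
    parents: List[str]
    message: str


@dataclass
class ToyRepo:
    """Toy model of a commit graph: branches are names pointing at commit ids."""
    commits: Dict[str, Commit] = field(default_factory=dict)
    branches: Dict[str, Optional[str]] = field(default_factory=lambda: {"master": None})
    head: str = "master"  # name of the checked-out branch
    _counter: int = 0

    def _new_id(self) -> str:
        self._counter += 1
        return f"C{self._counter}"

    def commit(self, message: str) -> str:
        tip = self.branches[self.head]
        parents = [] if tip is None else [tip]  # initial commit: zero parents
        cid = self._new_id()
        self.commits[cid] = Commit(cid, parents, message)
        self.branches[self.head] = cid  # only the checked-out branch moves
        return cid

    def branch(self, name: str) -> None:
        # Creating a branch only adds a new pointer to the current tip.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name: str) -> None:
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.head = name

    def merge(self, other: str, message: str) -> str:
        # Simplification: always creates a merge commit (like `git merge --no-ff`)
        # and assumes both branches already have at least one commit.
        parents = [self.branches[self.head], self.branches[other]]
        cid = self._new_id()
        self.commits[cid] = Commit(cid, parents, message)
        self.branches[self.head] = cid
        return cid


if __name__ == "__main__":
    repo = ToyRepo()
    repo.commit("initial")          # C1; master -> C1
    repo.branch("iss53")
    repo.checkout("iss53")
    repo.commit("fix issue 53")     # C2; iss53 -> C2, master still -> C1
    repo.checkout("master")
    repo.merge("iss53", "merge")    # C3 with parents C1 and C2
    print(repo.branches)
```

Note that `branch` copies no commits at all; that is why creating a branch is cheap, while the merge at the end is where the work happens.
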

---

### Branching Strategies?

> “We are a team of four senior developers (by which I mean we’re all over 40 with 20+ years each of development experience) and not one of us has had a positive experience in the past with branching the mainline... The branch is easy - it’s the merge at the end that’s painful”.
  >
  >  [Phillips et al. _"Branching and Merging: An Investigation into Current Version Control Practices"_](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.278.6065&rep=rep1&type=pdf)

---

#### Long-Running Branches/Branch by Release

![](http://git-scm.com/figures/18333fig0318-tn.png)

![](http://git-scm.com/figures/18333fig0319-tn.png)

---

#### Topic Branches/Branch by Purpose

A topic branch is a short-lived branch that you create and use for a single particular feature or related work.

![](http://git-scm.com/figures/18333fig0320-tn.png)

---

#### Topic Branches/Branch by Purpose

After merging `iss91v2` and `dumbidea` and throwing away the original `iss91` branch (losing commits C5 and C6), the repository's history looks as follows:

---

#### Git-Flow

Read more about it in Vincent Driessen's post [_"A successful Git branching model"_](http://nvie.com/posts/a-successful-git-branching-model/).

---

#### No Branches/Trunk-based Development

> Almost all development occurs at the “head” of the repository, not on branches. This helps identify integration problems early and minimizes the amount of merging work needed. It also makes it much easier and faster to push out security fixes.
  >
  > Henderson [_"Software Engineering at Google"_](https://arxiv.org/pdf/1702.01715.pdf)

---

##### Release Branches

> A release typically starts in a fresh workspace, by syncing to the change number of the latest “green” build (i.e. the last change for which all the automatic tests passed), and making a release branch. The release engineer can select additional changes to be “cherry-picked”, i.e. merged from the main branch onto the release branch. Then the software will be rebuilt from scratch and the tests are run. If any tests fail, additional changes are made to fix the failures and those additional changes are cherry-picked onto the release branch, after which the software will be rebuilt and the tests rerun. When the tests all pass, the built executable(s) and data file(s) are packaged up. All of these steps are automated so that the release engineer need only run some simple commands, or even just select some entries on a menu-driven UI, and choose which changes (if any) to cherry pick.
  >
  > Henderson [_"Software Engineering at Google"_](https://arxiv.org/pdf/1702.01715.pdf)

---

## Which Branching Model does Microsoft Use?

In the paper [_"The Effect of Branching Strategies on Software Quality"_](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/shihab-esem-2012.pdf) the authors describe how Microsoft used branches when building Windows 7 and Windows Vista.

Which branching model is it?

* ...
    * ...

---

## Let's do some research!

![](https://thumbs.gfycat.com/WhirlwindGrimyAztecant-max-1mb.gif)

---

## Relation between branching and SW quality?

> We find that, indeed, branching does have an effect on software quality...
  >
  > E. Shihab et al. ["The Effect of Branching Strategies on Software Quality"](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/shihab-esem-2012.pdf)

* Is that true for open-source projects too?
  * That is, for software that is developed in a globally distributed manner, can we find an effect of branching on defects?

---

## Research Question

#### Is there a correlation between branching activity and software quality?

* _Branching activity_: Let's measure it as the number of merge commits, since the merge is what Shihab et al. and Phillips et al. describe as the real issue.
* _Software quality_: Let's use the number of reported defects as a proxy for software quality. That is, we treat tickets that are labeled as bugs in an issue tracker as defect reports.
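
One possible way to operationalize both variables is sketched below in Python, under several assumptions of our own. Merge commits are counted per month from the output of `git log --merges --date=format:%Y-%m --format=%ad`, which prints the author date of each merge commit as `YYYY-MM`. Bug reports are assumed to have already been exported from Jira as a list of creation dates in the same `YYYY-MM` form; the export step is not shown. The two monthly series are then compared with Spearman's rank correlation, a choice that suits count data that need not be normally distributed. All function names are hypothetical.

```python
import subprocess
from collections import Counter
from typing import List, Sequence


def merge_months(repo_path: str) -> List[str]:
    """One 'YYYY-MM' line per merge commit in a local clone (hypothetical path)."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--merges",
         "--date=format:%Y-%m", "--format=%ad"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


def monthly_counts(months: Sequence[str]) -> Counter:
    """Count occurrences per 'YYYY-MM' string, ignoring blank entries."""
    return Counter(m.strip() for m in months if m.strip())


def ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def correlate_merges_and_bugs(merges: Sequence[str], bugs: Sequence[str]) -> float:
    """Pair monthly merge and bug counts and return Spearman's rho."""
    m, b = monthly_counts(merges), monthly_counts(bugs)
    # Only months with at least one merge or bug appear; a missing month
    # in one series counts as 0 there (Counter returns 0 for absent keys).
    months = sorted(set(m) | set(b))
    return spearman([m[k] for k in months], [b[k] for k in months])


if __name__ == "__main__":
    merges = ["2020-01", "2020-01", "2020-02", "2020-03", "2020-03", "2020-03"]
    bugs = ["2020-01", "2020-02", "2020-02", "2020-03", "2020-03", "2020-03"]
    print(correlate_merges_and_bugs(merges, bugs))
```

For the made-up sample data, the script prints a value of about 0.5. Months without any merge or bug report are simply absent here; a more careful analysis would fill in the full calendar range with zeros before correlating.
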

---