Open Source Project Development Practice

Jinpeng Zhang
Apr 23, 2024

Purpose

As a maintainer and committer of the open source project TiKV, I’d like to share some open source development practices. As members of an open source project and employees of an open source software company, we have a responsibility to follow open source culture and best practices. In addition, many new employees join us without any prior open source experience, and they need a practical guide to learn how to work and collaborate in an open source project.

Openness and transparency: enough context on GitHub

It is normal to discuss ideas and issues in our internal team’s weekly meetings or project meetings and then file PRs to change the code. The background and context may live only in our team’s weekly meeting doc.

But this is not good enough when we are developing an open source project. Please assume reviewers know nothing about the context of a change except what they can find on GitHub, because:

  • 1) This is an open source project, and everyone interested in it may read your PRs and issues. If there is not enough information on GitHub, developers need a lot of effort to understand what is going on in a change. We call this effort “friction”. The more friction a project has, the harder it is for developers to participate. This rule applies not just to developers outside our company, but also to engineers inside it.
  • 2) Memory is reliable in the short term but unreliable in the long term. The author can clearly remember the background of a code change for a while, but will struggle to recall the details later. If there is clear information on GitHub (PR description, issue description and design doc), it is easier for everyone to pick up the context.

Write an informative issue

An informative issue helps everybody understand the background of a problem. It typically contains the following key points:

  • A basic description of the issue: what behaviour or error it causes, what its impact on the project is, and how severe it is (critical, major, minor).
  • Typically critical issues may cause service outages temporarily or permanently, wrong results, etc. Major issues may impact the QoS the project provides, like temporarily increasing latency in some cases. Minor issues
  • The second key point is how the issue is caused. This part describes the mechanism behind the issue.
  • The third part is whether there is any workaround from the users’ perspective. This is helpful when users or customers encounter the issue.
  • The fourth part is the follow-up action or future plan. Is it necessary to introduce a new mechanism or feature? Or is it just an engineering defect we need to fix?
An informative issue example
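
To make the four parts above concrete, here is a minimal sketch of a checker that reports which parts an issue body is missing. The section names, the "Severity:" line format and the sample text are my own assumptions for illustration; they are not a template that TiKV or GitHub enforces.

```python
import re

# Hypothetical section names that mirror the four key points above.
REQUIRED_SECTIONS = ("Description", "Root cause", "Workaround", "Next steps")
SEVERITIES = ("critical", "major", "minor")


def missing_issue_parts(body: str) -> list[str]:
    """Return the names of the expected parts that the issue body lacks."""
    missing = []
    for section in REQUIRED_SECTIONS:
        # A section counts as present if some line starts with its name,
        # optionally preceded by '#' heading markers.
        pattern = rf"^\s*#*\s*{re.escape(section)}\b"
        if not re.search(pattern, body, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(section)
    # Assumed convention: severity is written as "Severity: <level>".
    severity_pattern = r"\bseverity\s*:\s*(" + "|".join(SEVERITIES) + r")\b"
    if not re.search(severity_pattern, body, flags=re.IGNORECASE):
        missing.append("Severity")
    return missing


# Made-up issue text, only used to exercise the checker.
COMPLETE_ISSUE = """\
Description
Requests time out under condition X.
Severity: major
Root cause
Component A retries without backoff.
Workaround
Restart component A.
Next steps
Add backoff to the retry loop.
"""

if __name__ == "__main__":
    print(missing_issue_parts(COMPLETE_ISSUE))
    print(missing_issue_parts("Description\nSomething breaks."))
```

A check like this could run in CI or in a bot, but its main value is as a reminder of what a reader needs to find in the issue.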

A good design doc on GitHub matters

If your change is large or significant for the product, you should propose a design PR first. A good design doc can:

  • Help you think clearly about the problem by forcing you to write it down.
  • Let others understand what problem you are trying to solve and how you will solve it. This helps reviewers understand your change and speeds up PR review.

A good design doc usually contains several key parts:

  • The background: what problem you want to solve and why the problem exists.
  • Your proposal or solution.
  • Alternatives you have considered and why you chose the current solution.
  • Detailed mechanism of your solution.
  • Mechanical description of the new feature or solution.
  • Performance assessment.
  • Compatibility issues (with old versions, with other components), operational scenarios (rolling restart, scale-in/out, upgrade, etc.) and resilience issues (single node failure, temporary network failure, unplanned restart, etc.) you have considered.
  • Test plan.

Here is a design doc example written by me.

PR with good description

Each PR should have a good description, not just a title, unless the PR is a typo fix or another minor change.

Here is an anti-case:

Split single large PRs into multiple small ones

Every time I see a large PR with more than 1K changed lines, I feel frustrated. One reason is that I need to understand clearly what this large PR changes and what potential side effects each changed line introduces. The other is that I need a long, focused block of time to review it. It is easy to get tired and lose concentration when reviewing a large PR, and reviewers usually review the second half of such a PR less carefully. This is not because they don’t want to review carefully; it is because the amount of time humans can sustain their attention is limited.

Single large PRs usually come about for one of these reasons:

  • The author wants to solve multiple problems in one PR. This hurts both efficiency and correctness in open source development. The best practice is to solve one atomic issue or problem per PR. This way, authors and reviewers can focus on the context of that specific issue, which keeps the review focused and efficient.
  • It is a large project that involves a lot of changes. In this case, please split the large project into multiple tasks and file a dedicated PR for each task.
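
As a rough illustration of the 1K-line threshold mentioned above, the sketch below counts added and removed lines in a unified diff and flags a change that exceeds a limit. The function names and the sample diff are made up for this example, and the header detection is deliberately simple, so treat it as a sketch rather than a complete diff parser.

```python
def count_changed_lines(diff_text: str) -> int:
    """Count added and removed lines in a unified diff, skipping file headers."""
    changed = 0
    for line in diff_text.splitlines():
        # Simplification: any line starting with '+++' or '---' is treated as a
        # file header, so a content line that begins that way is not counted.
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            changed += 1
    return changed


def is_too_large(diff_text: str, limit: int = 1000) -> bool:
    """Return True when the diff changes more lines than the given limit."""
    return count_changed_lines(diff_text) > limit


# Made-up diff, only used to exercise the functions.
SAMPLE_DIFF = """\
--- a/example.rs
+++ b/example.rs
@@ -1,3 +1,3 @@
 fn main() {
-    println!("old");
+    println!("new");
 }
"""

if __name__ == "__main__":
    print(count_changed_lines(SAMPLE_DIFF))
    print(is_too_large(SAMPLE_DIFF))
```

A line count is only a proxy. A PR can stay under the limit and still mix unrelated problems, so the "one atomic issue per PR" rule still needs human judgment.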

Manage a series of tasks with a tracking issue

If you have a project containing multiple sub-tasks, please use a tracking issue to describe the project and keep track of the status of all sub-tasks.

Here is a tracking issue example: it describes what the project wants to achieve and defines the project’s acceptance criteria, accompanied by a list of sub-tasks and their status.
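
To show the shape of such a tracking issue, here is a small sketch that renders a goal, acceptance criteria and a sub-task checklist as text. The `SubTask` type, the function name and the layout are assumptions of mine; the only external convention used is GitHub’s task list syntax (`- [ ]` and `- [x]`).

```python
from dataclasses import dataclass


@dataclass
class SubTask:
    """One sub-task of the project (hypothetical structure for this sketch)."""
    title: str
    done: bool = False


def render_tracking_issue(goal: str, criteria: list[str], tasks: list[SubTask]) -> str:
    """Build the body of a tracking issue with a progress summary."""
    finished = sum(task.done for task in tasks)
    lines = [f"Goal: {goal}", "", "Acceptance criteria:"]
    lines += [f"- {item}" for item in criteria]
    lines += ["", f"Sub-tasks ({finished}/{len(tasks)} done):"]
    # GitHub renders "- [x]" and "- [ ]" as checked and unchecked boxes.
    lines += [f"- [{'x' if task.done else ' '}] {task.title}" for task in tasks]
    return "\n".join(lines)


if __name__ == "__main__":
    body = render_tracking_issue(
        goal="Describe what the project wants to achieve",
        criteria=["All sub-tasks merged", "Tests pass"],
        tasks=[SubTask("First sub-task", done=True), SubTask("Second sub-task")],
    )
    print(body)
```

In practice each sub-task line would also link to its own issue or PR, so that readers can follow the status from the tracking issue alone.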

Development efficiency & experience matter: local tests

Before we file a PR, we should test our change locally. The goal is for the PR to pass all CI checks the first time, instead of modifying the code back and forth.

This requires:

  • The project must be easy to test locally and comprehensively. Ideally, all unit tests and integration tests can be run locally with one simple command such as “make test” or “make dev”, as TiKV does.
  • The full local test run should not take long, such as half an hour or several hours. If all local tests pass within a few minutes, all developers get a good experience. A good development experience is strong positive feedback that keeps developers, both inside and outside our company, participating in the project continuously and happily.

Some anti cases

Internal doc link on GitHub

There are two major issues in the above diagram:

  • This is a large project that changes the mechanism of TiDB schema meta management, and it involves a lot of PRs. For a project like this, the design doc should be provided first.
  • This PR includes an internal link that is only accessible to PingCAP employees. This is not friendly to other open source developers.
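
The second problem can be caught mechanically. Below is a minimal sketch that scans a PR description for links whose host is not on an allowlist of publicly reachable sites. The allowlist and the example internal host are hypothetical; a real project would have to maintain its own list.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of hosts that anyone can open.
PUBLIC_HOSTS = frozenset({"github.com"})

# Rough URL matcher: stops at whitespace and common closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def find_non_public_links(text: str, public_hosts=PUBLIC_HOSTS) -> list[str]:
    """Return the URLs in text whose host is not an allowed host or its subdomain."""
    flagged = []
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname or ""
        allowed = any(
            host == public or host.endswith("." + public) for public in public_hosts
        )
        if not allowed:
            flagged.append(url)
    return flagged


# Made-up PR description; the internal host is a placeholder.
SAMPLE_DESCRIPTION = (
    "Design: https://wiki.internal.example.com/doc/123 "
    "Repository: https://github.com/tikv/tikv"
)

if __name__ == "__main__":
    print(find_non_public_links(SAMPLE_DESCRIPTION))
```

An allowlist is stricter than a blocklist of internal domains: it may flag a few harmless links, but it does not depend on knowing every internal host in advance.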

Large change without design doc

This is another case of a large change without a design doc.

PRs without description

Written by Jinpeng Zhang

Built the distributed SQL database TiDB from scratch with other colleagues; focuses on building large-scale distributed systems and high-performance engineering teams.
