Which areas kernel contributors focused on

I’ve made a new kind of analysis of Linux kernel contributions. It shows which areas and subsystems of the kernel contributors such as IBM, Intel, Oracle and Fujitsu focused their contributions on.

As we know, the Linux kernel has many different kinds of active contributors, both non-profit and corporate. The corporations themselves come in different kinds, and so they contribute to different kernel subsystems. We can roughly sort them into distro vendors, hardware vendors, software vendors and IT vendors, and pick some representative corporations from each class to see what those giants are interested in and what drives them to contribute.

1) Distro vendors, which sell Linux distributions and provide support services

a) Red Hat

  • kernel/trace/: No surprise that an OS development company puts a lot of effort into how to monitor and debug the kernel.
  • arch/[x86|sparc*|ia64|powerpc|x86_64]/: As we will see, other companies also cover some architectures, such as x86 (Intel), PowerPC (IBM) and ia64 (SGI), but no other kernel contributor covers so many of them so completely. A distro vendor has to take arch code as seriously as a platform vendor does.
  • drivers/[net|char|scsi|media|ata|md|...]/: Also no surprise. Although some hardware vendors provide Linux drivers for their products, distro vendors still need to take care of orphaned devices and integration bugs.
  • fs/gfs2/: Red Hat is not only a distro vendor. It also plays the role of an IT vendor and provides integrated solutions for customers. Red Hat’s product GFS is the file system used for managing cluster servers in enterprise deployments.
  • Red Hat is involved in many other areas where its contributions do not shine as brightly as those mentioned above, but they are still huge compared with the contributions of companies outside the top 10 contributors. For details, refer to the KPS statistics.

b) Novell

  • sound/pci/: Novell claims that “SUSE Linux Enterprise Desktop is the market’s only enterprise-quality Linux desktop”, so SUSE also aims at the desktop market. Providing a great sound subsystem seems consistent with that target.
  • drivers/*/: Also consistent with the desktop market target, Novell puts more effort into driver development than Red Hat. Greg KH, a Novell employee, spends a big part of his time on staging drivers to help the Linux kernel support more and more cool new devices.
  • fs/fuse/: I can’t see an obvious reason why Novell supports fuse with such dedication.

2) Hardware vendors, which mainly sell hardware for profit

a) Intel

  • drivers/net/: Who are the biggest NIC and wireless NIC vendors in the world? I think Intel must be one of them.
  • drivers/acpi/: As the biggest platform vendor, Intel leads the hardware and software development of power management.
  • arch/[i386|ia64|x86_64]/: Who makes the CPUs? As the biggest CPU manufacturer, Intel has to take care of the architectures its CPUs support.

b) Renesas

  • arch/sh/: It is no surprise that the biggest manufacturer of SuperH chips supports the sh kernel.

c) Analog Devices

  • arch/blackfin/: Most of its effort goes into its own platform, Blackfin.

3) Software vendors, which mainly sell software (except OSes) and services for profit

a) Oracle (before acquiring Sun, Oracle was more like a software vendor)

  • fs/[btrfs|ocfs2|nfs]/: As the biggest OSS contributor among software vendors, Oracle carries a lot of OSS projects. The Linux kernel filesystems are a typical area of focus. As a database maker, Oracle keeps strengthening some Linux kernel filesystems and creating new ones to tie in with its database and middleware products.
  • block/: No surprise. To support its database solutions, Oracle needs to spend effort not only on the fs subsystems but also on the block I/O subsystem.

b) Parallels

  • net/[ipv4|ipv6|core]/: Parallels is a virtualization software maker, and its engineers did a lot of work to refine network namespaces, which give better support for network virtualization.

4) IT vendors, which sell whole solutions including hardware, OS, middleware and applications, and of course services, for profit

a) IBM

  • arch/[powerpc|s390|ppc64]/: IBM created those architectures and provides the kernel support that lets Linux run on those platforms.
  • kernel/: As the No. 2 kernel contributor after Red Hat, IBM did a lot of work on the core kernel, such as kernel synchronization, CPU control and kprobes.
  • fs/[cifs|ext3|ext4]/: IBM also hired some active community engineers who work in the filesystem area. As we all know, Ted works as ext3/4 maintainer, IBM employee and TLF consultant.
  • Linux Test Project: Although my statistical analysis only covers kernel source code, the great work of LTP deserves a mention. As an independent project that stays in sync with the kernel, LTP keeps updating its kernel test cases and makes kernel QA much easier for a lot of Linux-related IT vendors. LTP is a special contribution to the Linux kernel.

b) SGI

  • fs/xfs/: As an SGI product, XFS gets wonderful support from SGI. A storage and HPC vendor putting effort into filesystems is totally unsurprising.
  • mm/: Former SGI employee Christoph Lameter contributed a lot to the memory management subsystem.
  • arch/ia64/: SGI was an early adopter of IA64, the architecture initiated by HP and Intel, so SGI did a lot of IA64 work alongside HP and Intel.

c) Fujitsu (my former employer :) )

  • kernel/trace/: As a whole-solution IT vendor, Fujitsu does a lot of work to enhance the Linux kernel’s trace and debug features, to give customers a more robust, maintainable and highly available IT environment.
  • drivers/pci/: During integration, Fujitsu enhances drivers to ensure a high-quality server hardware environment.
  • mm/: The memory controller is maintained by Fujitsu and Google engineers to give users a more flexible IT environment.

As we can see, corporations dedicate themselves to kernel contribution for the sake of their product lines or services, and try to give back to the community as they gain from it. They take responsibility for making sure the kernel components that enterprises use are as healthy as customers want.

Let’s see which areas the non-profit contributors focus on, which may differ from those of corporations.

Hobbyists (no one pays them for kernel contributions)

  • drivers/media/: As far as we know, no commercial company focuses on this area. But hobbyists committed 3818 patches (2.4% of all kernel patches since Linux-2.6.12) to drivers/media/. It is amazing that desktop users do such great work for kernel development.
  • drivers/[net|ide|staging|usb|video]/: A lot of hobbyists take care of the things that enterprise users may not want to care about.
  • arch/[x86|arm]/: The hobbyists’ favorite platforms are x86 and arm :)

That’s a brief introduction to who is interested in which kernel subsystems.

For detailed information about other corporations and non-profit groups not mentioned here, such as Google, HP and academics, please refer to the statistics.

Kernel Defects Before and After Release

I’ve tried to analyze the defect data of the Linux kernel to answer the following questions.

1. How much effort should the release candidate period take?

2. Is the quality of the stable kernel (the stable kernel without updates) getting better or worse?

To answer these questions, I gathered the following data.

1. Pre-RC Changed Lines: The lines of source code changed between 2.6.(n-1) and 2.6.n-rc1. This code is merged during the merge window of each kernel version. It mostly implements the new features of the new version, which need a long period of testing and fixing after the merge window closes.

2. Defects found during RC: Because mainline accepts almost only bug-fix patches during the RC period and we encourage one patch per bug, the number of patches in the RC period approximately represents the number of defects found during it. We call this the internal (inside the development community) testing period. This data comes from git-log between 2.6.n-rc1 and 2.6.n.

3. Defects found after release: After a stable kernel release, users and developers find more bugs and fix them. To eliminate the unfairness caused by different release times, I only count the stable kernel defects that were fixed before the next 2.6.x stable release. For example, if 2.6.n was released on 2009.A.B and 2.6.(n+1) was released on 2009.C.D, “Defects found after release” means the fixes for the 2.6.n stable kernel made before 2009.C.D. I dug this data out of GregKH’s stable git trees.
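For readers who want to gather similar numbers from their own clone of the kernel tree, here is a minimal Python sketch. It is not necessarily how the data above was produced. It assumes that “changed lines” means insertions plus deletions as reported by `git diff --shortstat`, that merge commits are skipped when counting RC patches, and it uses `git rev-list --count` rather than counting git-log entries by hand. The function names and the repository path are hypothetical.

```python
import re
import subprocess


def parse_shortstat(text):
    """Return insertions + deletions from `git diff --shortstat` output."""
    insertions = re.search(r"(\d+) insertion", text)
    deletions = re.search(r"(\d+) deletion", text)
    return sum(int(m.group(1)) for m in (insertions, deletions) if m)


def git(repo, *args):
    """Run a git command inside `repo` and return its stdout as text."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def pre_rc_changed_lines(repo, prev_release, rc1):
    """Lines changed between the previous release and this cycle's -rc1.

    Assumption: changed lines = insertions + deletions.
    """
    return parse_shortstat(git(repo, "diff", "--shortstat", prev_release, rc1))


def rc_patch_count(repo, rc1, release):
    """Non-merge commits between -rc1 and the release.

    Assumption: one patch per bug, so this approximates defects found in RC.
    """
    out = git(repo, "rev-list", "--count", "--no-merges", f"{rc1}..{release}")
    return int(out.strip())
```

With a local clone, a call might look like `pre_rc_changed_lines("/path/to/linux", "v2.6.30", "v2.6.31-rc1")` followed by `rc_patch_count("/path/to/linux", "v2.6.31-rc1", "v2.6.31")`.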

By putting the RC defect ratio (“Defects found during RC” / “Pre-RC Changed Lines” [kstep]) on the X-axis and the stable defect ratio (“Defects found after release” / “Pre-RC Changed Lines” [kstep]) on the Y-axis, I got some interesting graphs.

Wait… What do you expect to see? That more RC fixes bring a higher-quality stable kernel?
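The two ratios are simple to compute. The sketch below assumes that one kstep means 1000 changed lines, and the numbers in the example are made up purely to show the arithmetic. They are not measurements from any real release.

```python
def defect_ratio(defects, changed_lines):
    """Defects per kstep, assuming 1 kstep = 1000 changed lines."""
    if changed_lines <= 0:
        raise ValueError("changed_lines must be positive")
    return defects / (changed_lines / 1000.0)


# Hypothetical release: 1,000,000 pre-RC changed lines,
# 2870 fixes during RC and 320 fixes in the stable tree.
pre_rc_lines = 1_000_000
rc_ratio = defect_ratio(2870, pre_rc_lines)      # X-axis value
stable_ratio = defect_ratio(320, pre_rc_lines)   # Y-axis value
print(f"RC: {rc_ratio:.2f} defect/kstep, stable: {stable_ratio:.2f} defect/kstep")
```

Both ratios share the same denominator, so each release becomes one point on the graph and releases of very different size can be compared.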

1. Picture-1 may disappoint you. We see here that the more bugs were killed during RC, the more complaints came from stable users and developers. Why? I don’t remember who said this, but I remember the words: “Finding MORE defects means MORE defects unrevealed”. A high RC defect ratio doesn’t mean that the more bugs are fixed, the higher the quality of the release. Instead, it tells us whether new features were of trustworthy quality before being sent into the merge window. Developers should test their new-feature patches as thoroughly as their RC patches. So here comes another question: we can’t keep doing RC forever, so when should we stop RC and go to release?

Linux Kernel defects before and after release

2. Picture-2 shows that the number of RC fixes grows almost linearly with the amount of pre-RC code. But a trend is only a trend: some kernel releases did not have enough defects found during RC, and some had an above-average RC defect ratio. Going back to Picture-1, we can see that the densest RC defect ratio region is about 2.5-3.0 defect/kstep, and the average RC defect ratio is about 2.87 defect/kstep, which is marked by the red line. Of the ten releases below the average RC defect ratio, four have an above-average stable defect ratio. If we set a rule not to allow RC to end while the RC defect ratio is still below average, we could make those four releases more stable.

Now we have seen two pictures of the whole-kernel defect ratio, but what about subsystems?
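The proposed gate can be expressed as a small check. The following is only a sketch under my own assumptions: each release is described by a pair of RC and stable defect ratios, both averages are taken over the releases passed in, and the release names and values are invented for illustration.

```python
from statistics import mean


def releases_missing_gate(ratios):
    """Return releases with a below-average RC defect ratio
    and an above-average stable defect ratio.

    `ratios` maps a release name to (rc_ratio, stable_ratio).
    """
    avg_rc = mean(rc for rc, _ in ratios.values())
    avg_stable = mean(stable for _, stable in ratios.values())
    return sorted(
        name
        for name, (rc, stable) in ratios.items()
        if rc < avg_rc and stable > avg_stable
    )


# Hypothetical data, not taken from the graphs.
sample = {
    "rel-a": (2.0, 0.5),
    "rel-b": (2.5, 0.2),
    "rel-c": (3.5, 0.4),
    "rel-d": (4.0, 0.1),
}
print(releases_missing_gate(sample))
```

In this made-up sample the average RC ratio is 3.0 and the average stable ratio is 0.3, so only `rel-a` is flagged: it left RC with few defects found and then collected more than its share of stable fixes.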

RC Fix according to merge window quantity

X-axis: Pre-RC Changed Lines, Y-axis: Defects found during RC

3. Not only should the whole kernel have an RC defect ratio gate. Each subsystem should also not be allowed too low an RC defect ratio. Let’s look at some.

The network subsystem is almost like the whole kernel: the stable defect ratio increases as the RC defect ratio increases. “core kernel”, “fs”, “arch” and “block” look almost the same, so I don’t paste them here, to save my host’s space ;)

network subsys, X-axis: RC defect ratio, Y-axis: Stable defect ratio

The sound subsystem is somewhat strange. When the RC defect ratio exceeds 3.0 defect/kstep, the stable defect ratio begins to descend. In my superficial opinion, a very strong RC effort (seven times the average) leads to higher stable quality. At this point, a very high RC defect ratio no longer means “Finding MORE defects means MORE defects unrevealed”, but “Strict quality control reduces unrevealed defects”.

sound subsys, X-axis: RC defect ratio, Y-axis: Stable defect ratio

Memory management looks like an ideal descending trend, if we ignore 2.6.16, 2.6.20 and 2.6.21. In mm, the RC defect ratio is much higher than in any other subsystem, because mm is an essential part of the core kernel and rarely changes. I don’t recall what happened in those three releases. Maybe something like virtualization support caused such a defect boom.

mm subsys, X-axis: RC defect ratio, Y-axis: Stable defect ratio

4. This is the last graph, which answers my question “Is the quality of the stable kernel getting better or worse?”. From 2.6.14 to 2.6.30, the stable kernel has an average defect ratio of about 0.32 defect/kstep. Since 2.6.23, the kernel tends to keep a lower stable defect ratio, although 2.6.27 and 2.6.28 departed from the trend. I’d like to say that maybe we are releasing kernels of better quality.

Stable Defect Ratio Trend

As Linus said at LinuxCon this week, “Linux is bloated”. That doesn’t only mean that the kernel is getting bigger and bigger. It also means that the kernel is growing faster and faster. So when to release a new kernel depends not only on time and RC rounds, but also on the defect ratio and on how far the influence of new features reaches.

Linux-2.6.31 was released

Linus released Linux-2.6.31 on 2009-09-09 (USA time).

What I am focusing on is the development status of the kernel. With the KPS tool, I found the interesting points below.

1. The number of newly joining engineers has increased since 2.6.29, after a long continuous decline that began with 2.6.25, one and a half years ago.

Linux Kernel New Joiner

2. Although more engineers joined kernel development, the number of involved companies has declined since 2.6.29.

3. Looking at Report, Review and Test, we can see that more and more people are contributing to Linux by doing QA work.

Linux Kernel QA

4. Unlike engineers in the USA or Europe, Chinese engineers joined kernel development only in recent years, and their contributions have been growing more and more lately.

Chinese Patch contribution