Apache Hadoop: Technical Debt Decreased by 14% Through Code Refactoring

Initial Technical Debt of the project reduced from 136 to 117 days of remediation

Technical debt is worthless if no pragmatic action is taken in the code to control and reduce it. To illustrate Scertify's ability to automatically correct the code defects that increase this unintended debt, we refactored two subprojects of Hadoop: Hadoop Common and Hadoop Mapreduce. With Scertify, we corrected 25K defects in 2 minutes. In other words, 14% of the technical debt was written off without any manual effort.

Initial analysis
According to Wikipedia, Apache Hadoop is "an open-source software framework that supports data-intensive distributed applications". The framework contains several projects; Common and Mapreduce are two important ones, with 120K and 162K lines of code respectively (blank lines and comments excluded). We worked with the latest development version, 3.0.0-SNAPSHOT. To get an overview of their technical debt, we ran Scertify Refactoring Assessment, our open-source plugin for Sonar, on both projects. Here, technical debt is defined as the amount of time needed to correct all detected defects. As the screenshots below show, Common has a technical debt of 70 days and Mapreduce one of 66 days. Scertify Refactoring Assessment also computes how much of the technical debt could be corrected automatically: the debt write-off. Both projects have good potential for automatic refactoring, 38 and 36 days respectively. The next step, then, is to use Scertify to perform this automatic refactoring. By the way, if you would like to try it on your own source code, a trial version of Scertify is available.

Hadoop Common Original Technical debt

Hadoop Mapreduce Original Technical debt

We browsed through the various violations and chose 8 rules for the demonstration.

Refactoring rules for the demonstration

Here's a presentation of the refactoring rules we used in this demonstration. Some rules need parameters to work properly; this is the case for the rules related to logging. The logging framework used in these projects is Apache Commons Logging, so we configured the rules to use it.

AvoidPrintStackTrace

This rule reports a violation when it finds code that catches an exception and prints its stack trace to the standard error output. A logging framework should be used instead, to improve the application's maintainability. The refactoring replaces the call to printStackTrace() with a call to the logging framework. The rule can also declare the logger in the class and add the required imports. Here is the original and the refactored code from the class GenericWritable.

Original code:

catch (Exception e) {
      e.printStackTrace();
      throw new IOException("Cannot initialize the class: " + clazz);
}

Refactored code:

catch (final Exception e) {
      LOG.error(e.getMessage(), e);
      throw new IOException("Cannot initialize the class: " + clazz);
}

In this case, LOG was not declared, so the refactoring added it to the class, along with the required imports of Log and LogFactory from org.apache.commons.logging:

private static final Log LOG = LogFactory.getLog(GenericWritable.class);

InefficientConstructorCall
Calling the constructor of a wrapper type such as Integer to convert a primitive value is bad practice. The constructor always allocates a new object, whereas the static method valueOf can return a cached instance, which is more efficient.
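A minimal Java sketch of the before and after forms, using a hypothetical helper class `WrapperConversion` (not taken from Hadoop). Two constructor calls always yield distinct objects, whereas `Integer.valueOf` is documented to cache at least the values from -128 to 127, so repeated calls for such values return the same instance.

```java
// Hypothetical class illustrating the InefficientConstructorCall refactoring.
class WrapperConversion {

    // Before: the constructor allocates a new Integer on every call.
    // Integer(int) has been deprecated since Java 9, hence the suppression.
    @SuppressWarnings({"deprecation", "removal"})
    static Integer boxWithConstructor(int value) {
        return new Integer(value);
    }

    // After: valueOf may return a cached instance instead of allocating.
    static Integer boxWithValueOf(int value) {
        return Integer.valueOf(value);
    }
}
```

Both methods return equal values; only the allocation behavior differs, which is why the refactoring is safe as long as the code does not rely on object identity.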

PositionLiteralsFirstInComparisonsRefactor

This rule checks that literals are in the first position in comparisons. The refactoring swaps the literal and the variable, so that equals() is called on the literal, which can never be null. This ensures that the comparison cannot throw a NullPointerException when the variable is null.
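A short illustrative sketch of why the swapped form is null-safe; the class `LiteralFirstComparison` and the literal "verbose" are hypothetical, not taken from Hadoop.

```java
// Hypothetical class illustrating the literal-first refactoring.
class LiteralFirstComparison {

    // Before: throws NullPointerException when mode is null.
    static boolean isVerboseUnsafe(String mode) {
        return mode.equals("verbose");
    }

    // After: the literal is never null, so a null argument simply yields false.
    static boolean isVerbose(String mode) {
        return "verbose".equals(mode);
    }
}
```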

AddEmptyStringToConvert

Concatenating an empty string to convert a primitive value to a String is bad practice. First of all, it makes the code less readable. It is also less efficient in most cases; the only case where concatenation is slightly better is when the primitive is a final constant, because the compiler then evaluates the concatenation at compile time. Here's an example taken from class MD5MD5CRC32FileChecksum.

Original code:

xml.attribute("bytesPerCRC", "" + that.bytesPerCRC);

Refactored code:

xml.attribute("bytesPerCRC", String.valueOf(that.bytesPerCRC));
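To illustrate the exception mentioned above, here is a sketch with a hypothetical constant `BYTES_PER_CRC` (the Hadoop field in the example is an ordinary instance field, not a constant). When the primitive is a `static final` field initialized with a constant, `"" + BYTES_PER_CRC` is a constant expression that the compiler folds into a string literal, so no conversion runs at runtime.

```java
// Hypothetical class contrasting the two conversion styles.
class PrimitiveToString {
    // A final primitive initialized with a constant is a compile-time constant.
    static final int BYTES_PER_CRC = 512;

    // Refactored form: explicit conversion, suitable for any int value.
    static String convert(int value) {
        return String.valueOf(value);
    }

    // Exception case: "" + BYTES_PER_CRC is a constant expression, folded by
    // the compiler into the literal "512" (and therefore interned).
    static String constantConcat() {
        return "" + BYTES_PER_CRC;
    }
}
```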

GuardDebugLogging
When a String concatenation is performed inside a debug log call, one should check that debug logging is enabled before making the call. Otherwise, the concatenation is always performed, even when the message is then discarded. The refactoring adds a guard before the call to debug(). In this case, it is configured to use the method isDebugEnabled(), since the projects use Apache Commons Logging. Below is an example of refactored code taken from class ActiveStandbyElector:

if (LOG.isDebugEnabled()) {
    LOG.debug("StatNode result: " + rc + " for path: " + path + " connectionState: " + zkConnectionState + " for " + this);
}
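The guard pattern can be sketched in self-contained form. This sketch uses `java.util.logging` as a stand-in for Apache Commons Logging, so `isLoggable(Level.FINE)` plays the role of `isDebugEnabled()`; the class, the counter and the message builder are hypothetical and only make the skipped concatenation observable.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical class; java.util.logging stands in for Apache Commons Logging.
class GuardedDebugLogging {
    static final Logger LOG = Logger.getLogger(GuardedDebugLogging.class.getName());

    // Counts how many times the log message is actually built.
    static int messagesBuilt = 0;

    // Builds the message; this concatenation is the cost the guard avoids.
    static String describe(int rc, String path) {
        messagesBuilt++;
        return "StatNode result: " + rc + " for path: " + path;
    }

    static void logStat(int rc, String path) {
        // Guard: FINE is the java.util.logging level closest to "debug".
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(rc, path));
        }
    }
}
```

With the logger set to INFO, `logStat` never calls `describe`, so the message string is never built.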

IfElseStmtsMustUseBraces

This rule finds if and else statements that don't use braces. The refactoring adds the missing braces.
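A minimal before/after sketch on a hypothetical method; the refactoring only adds braces and does not change behavior.

```java
// Hypothetical class illustrating the IfElseStmtsMustUseBraces refactoring.
class BracesExample {
    // Before (no braces):
    //     if (value < 0)
    //         value = 0;
    // After: braces make the scope of the if explicit, so a statement added
    // later cannot silently end up outside the condition.
    static int clampToZero(int value) {
        if (value < 0) {
            value = 0;
        }
        return value;
    }
}
```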

UseCollectionIsEmpty

This rule finds uses of Collection's size() method to check whether a collection is empty. It is better to use isEmpty(), which makes the code easier to read. The refactoring replaces comparisons between size() and 0 with a call to isEmpty().
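A small sketch of the refactored form, using a hypothetical `describe` method (not from Hadoop):

```java
import java.util.List;

// Hypothetical class illustrating the UseCollectionIsEmpty refactoring.
class EmptyCheck {
    // Before: if (items.size() == 0) { ... }
    // After: isEmpty() states the intent directly.
    static String describe(List<String> items) {
        if (items.isEmpty()) {
            return "no items";
        }
        return items.size() + " item(s)";
    }
}
```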

LocalVariableCouldBeFinal

This rule flags local variables that could be declared final but are not. The final keyword gives future readers of the code useful information. The refactoring adds the "final" keyword. This is not a critical rule, but since these projects contain a huge number of violations of it, it is useful to get rid of them quickly with automatic refactoring.
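A hypothetical example of what the refactored code looks like: local variables that are assigned once and never reassigned receive the `final` modifier.

```java
// Hypothetical class illustrating the LocalVariableCouldBeFinal refactoring.
class FinalLocals {
    // Before: int squareA = a * a;  (never reassigned, but not marked final)
    // After: final documents that the value will not change in the rest of the method.
    static int sumOfSquares(int a, int b) {
        final int squareA = a * a;
        final int squareB = b * b;
        return squareA + squareB;
    }
}
```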

Scertify's refactoring results
So we ran Scertify on both projects to detect and refactor violations of these rules. On each project, the full process took around 1 minute. Scertify generates an HTML report with information on the errors detected and corrected. Below is a summary of all errors corrected in the two projects. Many minor issues were corrected, but also more important ones. Overall, it took 2 minutes to correct 25,392 defects. Not bad, is it? These defects include both minor violations and more critical ones in terms of maintainability, performance and robustness.

Violations refactored

As the screenshots below show, correcting these defects reduced the technical debt of each project by roughly 10 days. Overall, about 20 days of technical debt have been written off.

Refactored Common technical debt

Refactored Mapreduce technical debt
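The headline figure can be checked against the totals reported in this post: 70 and 66 days of initial debt for Common and Mapreduce, and 117 days remaining after refactoring. The exact saving is 19 days, which the rounded per-project figures express as about 10 days each.

```latex
\begin{aligned}
\text{initial debt} &= 70 + 66 = 136 \text{ days} \\
\text{debt written off} &= 136 - 117 = 19 \text{ days} \\
\text{reduction} &= \tfrac{19}{136} \approx 0.140 = 14\%
\end{aligned}
```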

Last but not least, Hadoop contains many unit tests, and of course we made sure that they still pass after the refactoring. To conclude, Scertify's refactoring features allowed us to correct 25K defects in a few minutes. We are glad to make the refactored code available to the community; you can download it below. We will continue to do such refactoring on open-source applications, so if you have an idea for an open-source project that could benefit from it, just let us know!

Download the source files

More Stories By Michael Muller

Michael Muller, a Marketing Manager at Tocea, has 10+ years of experience as a Marketing and Communication Manager. He specializes in technology and innovative companies. He is executive editor at http://dsisionnel.com, a French IT magazine and the creator of http://d8p.it, a cool URL shortener. Dad of two kids.
