KakimotOnline

August 11, 2009

Code performance

Filed under: performance, programming — Tags: , — nandokakimoto @ 2:32 am

Code performance is one of the most polemics topics in software development process, addressing a lot of issues, for example:

  • When to improve code performance?
  • Should I prioritize legibility or response time?
  • What part of my code should be improved?

There is a lot of discussion about code performance and one reason for that is due to fast code isn’t the best code. Code optimization generally affects code legibility, which affects code readability, which affects code maintenance, and so on. So, when you realize that is time to look a bit deeper to your code execution time, you must be aware of some trick aspects.

The first one is the Pareto Principle (also known as the 80-20 rule) which states that, for many events, roughly 80% of the effects come from 20% of the causes. This is exactly what happens in code optimization: 20% of the routines are responsible for 80% of program’s execution time. Sometimes you take the whole afternoon optimizing a loop which seems to be the cause of the slowness of your program. And after using all your optimization skills, the program looks like the old one. To avoid this kind of situations you should first measure your program’s bottleneck using a profiler tool: remember the 80-20 rule. After finding it and re-writing it using some basic optimizations techniques, you will notice that your code will be 80% faster.

Another tip is not relying on what everyone says about code optimization (me included). Have you ever heard about reducing lines of code, optimizing while coding, or this operation should be faster the other? Well, they are all suspicious.

Lot of programmers thinks that writing routines using just one or two lines of code makes a faster program. Take a look at the code below taken from Code Complete book.

For i = 1 to 10
a[i] = i
end

a[1] = 1; a[2] = 2; a[3] = 3;
a[4] = 4; a[5] = 5; a[6] = 6;
a[7] = 7; a[8] = 8; a[9] = 9;
a[10] = 10;

Both programs do the same thing. Which one do you think is faster? These programs were compared in the book and here is the result:

For loop
- Visual Basic: 8,47s
- Java: 12,6s

Linear code
- Visual Basic: 3,16s
- Java: 3,23s

In this case, the linear code was more than 60% faster than the “for loop”. Does it prove that as longer as your code, faster it will be? No! This proves that numbers of line and code performance does not have any predictable relationship.

Another very common mistake is optimizing your code while writing it. It is almost impossible to find bottlenecks without any specific tool. Without a profiler, you will probably lose time trying to optimize code that really doesn’t matter for your program performance. In addition, optimizing code while programming take you away from the main focus of it, for example user interaction and legibility. Premature code optimization is generally a bad choice.

So, when optimize code? First, concentrate efforts in creating modular, maintainable program, always aware of making things easy for future changes. When it’s done, measure its performance and optimize if necessary, but only if necessary. As small parts of code consumes a considerable percentage of the whole program’s execution time, measure your code before the modifications and just after them, so you can keep track if your changes are really effectives. Just to illustrate the difference between a readable code and an optimized code, see both examples below:

sum = 0;
 for (row = 0; row < rowCount; row++) {
    for ( column = 0; column < columnCount; column++) {
         sum = sum + matrix[row][column];
    }
 }
 sum = 0;
 elementPointer = matrix;
 lastElementPointer = matrix[rowCount-1][columnCount-1];
 while (elementPointer < lastElementPointer) {
    sum = sum + *(elementPointer++);
 }

Both loops calculate the sum of matrix’s elements. However, the second one uses just sum operations when calculating memory address dislocation, while the first loop uses multiplication, much more expensive (don’t rely on me, go ahead and prove it by yourself). Although slower, the first code is much easier to read and modify. So, which code to use? Remember, write readable code first, measure it and, if necessary, optimize it.

Code complete also comes with an entire chapter dedicated just to code optimization techniques. Next posts I’ll talk about some of them. Hope you like.

See you,


Fernando

July 22, 2009

Singleton (Anti-) Pattern

Filed under: programming, refactoring, software design, testing — Tags: , , — nandokakimoto @ 3:51 am

I’ve implemented my first design patterns at college, while creating a web system during the software engineer course. My classmates and I needed a facade class with a single instance of it throughout the system. So, because we’re really smart, we’ve applied the Facade and Singleton pattern. Actually, we’ve implemented the patterns without knowledge of the design patterns catalog. For us, it was just a way to get what we need.

This week, while doing my daily work, I realized how dangerous the Singleton pattern is. Well, I would have noticed it before, if I have used Test-First Programming. Even knowing the benefits of TDD, I decided to write some code in advance, since I don’t have much experience on the platform used and on its supported tests framework.

My task was to create a GPS abstraction and use it in a mobile Navigator module. My first thought was making my GPS abstraction a Singleton class. It looked very obvious for me: a mobile device has only one GPS and different GPS intances will provide the same data set. So, what I need is just a single GPS instance troughout the system.

After some minutes writing code, I’ve created abstractions similar to the Java sample below.

GPSProvider.java

import java.util.List;

public class GPSProvider implements LBSPositionObserver {

 private static GPSProvider INSTANCE = new GPSProvider();
 private Position lastCoordinate;
 private List<GPSListener> listeners;

 private GPSProvider() {
 this.listeners = new ArrayList<GPSListener>();
 }

 //retrive the class instance
 public static GPSProvider GetInstance() {
 return INSTANCE;
 }

 // perform GPS initialization
 public void initGPSService() {
 (...)
 }

 // add a new listener to the gps class
 public void attach(GPSListener listener) {
 this.listeners.add(listener);
 }

 // remove a listener from the class list
 public void dettach(GPSListener listener) {
 this.listeners.remove(listener);
 }

 @Override
 // set lastCoordinate and notify all listeners about the update
 public void positionUpdated(Position position) {
 this.lastCoordinate = position;
 for ( GPSListener listener : this.listeners ) {
 listener.positionUpdated(position);
 }
 }

 @Override
 // notify all listener about the update
 public void setStatus(int status) {
 for ( GPSListener listener : this.listeners ) {
 listener.statusUpdated(status);
 }
 }

 // return the last known coordinate
 public Position getLastCoordinate() {
 return this.lastCoordinate;
 }

}

GPSListener.java

public interface GPSListener {

 void positionUpdated(Position position);

 void statusUpdated(int status);

}

LBSPositionObserver.java – native GPS interface

public interface LBSPositionObserver {

 void positionUpdated(Position position);

 void setStatus(int status);

}

Navigator.java

public class Navigator implements GPSListener {

 private NavigatorObserver observer;

 public Navigator(NavigatorObserver observer) {
 this.observer = observer;
 }

 // start navigation by listening to gps updates
 public void navigate(Route route) {
 GPSProvider gps = GPSProvider.GetInstance();
 gps.attach(this);
 }

 // pause navigation by not receiving gps updates
 public void pause() {
 GPSProvider gps = GPSProvider.GetInstance();
 gps.dettach(this);
 }

 @Override
 // everytime the position is updated, the navigator gives directions if needed
 public void positionUpdated(Position position) {
 // navigate user through route.
 int step = this.verifyStepUpdated();
 if ( step != -1 ) {
 this.observer.StepUpdated(step);
 }

 if ( this.achievedDestination() ) {
 this.observer.DestinationAchived(position);
 }
 }

 // calculate if achieved destination
 private boolean achievedDestination() {
 (...)
 }

 // verify is have to change direction
 private int verifyStepUpdated() {
 (...)
 }

 @Override
 public void statusUpdated(int status) {
 // send status to end user.
 }
}

Now, since I’ve finished my Navigator module, I want to test it to make sure it’s working correctly. In order to test the Navigator module, I need a GPS data log and a route to walk through it, simulating a person walking and being navigated by the system. The route is not a problem because I pass it as a parameter of “navigate” method. However, there is no way to simulate a GPS log with the code above, unless I use some Dependency Injection Framework, which is not the case.

That’s the Singleton disadvantage. I can’t inject a GPS mock in my Navigator module because I always use the native gps implementation represented in my Singleton GPSProvider class. Everytime I need some GPS information, I use the GPSProvider.GetInstance() static method to retrieve the only GPS instance I have access to.

To solve this problem, I found a simple solution: not using Singleton. I removed  the Singleton pattern from GPSProvider and change every class that uses GPSProvider.GetInstance() to receive the current GPS intance. In the Navigator module, I passed the GPS instance through its class’ constructor.

GPSAbstractProvider.java

import java.util.ArrayList;
import java.util.List;

public abstract class GPSAbstractProvider {

 private List<GPSListener> listeners;

 public GPSAbstractProvider() {
 this.listeners = new ArrayList<GPSListener>();
 }

 public abstract void initGPSService();

 public void attach(GPSListener listener) {
 this.listeners.add(listener);
 }

 public void dettach(GPSListener listener) {
 this.listeners.remove(listener);
 }

 public void notifyPositionUpdated(Position position) {
 for ( GPSListener listener : this.listeners ) {
 listener.positionUpdated(position);
 }
 }

 public void notifyStatusUpdated(int status) {
 for ( GPSListener listener : this.listeners ) {
 listener.statusUpdated(status);
 }
 }

}

GPSProvider.java

public class GPSProvider extends GPSAbstractProvider implements LBSPositionObserver {
 private Position lastCoordinate;

 private GPSProvider() {
 super();
 }

 @Override
 // perform GPS initialization
 public void initGPSService() {
 (...)
 }

 @Override
 public void positionUpdated(Position position) {
 this.lastCoordinate = position;
 this.notifyPositionUpdated(position);
 }

 @Override
 public void setStatus(int status) {
 this.notifyStatusUpdated(status);
 }

 public Position getLastCoordinate() {
 return this.lastCoordinate;
 }

}

Navigator.java

public class Navigator implements GPSListener {

 private NavigatorObserver observer;
 private GPSAbstractProvider gps;

 public Navigator(NavigatorObserver observer, GPSAbstractProvider gps) {
 this.observer = observer;
 this.gps = gps;
 }

 public void navigate(Route route) {
 this.gps.attach(this);
 }

 public void pause() {
 this.gps.dettach(this);
 }

 @Override
 public void positionUpdated(Position position) {
 // navigate user through route.
 int step = this.verifyStepUpdated();
 if ( step != -1 ) {
 this.observer.StepUpdated(step);
 }

 if ( this.achievedDestination() ) {
 this.observer.DestinationAchived(position);
 }
 }

 private boolean achievedDestination() {
 (...)
 }

 private int verifyStepUpdated() {
 (...)
 }

 @Override
 public void statusUpdated(int status) {
 // send status to end user.
 }
}

Much better! Notice that I still have just one single instance of my GPS class. All I have to do is creating my GPS at the beginning of the program and pass it through classes that use it.
Doing this, testing my code became very simple. I inject my GPS mock to simulate GPS data in my tests,  just as below.

NavigatorTest.java

import java.util.ArrayList;
import java.util.List;

import junit.framework.TestCase;

public class NavigatorTest extends TestCase implements NavigatorObserver {

 private Navigator navigator;
 private GPSAbstractProvider gps;

 public void testNavigate() throws Exception {
 List<Position> gpsPositions = new ArrayList<Position>();
 this.addPositions(gpsPositions);
 this.gps = new GPSTestProvider(gpsPositions);

 this.navigator = new Navigator(this, this.gps);
 this.navigator.navigate(this.createSimpleRoute());

 gps.initGPSService();

 assert(...);
 assert(...);
 }

 private Route createSimpleRoute() {
 (...)
 }

 private void addPositions(List<Position> gpsPositions) {
 (...)
 }

 @Override
 public void DestinationAchived(Position destination) {
 (...)
 }

 @Override
 public void StepUpdated(int step) {
 (...)
 }

}

As you can see, I can use any GPS provider in my tests, making them much more easy and flexible.
In addition to the disavantage presented, Alex Miller discuss some points why he hates Singleton pattern.
From here, trying avoiding this pattern in situations like that. Think twice before applying Singleton in your project.

See you


Fernando

July 15, 2009

C Arrays and Pointers

Filed under: c/c++, programming — Tags: , , — nandokakimoto @ 2:15 am

Hello guys,

some weeks ago I was reading some questions in StackOverflow and one of them really took my attention. I don’t have a great experience with C programming language, but I’ve started developing in C++ and Symbian 3 moths ago, which improved my skills in pointers and memory allocation. Even after some moths programming in C++, I could not understand why in C arrays a[5] == 5[a], can you?

To answer this question, we must understand the behavior of C arrays. What happens in memory when writing:

int array[] = {0, 1, 2, 3, 4, 5};

The first thing to have in mind is that this assignment creates 5 memory “slots” with 32bits each one. A single “slot” has its value and its address, just like a pointer. It’s easy to predict the value of each position value:

array[0] == 0;
array[1] == 1;
(…)

Talking about the variable address, imagine the array is located at address 0×0000. As you already know, in a 32bit platform, an integer has 32bits = 4bytes. So we can conclude that to access any array position we can use the following calculation:

address = start_address + (index * size)

&array[0] == (0 + (0 * 4));
&array[1] == (0 + (1 * 4));
(…)

And it’s exactly what the compiler does. So, when you write array[5] it’s the same of *(0 + (5 * 4)) = *(20) = 5. Easy! However, what about 5[a]? Well, the calculation is similar. Remember that C arrays are nothing just pointers, which means that both declarations below are equivalents:

int array[5];
int *array;

And when incrementing an array, you are just going to the array’s next position. Which means:

array[1] == *(array + 1);

Now, if we look back to 5[a], what does it looks like? Similar to what I said, 5[a] = *(5 + a) = *(a + 5) = a[5].

Now, things are making sense.
To read more about it, go to the original question here.

See you,
Fernando

June 27, 2009

Basic Data Types

Filed under: programming — Tags: , , — nandokakimoto @ 2:46 pm

I’ve been reading the single most influential book every programmer should read since last month and there is a really nice chapter about basic data types. Actually, when I read the chapter’s title, I thought reading it would be a waste of time, but I was wrong. There are some quick tips that really worth its reading and is about them that we will be talking today.

Magic Numbers

The first tip is avoiding magic numbers, which are literal numbers spread all over the code with no explanation. For example:

for ( int i = 0; i < 65; i++) { ... }

Does anyone here know what 65 means? So, if your language supports named constants, use them in place of the number itself. If named constants are not supported, you can use global variables too.

Avoiding magic numbers provide tree advantages:

  • Changes can be made in a more confident way.
  • Changes can be made in a more easy way.
  • You improve your code legibility.

The code above can be improved by using a macro, as shown below:

#define MAX_ENTRIES 65
for ( int i = 0; i < MAX_ENTRIES-1; i++) { ... }

Remember: the only literal values that should occur in your program are 0 and 1. The others should be placed with a more descriptive name.

Integer Operations

Be careful when dividing two integers. Keep in mind that 7/10 it’s not 0,7. The result is, in most cases, 0 (zero), varying from one language to other. In real world, we’ve learned that 10*(7/10) = (10*7)/10 = 7, which is not that same in integer’s arithmetic: 10*(7/10) = 0 because 7/10 = 0.

Other problem is integer overflows. When operating integers, such as additions and multiplications, you must have in mind the biggest possible value. For example, if you multiply 250*300 the result is 75.000, but if the maximum possible value is 65.535, the result is 9.464 due to a integer overflow (75.000 – 65.536 = 9.464).

Float Operations

The main consideration about float numbers is that most decimal fractions can’t be represented with precision, with basically 7 or 15 digits. Because of that, avoid operating float numbers with different sizes. For example, what’s the result of 1.000.000,00 + 0,1? Probably, this operation will return 1.000.000,00 because 32 bits don’t offer significant digits between the interval 1.000.000 and 0,1. The same happens here: 5.000.000,02 – 5.000.000,01 probably will return 0,0.

A classic problem with float numbers is the incremental of 0,1 in a for loop.

double nominal = 1.0;
double sum = 0.0;

for ( int i = 0; i < 10; i++ ) {
    sum += 0.1;
}

if ( nominal == sum ) {
    System.out.println("Same");
} else {
    System.out.println("Different");
}

As you already know, the result is “Different”. After the loop, sum = 0,999999999999999 and the comparison fails. The work around here is using an acceptable precision interval and a boolean function to determine the equality of both numbers.

final double ACCEPTABLE_DELTA = 0.00001;

boolean Equals( double n1, double, n2)  {
    if ( Math.abs ( n1 - n2 ) < ACCEPTABLE_DELTA ) {
        return true;
    }
    return false;
}

If you are working with currency values, this solution is not recommended. An alternative is passing float numbers to 64bits integer, by multiplying them for 100. After the operations has completed, pass the values again to float, by dividing them for 100.

Boolean Variables

Boolean variables are useful to document your program. Instead of testing a boolean expression, you can assign it to a variable, making your code easy to read. For example:

if ( (elementIndex < 0)
      || (MAX_ELEMENTS < elementIndex)
      || (elementIndex == lastElementIndex) ) { … }

This piece of code can be improved by creating local variables, as below:

finished = (elementIndex < 0) || (MAX_ELEMENTS < elementIndex);
repeatedEntry = (elementIndex == lastElementIndex);

if ( finished || repeatedEntry) { … }

Some languages have a predefined boolean type, such as Java and C++, others don’t. But it’s possible to define your own Boolean type. In C, for example, you may use the code below:


typedef int BOOLEAN
enum Boolean {
    True = 1, False = (!True)
};

Conclusion

We have covered Integers, Float and boolean variables. In addition, the book chapter also mentions characters and strings, enum types and arrays, but these are topics for a future post.

See you,


Fernando

May 21, 2009

Breaking Information Hiding in C++

Filed under: c/c++, programming — Tags: , — nandokakimoto @ 4:10 am

Almost every object-oriented programmer is familiar with the concept of information hiding. Actually, lots of them get confused with the terms encapsulation and information hiding. So, before we talk about information hiding itself, let me explain the difference of both terms:

  • Encapsulation is the public interface that defines how an object can be used, and how its data is derived.
  • Information Hiding is the principle of hiding design decisions, preventing external objects from using the derived data.

Now let’s take a look in a real sample. The code below shows a classic header file, corresponding to a Point class. Note that class’ members are protected against modification, since they are declared as private and no setter methods were provided. The class data is also encapsulated into the Point interface, which defines how Point objects are used, through its constructor and getter methods.

Point.h

pontoh

A very common issue when using a third-party API is the need of changing private members value when running some tests. However, if no setter methods exist, the change can’t be made, right? Well, in C++ things are pretty different. When using pointers, you’re working directly with memory addresses. So if we create pointers that make reference to private member’s address, we can manipulate its value. Look at the code below.

main.cpp

main

Note that I’m using an integer pointer, which is an array of integers. If we come back to Point class definition, we will see just two integers. Therefore, in respect with memory manipulation Point class is similar to an integer array. In other words, array[0] is the same of point->x since both read int values (4 bytes in win32 platform). When doing the pointer attribution and the explicit cast, we’re just pointing the integer array to the first memory address of the object. Changing its value will affect the memory content, it means: the object private member.

I know that this is a little bit weird, but it works perfectly. To looks like more acceptable, we could create a new class, similar to Point class signature, and manipulate its members, but in fact working with the Point class member variables. Here is the code that illustrates this behavior.

MyPoint.h

mypointh

main.cpp

mainmypoint

Now, be free to change private members wherever you want.

See you,

[]’s

Fernando

May 15, 2009

Principles of Package Design

Filed under: programming, software design — Tags: , , — nandokakimoto @ 2:39 pm

Agile Software Development is really an incredible book. After reading it, a lot had been talked about the Object-Oriented Principles on this blog (here and here). Today, I’ll be talking about another section of the book, which introduces principles used to maintain high package cohesion and a desirable package dependency, known as Principles of Package Design.

Packages are used to organize larger projects. More than this, they are containers of classes used to manage the development and distribution on software. Package’s goal is separate classes in an application according to some criteria. However, classes often have dependencies on other classes, creating package dependencies relationships. To help manage this situation, some principles were created to govern the creation, interrelationship and use of packages.

The three principles in sequence are related to package cohesion, useful in deciding how to partition classes into packages.

The Reuse-Release Equivalence Principle (REP): the granule of reuse if the granule of release

REP states that anything that we reuse must also be released and tracked. Reusability is not only the creation. It comes only after there is a track system in place that offers the guarantees of support that reusers will need. Since reusability is based on packages, if some software is going to be reused, then it must be partitioned in a convenient structure for this purpose, so all classes in a package become reusable by the same audience.

The Common-Reuse Principle (CRP): the classes in a package are reused together

CRP helps in deciding which classes should be placed into a package. This can be determined by reuse characteristics. Classes that tend to be reused together should be placed in the same package. Remember: if you reuse one class in a package, you reuse them all. Although the CRP tells us what classes put together, it also says what classes do not put in the same package. If a class depends on another class in a different package, it depends on the entire package. Every time the used package is released, the using package must be revalidated and rereleased, even the change was made in a different class that the using package depends on. Therefore, CRP also says that classes which are not tightly related to each other should not be in the same package, so every time I depend on a package, I depend on every class in that package.

The Common-Closed Principle (CCP): a change that affects a package affects all the classes in that package and no other packages.

This is the same as SRP, but applied for the packages context. Such as classes should have just one reason to change, CCP states that packages should not have multiple reasons to change. It’s preferable that changes occur in just one package rather than distributed along the whole system.

Now, we are going to take a look at principles related to package relationship and dependency.

The Acyclic-Dependencies Principle (ADP): allow no cycles in the package-dependency graph.

Package cycles create immediate problems, and the most obvious one is when you have to release a package that was modified. In order to release the package, it must be compatible with all other packages that it depends on. A cycle in your package-dependency graph makes release harder since you increase the number of packages that you have to be compatible with. For example, take the Figure 1 and Figure 2. To release package C in the first situation we must be compatible with package E and F. However, in the second case we also must be compatible with package A, B and D. Even worst, if you want to test package C, we must link and build all other packages in the system, instead of just two of them.

no-cycle

To break the cycle, two solutions are suggested:

  1. apply Dependecy-Inversion Principle.
  2. create a new package between C and A, and move the classes that they both depend on into that new package.

The Stable-Dependencies Principle (SDP): depend in the direction of stability.

Stability has nothing directly to do with frequency of change, but to the amount of work required to make the change. In software, package stability is measured in number of classes inside this package that depend on classes outside this package and number of classes outside this package that depend on classes within this package. Thus, a package is called instable if it is easy to change, in other words, just a few or none packages have dependency relationship with it. A package is called stable if it is hard to change, or a lot of classes outside this package depend on it. The book also brings a method to calculate stability metrics, but I’ll not be such detailed. Just have in mind that a good design contains some instable package and some stable package. Instable packages are on top and depend on stable packages at the bottom, just shown below.

figura1

The Stable-Abstractions Principle (SAP): a package should be as abstract as it is stable.

This principle states that a stable package should also be abstract so that its stability does not prevent it from being extended, and an instable package should be concrete since its instability allows the concrete code within it to be easily changed. Thus, stable packages consist of abstract classes and instable packages of implementations classes.

Conclusion

After some posts of OOP, we already know how to create a good class design. However, in big systems, classes are separate into packages which corresponds a very important part of the system’s release and deploy. So, a new concern is how to divide classes between packages and maintain the system’s consistency. In this post we introduced six principles explained in Robert Martin book which help us in solve a lot of package design problems. You can find a lot of more information in the book, which I strictly recommend.

See you,


Fernando

Older Posts »

Blog at WordPress.com.