Time Efficiency of IF statements vs. Python Dictionaries

Oftentimes when I’m straight grinding on some mad code, I find myself in a situation where I need to take a certain action based on a condition, i.e.:

def decision_func(input_var):
    if input_var == '1':
        do_something()
    elif input_var == '2':
        do_something_else()
    elif input_var == '3':
        do_another_thing()

The time complexity of the above logic is O(n) where n is the number of possible conditions.  The program has to scan the conditions until the proper one is met.  The alternative that I’ve always thought is more efficient is to write something logically equivalent:

 

DECISION_MAP = {
    '1': do_something,
    '2': do_something_else,
    '3': do_another_thing     
}
def decision_func(input_var):
    func = DECISION_MAP[input_var]
    func()

If we define a static dictionary outside of the function, we now have a time complexity of O(1) in making a decision.  One day I was at a friend’s place, who we’ll just call “Rainbow Bill,” who contended that the approach I preferred by using dictionaries was actually less efficient in general because of the time it took to generate the hash of the input variable.  After that it got pretty heated and I was asked to leave.

The only way to settle the dispute then was to actually set up a test with both methods and see which one was faster, and so I set up a test.  So, I started off with the most extreme and obvious case where my preferred method would be faster:

def func_with_tons_of_ifs(input_var):
    if input_var == '0':
        return True
    elif input_var == '1':
        return True
    elif input_var == '2':
        return True

That’s just a snippet of the function…it goes on for several hundred more lines:

    elif input_var == '634':
        return True
    elif input_var == '635':
        return True
    elif input_var == '636':
        return True
    elif input_var == '637':
        return True
    elif input_var == '638':
        return True
    elif input_var == '639':
        return True
    elif input_var == '640':

Contrasted to:

RETURN_DICT = {"%s" % number: True for number in xrange(669 + 1)}

def func_with_dict(input_var):
    return RETURN_DICT[input_var]

Then I just created my own function decorator to time the results:

import datetime

def timed(func):
    num_trials = 1000
    def inner(*args, **kwargs):
        sum_seconds = 0.0
        for i in xrange(num_trials):
            start = datetime.datetime.utcnow()
            result = func(*args, **kwargs)
            finish = datetime.datetime.utcnow()
            delta = (finish - start).total_seconds()
            sum_seconds += delta
        avg_seconds = sum_seconds / num_trials
        print "Time for %s after %s runs: %s" % (func.__name__, num_trials, delta)
        return result
    return inner

Now when I compared the results for:

func_with_tons_of_ifs("669")
func_with_dict("669")

The output is:

Time for func_with_tons_of_ifs after 1000 runs: 1.8e-05
Time for func_with_dict after 1000 runs: 1e-06

In the worst case scenario I created for the IF statement logic, it’s 18 times slower than the method using a dictionary.  We can do that again with more complex strings:

    elif input_var == 'crazysilltfreshreallylongstrongthattakesalongertimetocompare6':
        return True
    elif input_var == 'crazysilltfreshreallylongstrongthattakesalongertimetocompare7':
        return True
    elif input_var == 'crazysilltfreshreallylongstrongthattakesalongertimetocompare8':
        return True
    elif input_var == 'crazysilltfreshreallylongstrongthattakesalongertimetocompare9':
        return True
    elif input_var == 'crazysilltfreshreallylongstrongthattakesalongertimetocompare10':
        return True
    elif input_var == 'crazysilltfreshreallylongstrongthattakesalongertimetocompare11':
        return True

The point here is that I’m now making the values to compare more complex.  The time results are now worse for the IF logic.

Time for func_with_tons_of_ifs after 1000 runs: 2.6e-05
Time for func_with_dict after 1000 runs: 1e-06

Clearly then, if we assume the worst possible case, then using the dictionary based logic is less expensive in terms of time.  But what about stacking the deck in favor of the IF logic?

@timed
def func_with_tons_of_ifs(input_var):
    if input_var == '0':
        return True
    elif input_var == '1':
        return True
    elif input_var == '2':
        return True
    elif input_var == '3':
        return True
    elif input_var == '4':
        return True
    elif input_var == '5':
        return True

RETURN_DICT = {"%s" % number: True for number in xrange(5 + 1)}
@timed
def func_with_dict(input_var):
    return RETURN_DICT[input_var]

func_with_tons_of_ifs("5")
func_with_dict("5")

Output:

Time for func_with_tons_of_ifs after 1000 runs: 1e-06
Time for func_with_dict after 1000 runs: 1e-06

Now we have a fairly even skew.  Now stack the deck with IF logic’s favor just a bit more by hitting the first conditional statement:

func_with_tons_of_ifs("0")
func_with_dict("0")

Output:

Time for func_with_tons_of_ifs after 1000 runs: 1e-06
Time for func_with_dict after 1000 runs: 1e-06

The results are negligible.

The honest takeaway from the whole experiment is that it really hardly matters.  In general the number of conditional statements will not reach ridiculous magnitudes, but using dictionaries in an extremely time-sensitive application will create a performance boost.  The time it takes to hash an integer or a string was hardly adverse.

Also note that all of the tests run in this case assumed that the dictionary had already been defined.  If the function in question were called zero or one times, using the IF logic would be more efficient in that case.

  • gest

    rub it in your friend’s face that you are right