Wednesday, October 12, 2011

LINQ and Dictionaries

LINQ is a very powerful technique for operating on collections of objects in .NET. If you for example have a collection of integers, it is a simple task to pick the even numbers from the collection using LINQ:

int[] ints = new[] { 1, 3, 6, 8, 9, 10 };
IEnumerable<int> evens = ints.Where(i => i % 2 == 0);


Similarly for dictionaries:

Dictionary<int, int> intDict = new Dictionary<int, int> 
    { { 2, 3 }, { 3, 5 }, { 6, 7}};
Dictionary<int, int> evenKeys = intMap.Where(kv => kv.Key % 2 == 0);


What? Compilation error?!?

In fact, the intDict.Where() statement is not entirely correct. From the LINQ point of view, Dictionary<,> and other classes implementing the IDictionary<TKey, TValue> interface are actually regarded as implementations of the IEnumerable<KeyValuePair<TKey, TValue>> interface. Thus (ignoring for a moment that we could also have used the var keyword), the correct evenKeys assignment should read:

IEnumerable<KeyValuePair<int, int>> evenKeys = 
    intMap.Where(kv => kv.Key % 2 == 0);

Now, my guess is that in the normal case one would rather have the above assignment to return a Dictionary. Fortunately, LINQ also provides a number of ToDictionary method overloads. So for the evenKeys assignment to return a Dictionary we simply type:

Dictionary<int, int> evenKeys = intMap.Where(
    kv => kv.Key % 2 == 0).ToDictionary();

What?! Compilation error again?

Yes, because the ToDictionary method also operates on IEnumerable<T> objects. You need to tell the compiler how you want to design your Dictionary based on this arbitrary type T. For correctness, our evenKeys assignment has to be expressed as follows:

Dictionary<int, int> evenKeys = intMap.Where(kv => kv.Key % 2 == 0).
    ToDictionary(kv => kv.Key, kv => kv.Value);


For dictionaries, this explicitness may appear quite counter-intuitive, and there are several forum questions on the Internet indicating that this API design indeed has caused confusion (here, here and here for example).

After giving this issue some thought, I believe I have come up with an approach to bypass this hurdle of confusion. I have implemented the following extension method in a static utility class:

public static Dictionary<TKey, TValue> ToDictionary<TKey, TValue>(

    this IEnumerable<KeyValuePair<TKey, TValue>> source)

{

    return source.ToDictionary(kv => kv.Key, kv => kv.Value);

}

This overload of ToDictionary takes any object that implements the IEnumerable<KeyValuePair<TKey, TValue>> interface, which is a common return type when invoking LINQ operations on dictionaries, and returns a Dictionary<TKey, TValue> object using the same keys and values as the dictionary in the argument list. With this extension method defined, I now actually can enter:

Dictionary<int, int> evenKeys = intMap.Where(
    kv => kv.Key % 2 == 0).ToDictionary();

without getting the annoying compilation error.

Now this is all good and well, but I then decided to take the issue even one step further. Wouldn't it be good if for example the Where() extension method when operating on a Dictionary by default returned a Dictionary with the same keys and values?

Well, it can of course be done! And here is the solution:

public static Dictionary<TKey, TValue> Where<TKey, TValue>(

    this IDictionary<TKey, TValue> source, 

    Func<KeyValuePair<TKey, TValue>, bool> predicate)
{
    return Enumerable.Where(source, predicate).
        ToDictionary(kv => kv.Key, kv => kv.Value);

}

When this method is defined, it effectively hides the general Where(IEnumerable<T>, Func<>) extension method when the IEnumerable<T> object also implements the IDictionary<TKey, TValue> interface. Having the above Where() extension method defined now even makes it possible for us to apply our first dictionary LINQ attempt without compilation error:

Dictionary<int, int> evenKeys = intMap.Where(kv => kv.Key % 2 == 0);

Unfortunately, this method will always return a Dictionary object, regardless of whether intDict is a SortedDictionary or SortedList or another object implementing the IDictionary<TKey, TValue> interface. At this point, I have no viable solution for returning the same dictionary type that was used as input to the specialized Where method. The quest for a solution will continue nonetheless.

I have also collected a few more similar LINQ extension method overloads in a Github project that I have chosen to call ... dictionarylinq

In this project you may find dictionary overloads for the Except, Intersect and Union extension methods. Due to their signatures, for these methods it has actually been possible to implement a specialized overload that returns the true input dictionary type, together with a designated overload with better performance for the Dictionary<TKey, TValue> class.

Included are also method overloads for the Select method when the return type of the Func<TSource, TResult> object is a KeyValuePair<TKey, TValue>. It would be relatively easy to add more method overloads to dictionarylinq, but for now this is what there is.

When the extension methods from the dictionarylinq utility class are included in your application, these methods will effectively hide the general extension methods in the System.Linq.Enumerable class. If you do want to fall back on the general methods in a certain scenario, either make an explicit cast to the IEnumerable<> interface:

IEnumerable<KeyValuePair<int, int>> output = 

    ((IEnumerable<KeyValuePair<int, int>>)input).Where(kv => kv.Key > 3);

or invoke the static method explicitly:

IEnumerable<KeyValuePair<int, int>> output = 
    Enumerable.Where(input, kv => kv.Key > 3);

Please also note that the dictionarylinq class library in the Visual Studio solution is a Portable Class Library. This means that the library uses the "least common denominator" of .NET Framework 4, Silverlight and Windows Phone 7 base class libraries. The library can be built once and consumed by all three .NET technologies without rebuilding. If you have not installed the Portable Library Tools, simply create a new designated class library project and include the source code from the dictionarylinq class library, or include the source code directly in your own class library or application.

I hope that this effort can be useful also to others than myself. For questions and comments on this work, please do not hesitate to contact me for example via a blog comment. If you are a Github user, please feel free to report code issues via the project's Issues tab.

Good luck with Dictionary and LINQ!

No comments:

Post a Comment