Simple color-based search by image in F#

Last week I was playing with a photomosaic composer toy project and needed a simple search by image engine. By search by image I mean searching a database for an image similar to a given one. In this tutorial I will show you how you can implement this functionality –with some obvious limitations– in an extremely simple way just by looking at an image’s color distribution.

If you are looking for feature similarity (shapes, patterns, etc.) you most likely need edge detection algorithms (linear filters or other similar methods), which give excellent results but are usually quite complicated. I suppose that’s the way most image search engines work. Alternatively this paper describes the sophisticated color-based approach used by Google’s skin-detection engine.

In many cases however, finding images with a perceptually similar color distribution can be enough.
If you are in this situation, you may get away with a very simple technique that still gives pretty good results with a minimal implementation effort. The technique is long known and widely used, but if you have no experience in image processing this step-by-step guide may be a fun and painless warm-up to the topic.

I’ll show the concept with the help of F# code, but the approach is so straightforward that you should understand it even without prior knowledge of the language.


This is the high level outline of the process.

Just once, to build a database “index”:

  • Create a normalized 8-bit color distribution histogram of each image in the database.

For every query:

  • Create a normalized 8-bit color distribution histogram of the query image.
  • Search the database for the histogram closest to the query using some probability distribution distance function.

If you are still interested in the details of each step, please read on.

Extracting an image’s color signature

Given that we want to compare images, we’ll have to transform them into something that can be easily compared. We could just compute the average color of all pixels in an image, but this is not very useful in practice. Instead, we will use a color histogram, i.e. we will count the number of pixels of each possible color.

A color histogram is created in four steps:

  1. Load/decode the image into an array of pixels.
  2. Downsample the pixels to 8-bit “truecolor” in order to reduce the color space to 256 distinct colors.
  3. Count the number of pixels for each given color.
  4. Normalize the histogram (to allow the comparison of images with different size).

1. Loading the image

This is almost trivial in most languages/frameworks. Here’s the F# code using the System.Windows.Media APIs:
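A minimal sketch of this step, assuming WPF (references to PresentationCore and WindowsBase, so Windows only). The helper names toPixels and loadPixels are my own, chosen for illustration; every pixel comes back as one 32-bit ARGB integer.

```fsharp
// Requires references to PresentationCore and WindowsBase (WPF, Windows only).
open System
open System.Windows.Media
open System.Windows.Media.Imaging

/// Converts a decoded bitmap to one 32-bit ARGB integer per pixel.
/// (Hypothetical helper name, used for illustration.)
let toPixels (source: BitmapSource) : int[] =
    // Normalize the pixel format so every image is read the same way.
    let converted = FormatConvertedBitmap(source, PixelFormats.Bgra32, null, 0.0)
    let width, height = converted.PixelWidth, converted.PixelHeight
    let pixels : int[] = Array.zeroCreate (width * height)
    // The stride is the number of bytes per row: 4 bytes per pixel.
    converted.CopyPixels(pixels, width * 4, 0)
    pixels

/// Decodes an image file into its pixels (hypothetical helper name).
let loadPixels (path: string) : int[] =
    toPixels (BitmapFrame.Create(Uri(path, UriKind.RelativeOrAbsolute)))
```

Converting to Bgra32 first means the later steps do not have to care about the file's original pixel format.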


2. Downsampling 32-bit color to 8-bit



With the help of some basic bitwise operations we reduce pixels from 32 bits down to 8. We discard the alpha channel and keep 3 bits for red, 3 for green and 2 for blue (out of the original 8 each), dropping the least significant bits of each color component. The result is that each pixel (now a single byte) can represent one of exactly 256 colors. We obviously lose some color detail because we cannot represent all the original gradients, but having a smaller color space keeps the histogram size manageable.
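The bit manipulation can be sketched as follows. This assumes pixels are stored as 32-bit ARGB integers (alpha in the top byte) and packs the result as RRRGGGBB; both the function name and the exact bit layout are my assumptions.

```fsharp
/// Reduces a 32-bit ARGB pixel to 8 bits laid out as RRRGGGBB.
/// The alpha channel is ignored.
let downsample (argb: int) : byte =
    let r = (argb >>> 16) &&& 0xFF
    let g = (argb >>> 8) &&& 0xFF
    let b = argb &&& 0xFF
    // Keep the 3 most significant bits of red and green, and the top 2 of blue.
    byte ((r &&& 0xE0) ||| ((g &&& 0xE0) >>> 3) ||| (b >>> 6))

// Apply it to a whole image:
let downsampleAll (pixels: int[]) : byte[] = Array.map downsample pixels
```

For example, pure red (0xFFFF0000) becomes 0xE0, and white becomes 0xFF.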

Note: in general 8-bit images use a palette, i.e. every pixel value is a pointer to a color in a 256-color palette. That way the palette can be optimized to only include the most frequent colors in the image. In our case the benefit would not be worth the trouble as we would need a common palette across all the images anyway (plus the above method is faster and simpler).

3. Creating the histogram



Nothing special here: we just count the number of pixels that are of a given color. The histogram is nothing more than a 256-element array of integers (plus the image file name). You can read it like “this image has 23 ‘light green’ pixels, 10 ‘dark red’ pixels, etc.”
We then normalize the histogram by dividing each value by the total number of pixels so that each color amount is a float value in the 0 .. 1 range, where for example 0.3 means that 30% of the picture’s pixels are of that given color.
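Counting and normalizing can be sketched in a few lines. This assumes the pixels have already been reduced to one byte each; the function name is mine, and the file name that accompanies the histogram is left out to keep the sketch short.

```fsharp
/// Builds a normalized 256-bin color histogram from 8-bit pixels.
/// Each bin holds the fraction of pixels that have that color.
let histogram (pixels: byte[]) : float[] =
    let counts : int[] = Array.zeroCreate 256
    for p in pixels do
        counts.[int p] <- counts.[int p] + 1
    // Note: an empty image would divide by zero and yield NaN bins.
    let total = float pixels.Length
    counts |> Array.map (fun c -> float c / total)
```

Because every bin is divided by the pixel count, a thumbnail and a full-size copy of the same picture end up with (nearly) the same histogram.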


Comparing color histograms

Now we have a collection of histograms (the database) and a query histogram. In order to find the best matching image, we need a way to measure how similar two histograms are. In other words we need a distance function that quantifies the similarity between two histograms (and thus between two images).

You probably have noticed that a normalized histogram is in fact a discrete probability distribution. Every value is between 0 and 1 and the sum of all values is 1. This means we can use statistical “goodness of fit” tests to measure the distance between two histograms. For example the chi-squared test is one of those. We are going to use a slight variation of it, called quadratic-form distance. It is pretty effective in our case because it reduces the importance of differences between large peaks.
The test is defined as follows (p and q are the two histograms we are comparing):

[latex]distQF(p, q) = \frac{1}{2} \sum_{i=0}^n \frac{(p_i - q_i)^2}{p_i + q_i}[/latex]


The implementation is straightforward:
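A minimal sketch of the formula above, assuming histograms are plain float arrays of equal length. The name distQF follows the formula; skipping bins where both histograms are empty is my own choice to avoid dividing by zero.

```fsharp
/// Distance between two normalized histograms, following the distQF formula.
let distQF (p: float[]) (q: float[]) : float =
    let term pi qi =
        let sum = pi + qi
        // A bin that is empty in both histograms contributes nothing.
        if sum = 0.0 then 0.0 else (pi - qi) * (pi - qi) / sum
    0.5 * Array.fold2 (fun acc pi qi -> acc + term pi qi) 0.0 p q
```

Array.fold2 raises an exception if the two arrays differ in length, which is a useful sanity check here.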

The more two histograms differ, the larger the value returned by this test. The test returns 0 for two identical histograms.

A more sophisticated option is the Jensen-Shannon divergence, which is a symmetrized and smoothed version of the Kullback-Leibler divergence. While more complicated, it has the interesting property that its square root is a metric, i.e. it defines a metric space (in layman’s terms, a space where the distance between two points can be measured, and where the detour A → B → C cannot be shorter than the direct distance A → C). This property is going to be useful in the next post when we’ll optimize our search algorithm.

The Kullback-Leibler and Jensen-Shannon divergences are defined as:

[latex]distKL(p, q) = \sum_{i=0}^n p_i \ln \frac{p_i}{q_i}[/latex]


[latex]distJS(p, q) = \frac{1}{2} distKL(p, \frac{1}{2}(p + q)) + \frac{1}{2} distKL(q, \frac{1}{2}(p + q))[/latex]


This is the corresponding F# code:
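A minimal sketch of both formulas, again assuming equal-length float arrays. Treating terms with p_i = 0 as zero is the usual convention and is my assumption here; distKL on its own is still undefined where q_i = 0 but p_i > 0, which cannot happen inside distJS because the midpoint is positive wherever p or q is.

```fsharp
/// Kullback-Leibler divergence of q from p (natural logarithm).
let distKL (p: float[]) (q: float[]) : float =
    Array.fold2
        (fun acc pi qi ->
            // By convention 0 * ln(0 / q) counts as 0.
            if pi = 0.0 then acc else acc + pi * log (pi / qi))
        0.0 p q

/// Jensen-Shannon divergence: the average KL divergence of p and q
/// from their midpoint distribution m.
let distJS (p: float[]) (q: float[]) : float =
    let m = Array.map2 (fun pi qi -> 0.5 * (pi + qi)) p q
    0.5 * distKL p m + 0.5 * distKL q m
```

With the natural logarithm, distJS ranges from 0 (identical histograms) to ln 2 (histograms with no color in common).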

This paper includes an interesting comparison of various distance functions.

At this point our problem is almost solved. All we have to do is iterate through all the samples, measure the distance between query and sample, and select the histogram with the smallest distance:

Notice that I use head because I’m only interested in the best matching item. I could truncate the list at any given length to obtain n matching items in order of relevance.
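The search described above can be sketched like this. The Histogram record and the search signature are my assumptions; the distance function is passed in as a parameter so that either of the distances discussed earlier can be plugged in.

```fsharp
/// A database entry: the image file name plus its normalized histogram.
/// (Hypothetical record shape, used for illustration.)
type Histogram = { File: string; Bins: float[] }

/// Returns the database entry whose histogram is closest to the query.
/// Raises an exception if the database is empty.
let search (dist: float[] -> float[] -> float)
           (database: Histogram list)
           (query: float[]) : Histogram =
    database
    |> List.sortBy (fun h -> dist query h.Bins)
    |> List.head
```

Sorting the whole list just to take the head is wasteful (List.minBy would do), but it makes returning the n best matches a one-word change: swap List.head for List.truncate n.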

Optimizing the search

Maybe you’ve noticed one detail: for every query, we need to walk the full database computing the distance function between our query histogram and each image. Not very smart. If the database contains billions of images that’s not going to be a very fast search. Also, if we perform a large number of queries in a short time we are going to be in trouble.

If you expected a quick and easy answer to this issue I’m going to disappoint you. However, the good news is that this problem is very interesting, much more so than it may look at first sight. This will be the topic of the next post, where I’ll write about VP-Trees, BK-Trees, and Locality-Sensitive Hashing.

Grab the source

The complete F# source of this tutorial is available on GitHub (a whopping 132 lines of code).

Thanks to my brother Lorenzo for the review and feedback.

Real world F#: my experience (part two)

The second project I recently completed in F# is a completely different animal. While the first one is a pet project I’ve put together in my spare time (with no deadline at all), this one was full-time work for my company (for this reason I cannot disclose some details or share source code). Additionally, time available was limited. Very limited. Like 2 weeks limited. That’s 10 working days plus a 2-day emergency buffer.

A load simulator tool

My company produces a high performance client-server platform that ships with our own proprietary database engine. After some important changes to the server and database codebase, we needed to test the system’s behavior under heavy load, i.e. when a large number of users are connected and firing queries.

As you can guess, hiring and coordinating hundreds of people to load the system the way you need is very impractical, if possible at all. Maybe it’s doable if you have your own Army of Clones, but we don’t have one, so we had to somehow automate the process. Keep in mind that the server interface is proprietary, i.e. it’s not http, SQL, or anything similar: we have to go through our library and API to access the server. For that reason we could not use any existing tool.

The application I was going to build was meant for internal use but it was clear that something usable by non-über-geeks would be nice to have at some point (for example to help size hardware for large customers). Anyways, with the deadline so close, it was imperative (no pun intended) to focus on the most important stuff.

Writing your own DSL

I decided to define an external DSL to describe the simulation scenarios. The language would let you express the creation of users, connections, queries, pauses, etc. in a simple way.
The second decision was to use F#. Fortunately nobody objected (again no pun intended). I was to work on the project alone, so I could basically use whatever I liked.

Once I defined the grammar I went to step 2, i.e. parsing. Obviously I was not going to reinvent the wheel by rolling out my own lexer and parser so the choice was between parser generators (FsLex/FsYacc, Irony for C# & co.) and combinator libraries à la FParsec. After taking some advice from the great F# community on Twitter (thanks Robert!), I opted for FParsec. I admit it looked a bit intimidating, but the idea of not introducing a tooling step in the build process was appealing, plus I had never used a combinator library before and was curious.

Here starts the amazement. As mentioned, at first FParsec looks slightly cryptic, but once you get the main concepts and get over a few gotchas it just “clicks”. You quickly reach a point where reading the parser code is almost like reading the grammar definition. Making changes is a matter of a few minutes with a very low risk of introducing new errors. FParsec gives you an enormous flexibility and even if the learning curve is steeper than learning parser generators I suggest you look at it if you’ve never done it before. The official documentation is great too.

Anyways, in a few days I had a parser that lifted the input program to the abstract syntax tree. Sweet!

Note: in case you are wondering, the language I defined was not super complicated but also not trivial. It supports regular loops as well as parallel ones (iterations are executed in parallel), nested loops and a plethora of options on all the various commands. I opted for a rich syntax that results in programs that are almost written in natural language. I cannot disclose all the details, but you can get an idea by looking at the screenshots.


Walking the tree

Second amazement: thanks to discriminated unions and pattern matching, walking the syntax tree is an incredibly fluid and easy process. The code is so compact and elegant that I keep opening that file just to look at it. No boilerplate, no class proliferation, no wasted characters. Just the code.
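To give a flavor of what this looks like, here is a small sketch. The real DSL is not public, so the Command type below is entirely hypothetical: it only borrows the concepts mentioned in this post (pauses, queries, regular and parallel loops).

```fsharp
/// A hypothetical, much simplified syntax tree for a load-simulation script.
type Command =
    | Pause of milliseconds: int
    | Query of text: string
    | Loop of times: int * body: Command list
    | Parallel of body: Command list

/// Walks the tree and counts how many queries a script will fire.
let rec countQueries (command: Command) : int =
    match command with
    | Pause _ -> 0
    | Query _ -> 1
    | Loop (times, body) -> times * List.sumBy countQueries body
    | Parallel body -> List.sumBy countQueries body

let script = Loop (3, [ Query "a"; Pause 10; Parallel [ Query "b"; Query "c" ] ])
let total = countQueries script   // 3 * (1 + 0 + 2) = 9
```

One case per node type and no visitor classes; the compiler also warns if a case is forgotten in the match.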

Unfortunately I could not leverage the powerful F# concurrency features to run parallel loops because the client library that interfaces with our server is not thread-safe, so all I could do was start new threads, each with its own separate AppDomain. My skills on asynchronous workflows & co. are still limited so I don’t know if there’s a better way. If that’s the case, I’d love to hear your feedback in the comments.

GUI and extras

With parsing and interpreting done, the bulk of the job was over. I just needed to add logging and a less geeky interface than the command line. With room to spare, I created a WPF GUI that controls the execution and reads logs to display status and stats. This was nothing particularly exotic, but I was able to fit in some nice touches like a graphical timeline to represent operations executed on the different threads. I wrote the GUI in XAML/C# using MVVM-Light. The parser/interpreter runs in a separate process, so that in case of a crash (not a remote possibility when you are pushing the hardware limits) the GUI keeps running and tells you what happened.


So 10 days had passed and this is what had been done:

  • the DSL grammar definition
  • a parser and an interpreter for it. It took slightly more than necessary because I had to learn FParsec along the way (this talk by Robert Pickering has been very helpful).
  • a GUI with some bells and whistles

plus some extras (that as you know better than me, are very time consuming):

  • a (admittedly basic) distributable package
  • the syntax highlighting definition for Notepad++ :-)
  • several code samples that show the DSL capabilities
  • the user manual and language specification (I got some help with that)
  • a tutorial

Developing the GUI and producing the extras went at normal speed, but I’m positive that writing this parser and interpreter in C# would have taken me close to the ten days alone. Maybe my standards are low, I don’t know, but I’m honestly blown away by what I could achieve in such a short time. Also notice that I’m much more experienced in C# than in F#.

Truth be told, I had another advantage: this project was done in the year-end period when several people are on holidays and the office is very quiet. I also put in some late evenings, but I have a family with two kids, so I just cannot code 24/7 even if I wanted to.

The stars of the show

The goal of this post is not telling the world how fast I work. It’s impossible for anyone to judge if a project would have needed 2 or 100 days without knowing all the details. No, I’m writing this because I know all the details and I know that F# gave me a huge advantage. Much more so than I imagined when I started.

These are things that I think make F# ideal for a project like this:

Higher order functions

These are what allow libraries like FParsec to exist, among other things.

Discriminated unions, tuples and pattern matching

This trio alone is worth the price of entry. They make for very terse code and bring other great advantages to the table as well.

It works the first time

I still don’t get why it is so. Maybe it’s because of the lack of nulls. Maybe it’s because (as I’ve written in part one) I think functional programming forces you to think more and write/debug less. The net result is that when I write F# I mostly get it right the first time. Because of the higher-order functions there are fewer corner cases that suddenly appear and crash everything.

Now most of these features are available in several functional languages, however the seamless .NET integration was fundamental in my case (the libraries I had to use are .NET), and some F#-only constructs make coding fun and speedy at the same time.


If you’re not living under a rock (like I’m literally doing right now –but that’s another story) you’ve surely heard of F#. Maybe you’ve even seen some examples, but as I’ve heard many times from C# developers, they looked incomprehensible. Don’t let that stop you, it’s just not true. If you’re new to functional programming it looks that way because F# is (mostly) a functional language, i.e. you’re not only learning a new language, you’re learning a new paradigm. A different way of thinking about your programs. It does take some effort, for sure. Is it worth it? It’s up to you to decide. To me, getting back to functional programming with F# after several years of OOP/C# has been a real breath of fresh air.

If you decide to learn more, here are some great places to start:

Advice for getting started with F# by Richard Minerich
An overview of functional programming by Dorian Corompt (recursion, lists, more to come…)

I suggest starting with the basics: you can already accomplish a lot with just lists, sequences, tuples, unions and pattern matching. When you feel ready you can move on to the more advanced topics.
Have fun!

Again, many thanks to Steffen and Samuel for the feedback!

Real world F#: my experience (part one)

I’ve been playing with F# on and off for about one year, but only recently I was able to complete a few “real world” projects. I was so impressed that I decided to share my experience. In this two-part series I will talk about two very different projects to give you an idea of how wide the spectrum of applications is where F# feels right at home.

The first project

The first project is named VeloSizer. You can check it out here (I may release it as open source but I’m still undecided on what to do with it). I assume you are not a cycling geek so I’ll spare you the details, but in short this application computes the bike setup given your position and the frame geometry. If you’re interested there’s a detailed description on the application page. Surprisingly enough, I’ve never found anything that does this very thing (except for full blown and expensive CADs), so I decided to write it myself.


The application is built in Silverlight: the XAML frontend is basically a glorified input form. It’s not particularly complex, some details are more complicated than it may look at first sight, but still there’s nothing extraordinary. I took a rather standard approach and employed the MVVM pattern (using MVVM-Light) for a clear separation of concerns. The View Model is C# while the Model –where the interesting stuff happens– is written in F#.

Solving this particular problem does not require very complicated mathematics, but involves a large number of geometrical operations (trigonometry and the like). Without abstracting and hiding away all the math, the solution quickly becomes a nightmare that spirals out of control (don’t ask how I know). For this reason I’ve implemented a simple 2D CAD engine that sits at the application core.

How it went

Here are some things I noticed while using F# in a “real” project for the first time.

Units of measure

F#’s support for units of measure built straight into the type system has been very helpful to avoid stupid errors like mixing degrees with radians with millimeters, etc. It is really a plus when dealing with physical dimensions.
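As an illustration, here is a small sketch of how units of measure catch this kind of mistake. The units and helper functions are made up for the example and are not taken from VeloSizer.

```fsharp
[<Measure>] type mm
[<Measure>] type deg
[<Measure>] type rad

/// Converts degrees to radians; the conversion factor carries the units.
let toRadians (angle: float<deg>) : float<rad> =
    angle * (System.Math.PI / 180.0) * 1.0<rad/deg>

/// Horizontal projection of a tube of the given length (hypothetical helper).
let horizontalReach (length: float<mm>) (angle: float<rad>) : float<mm> =
    // cos expects a plain float, so divide the unit away first.
    length * cos (angle / 1.0<rad>)

let reach = horizontalReach 550.0<mm> (toRadians 73.0<deg>)

// This line would not compile, because degrees are not radians:
// let wrong = horizontalReach 550.0<mm> 73.0<deg>
```

The units exist only at compile time, so there is no runtime cost for the extra safety.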


The language syntax is very light and unobtrusive, which makes it ideal to write mathematically-oriented code. The main benefit to me has been that the math stands out clearly, without parentheses, type annotations or artifacts that make things harder to read. Also writing the code is a joy: you can really focus on the reasoning and almost forget that you are actually programming. In fact translating the equations written on paper to code is almost copy & paste.


I heartily agree with Richard Minerich when he says that testing does not replace a strong, theoretically-validated model. It’s the very same reason that pushed me to build most of this application’s engine on paper before writing a line of code. However I still make (lots of) mistakes when implementing a model –regardless of how correct it is– so I feel safer with the additional support of a solid testing framework.
The nature of functional programming makes it an ideal target for unit tests. Short, side-effect-free functions are a joy to test. Result: it has been very easy to create a nice safety net in the form of an NUnit project.
I must admit I would probably have written this library more or less using the same style in C#, but in functional programming this is the default.

Interoperability with C#/GUI

This is somewhat of a sore point. I don’t know if it is due to my lack of experience (likely) or the nature of a GUI-driven application, but I’ve ended up with many mutable (and not very idiomatic in general) classes, for two main reasons:

  • I had to persist the business objects (using the Sterling NoSQL database) and all the serializers for Silverlight need public setters as they are not allowed to use reflection.
  • With MVVM, each View is bound to its respective View Model, which is nothing more than a wrapper around its respective business object defined in the model (F#).
    Now when for instance the user changes a value in a TextBox, the new value is propagated to the View Model, which in turn propagates it to the Model. You can tell it’s not very practical to create a new instance of the model every time a value changes, so immutable objects do not adapt so well to the situation.

This means that the business objects are very C#-like. They still benefit from the lighter syntax, type inference, etc…, but they don’t fully leverage the power of the language. Fortunately the “application brain” does not suffer much from this.

Is this due to MVVM, XAML and in general GUI patterns being oriented towards the object-oriented paradigm? I don’t know. I’ve heard of a GUI framework specifically written for F#, but I don’t know much more.
I would be very interested to hear your opinion on this subject.

Note: as Stephen points out, keeping the model immutable may not be so much of a problem. I’ll give it a try.

Guidance and community support

The F# community is still small, but it more than makes up for it in quality. The active users on Stackoverflow and other sites are extremely competent. It’s rare to get bogus answers or to get stuck on a problem for long.
What I’ve found difficult though is getting guidance. I often ask myself if my code is well written or a pile of junk. I suppose the only solution is to refine my own sense by reading other people’s code.


Visual Studio’s Intellisense for C# is spectacular and has made us very lazy. F# support is much better than it was at the beginning, but it’s still not up to the same level as C#. In the end though it’s only lacking a few details like parameter names or support for the pipeline operator –the next release already includes some improvements in this area.


Setting breakpoints and watching state change is not simple in functional code because (usually) there is no state. If you debug a lot, this may be a bit unsettling at first, but then you realize it is not so much of a drawback. It is a benefit in fact. Breakpoints are evil: building a half-working solution, running it through breakpoints and tuning it until the result matches what you expect is very close to the definition of cargo-cult programming/programming by coincidence.
It is my opinion that functional programming makes you think more and write/edit/debug less. I believe this has made me a better developer because I now tend to stop, think about the solution “offline” and only write it down when I get it.


I can’t give any judgment on productivity because this application has been a pet project I’ve built alone without any deadline, working literally 15 minutes at a time. We recently welcomed another family member, which has made things even harder. Anyways it took me about 7 months to complete this project, but it’s very hard for me to tell if F# has given any productivity boost at all. More on this in part two.


It has been a real pleasure to write the F# part of this application. When you look at the application source, the first thing that jumps out is that the View Model (C#/OO) is way larger (in lines of code) than the model (F#/mostly functional), yet it only does “stupid” things: it’s almost exclusively made of property definitions, RaisePropertyChanged events, brackets, etc. It is like a very large box full of bubble wrap, with only a small, precious gift in the middle.

That said I’ve been left with the impression that I haven’t used all of the language’s power. Writing the View Model in F# would only have slightly alleviated its ineffectiveness, what I need is probably a different pattern for GUI interaction.

In part two I’ll talk about a very different (and more interesting) project, where F# really shone. In the meantime I would be very interested to hear your opinions.

Thanks a lot to Steffen Forkmann and Samuel Bosch for proofreading and general feedback!

An error occurred while accessing IsolatedStorage.

While developing a Silverlight Windows Phone 7 app I ran into this problem: when the application closes, an internal IsolatedStorageException is raised:

An error occurred while accessing IsolatedStorage.
   at System.IO.IsolatedStorage.IsolatedStorageSecurityState.EnsureState()
   at System.IO.IsolatedStorage.IsolatedStorageFile.get_AvailableFreeSpace()
   at System.IO.IsolatedStorage.IsolatedStorageSettings.Save()
   at System.IO.IsolatedStorage.IsolatedStorageSettings.TrySave()
   at System.IO.IsolatedStorage.IsolatedStorageSettings.SaveAllSettings()
   at MS.Internal.FrameworkCallbacks.ShutdownAllPeers()

IsolatedStorageSettings seemed to be the cause of the problem. Very weird because I wasn’t even using it! Just adding this line in my App class would cause the problem:

private System.IO.IsolatedStorage.IsolatedStorageSettings appSettings = System.IO.IsolatedStorage.IsolatedStorageSettings.ApplicationSettings;

Now I created a new, empty, project (Windows Phone Application) and inserted that line. Weird: no problems.
I thought there was something wrong in my solution, so I made a backup copy and started removing stuff. At some point my solution was identical to the default one (except for the IsolatedStorageSettings line and the project’s GUID). Not “almost identical”, but completely, literally identical, i.e. WinMerge found no difference except the two mentioned. And of course the default solution worked while mine gave the IsolatedStorageException on shutdown.

I honestly haven’t understood the cause of the problem, but at least I’ve found a solution: change the project GUID:

  • close Visual Studio
  • open your solution’s .sln file with notepad
  • replace a couple of digits in the Project("{<your GUID here>}") entry
  • open the .csproj with notepad
  • apply the same change to the GUID (in the first line after the comment and in all the Build Configurations)
  • Re-open your solution with Visual Studio and the problem is gone.

I suspect the problem has something to do with the emulator but I haven’t had a chance to try a real device yet. I hope I saved you an evening of head scratching.

Best of Swiss Silverlight 2010

During this year’s Shape 2010 conference in Zurich-Oerlikon, Microsoft Switzerland announced the winners of the Best of Swiss Silverlight 2010 Award in collaboration with the Best of Swiss Web Association, simsa and Netzwoche.


Incredibly, my application Trails of Switzerland won the Bronze award. I was completely taken by surprise (not to mention super excited) because I didn’t really expect anything when I started the project. In fact it was just a “weekend project” to try a couple of things. When I saw the award application form I thought it could be worth a try, so I polished the front-end a bit and added a couple of cool gadgets.

I was familiar with the competition’s application procedure also because I already did it a few times before for my company (that won this year’s .NET Award by the way).


My application leverages Silverlight’s DeepZoom component to show a full topographic map of Switzerland. The base image is a huge 19-gigapixel (~3 gigabytes) JPEG, but movement and zooming are wonderfully smooth.

The tricky part was mapping the GPS data to the map and then keeping content synchronized with the DeepZoom zooming and panning.


Currently Trails of Switzerland is in closed-beta at http://maps.frenk.com and will probably never go live (except if someone wants to buy it from me) for the simple reason that copyrights on the maps are incredibly expensive and I cannot afford to buy them “just for fun”.
To be honest I must say that the Swisstopo maps are of incredible precision and quality, but they are still way too expensive for a non-profit application.

The good part is that Trails of Switzerland could probably be ported to Windows Phone 7 (with a major restyling of course) as DeepZoom seems to work very smoothly there too.

I’d like to congratulate the other contest winners (Coresystems AG/Misapor, Extrafilm AG, VASP Datatecture AG/ETHZ, Portia AG/Immostreet AG): your applications were really mind-blowing!


The conference was very interesting as well. In particular Bob Muglia should have taken notes from Ronnie’s talk on HTML5/Silverlight. If you have watched the PDC2010 keynote (and the twitter/blog-storm that followed) you know what I’m talking about.

As always Laurent Bugnion’s talks were interesting, but many others were also worth the trip (in particular from a Windows Phone 7 and design/UX standpoint).

Moral: you never can tell

The moral of the story is that you never know where a weekend project is going to take you. It seems that someone else also agrees on this. This is one of the things I love in this field.


Thanks to Microsoft Switzerland and most of all thanks to my wife for the continuous support and for understanding when I forget stuff/don’t listen because I’m thinking about code (i.e. most of the time) :-)

Silverlight/WPF RGB color in C#

Sometimes you have a color in XAML and want to use it in C#. In other words you want to translate something like:

<Border Background="#AA0FCC1B" />

into:

Border.Background = …something…

So you start looking for an IValueConverter that creates a color from an RGB string, or you translate the hex values to decimal, etc… STOP it!
All you need is:

Border.Background = new SolidColorBrush(
                            Color.FromArgb(0xaa, 0x0f, 0xcc, 0x1b));


Ok, it may sound stupid, but you never can tell…

WPF is dead. Long live WPF!

Some months ago I read in a blog post that Silverlight ate WPF from the inside. I had a good laugh and thought it was the most foolish thing I’ve read in a while. I even posted a comment that (thankfully) never got published. Having worked extensively with both WPF and Silverlight I thought the two things were not even remotely comparable. While WPF provided great power, Silverlight was full of limitations and getting any real work done was frustrating and painful.

Turns out I was wrong. Completely wrong! This week I attended TechDays (the small version of MIX that Microsoft does in European countries) and while nobody says it explicitly, the strategy at Redmond seems pretty clear. Silverlight is progressing at an impressive pace and WPF is not getting many exciting improvements. The gap is still there (still large, to tell the truth), but seeing SL reach and eat WPF is not that difficult. I think MS is pushing in that direction with all their might.

Out of the browser was almost a gimmick in SL3, but with SL4 they revealed their cards: they added so many features (even COM support when running in Windows) that it’s now doable to build a desktop application entirely based on SL. You can even deploy it directly on the desktop without any browser interaction.

I’m pretty sure it will only take a few years for Silverlight to be the Windows UI library, with the big bonus of true multiplatform, small runtime and web deployment with a single codebase. WPF won’t lose anything as it will just be part of Silverlight.

This is the future I think. Unless I’m completely wrong again.

Leaky Abstraction strikes again: FileStream.Lock() gotchas

First, if you don’t know the Law of Leaky Abstractions go on and read here (10 minutes well spent!).

.NET’s FileStream.Lock() is a handy method that lets you lock a section of a file (instead of locking it completely) so that other processes/threads cannot touch that part.

The usage is fairly simple: you specify where the lock starts and how long the section you want to protect is. However, despite its simplicity, there are a couple of things you had better keep in mind or you’ll be scratching your head in front of the screen.

First: contrary to what some articles say, this method locks part of the file for write but also for read access. Maybe those articles refer to an older framework version or something else, but a simple test seems to confirm that a process cannot read a part of a file that has been locked by another process.

The second thing can be tricky.
Let’s make a simple experiment: we write 100 bytes in a new file and we lock everything except the very first byte. Then we launch another process that reads the first byte.

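// first process:
// (the FileMode/FileAccess/FileShare arguments are assumed: the original listing was cut off after the file name)
using (var fs = new FileStream("myFile.txt", FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite))
{
    // not wrapped in a using block: disposing the writer would close fs before we lock
    var bw = new BinaryWriter(fs);
    bw.Write(new byte[100]);
    bw.Flush();
    // locks everything except the first byte
    fs.Lock(1, 99);
    // keep the lock while the second process runs
    Console.ReadLine();
    fs.Unlock(1, 99);
}

// second process (first process is waiting at Console.ReadLine()):
using (var fs = new FileStream("myFile.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var br = new BinaryReader(fs))
{
    // read the first byte
    var b = br.ReadByte();
}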

What happens? The second process throws an exception: “The process cannot access the file because another process has locked a portion of the file.”
Why? We didn’t try to access the locked portion, so this should not have happened!

At first you may believe that Lock() is buggy and locks the whole file. But this is not true: Lock() works correctly.
The answer is in the FileStream’s buffer (I hear the “aha!”). When you ask FileStream to read a single byte, it’s smart enough not to read just that byte but to fill its internal buffer (4K by default) to speed up subsequent reads. So it tries to read into the locked part and fails.

Now that you know why this is happening you can solve the problem more or less easily depending on your situation: for example, you may adjust your buffer size to match the length of the chunks you are reading.

In the example above it’s enough to pass 1 as the buffer size to the FileStream constructor in the second process to make it work (just to show the theory, not that this is a good practice!).
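As a minimal sketch of that workaround, the helper below opens the file with an explicit buffer size and reads only the first byte. The class and method names (FirstByteReader, ReadFirstByte) are made up for illustration, and the FileMode/FileAccess/FileShare values are assumptions rather than the ones from the original experiment.

```csharp
using System;
using System.IO;

static class FirstByteReader
{
    // Hypothetical helper: reads the first byte of a file through a
    // FileStream created with an explicit buffer size.
    // With bufferSize = 1 the stream should not read ahead into the
    // following (possibly locked) bytes.
    public static byte ReadFirstByte(string path, int bufferSize)
    {
        // FileShare.ReadWrite is an assumption: it lets us open the file
        // while another process still holds it open for writing.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.ReadWrite, bufferSize))
        {
            int b = fs.ReadByte();
            if (b < 0)
                throw new EndOfStreamException("The file is empty.");
            return (byte)b;
        }
    }
}
```

Calling FirstByteReader.ReadFirstByte("myFile.txt", 1) in the second process corresponds to the change described above. With a larger buffer size the same call would be expected to hit the locked region again.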

I really think that the FileStream abstraction should handle this case and avoid the “leak”, but the .NET framework guys are smart people and I bet there is a good reason why it doesn’t.

WCF/Silverlight – some “benchmarks”

I took some very simple measurements from my recent experiments with Silverlight and WCF web services. These are so simple and unscientific that I suggest you take them only as a general indication.
Please test your scenario to get an accurate picture!

That said, some differences are so large that they already give you a general idea. These are the HTTP bindings I tested:

1) text formatter (default), i.e. SOAP XML
2) binary formatter, i.e. binary XML
3) text formatter with http gzip compression (see my previous post)
4) binary formatter with http gzip compression

Here are the results.

Response time


You can see that the text formatter is dramatically slower than binary XML.
One interesting thing I noticed is that this extreme slowness of the text formatter happens only with Silverlight (3). That is, if you use a Windows client (console app or WPF), the text formatter is still slower than the binary formatter but not by that much (compare the red bars).

The Silverlight runtime is probably slower than the Windows runtime and I guess that deserializing a huge XML message is one of the things that clearly exposes this difference.

Another observation is that with gzip compression the response time is slightly longer. Keep in mind that these numbers come from a connection on a single machine. In the real world, with a large message, the compressed message will be so much smaller that the shorter transfer time probably more than makes up for the compression overhead.

(Side note: these tests were quick and dirty, but I still did some “warmup” calls and measured over multiple runs, so these timings are pretty stable.)
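For reference, a warm-up-then-average measurement of this kind could be sketched as follows. This is not the code used for the numbers above: the Timing class and its parameters are hypothetical, and the call being measured is left to the caller.

```csharp
using System;
using System.Diagnostics;

static class Timing
{
    // Hypothetical helper: runs `call` a few times without measuring
    // (warm-up), then returns the average duration in milliseconds
    // over `measuredRuns` timed calls.
    public static double AverageMilliseconds(Action call, int warmupCalls, int measuredRuns)
    {
        if (call == null) throw new ArgumentNullException("call");
        if (measuredRuns <= 0) throw new ArgumentOutOfRangeException("measuredRuns");

        // Warm-up: let JIT compilation, connection setup and caches settle.
        for (int i = 0; i < warmupCalls; i++)
            call();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < measuredRuns; i++)
            call();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / measuredRuns;
    }
}
```

The action passed in would wrap one complete request/response round trip, so that each binding is measured the same way.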

Message size


The message I used was quite large and included a pseudo-dataTable. This means that XML serialization results in a lot of string repetitions: a particularly good target for gzip compression. Other cases may not benefit so greatly from gzip compression.
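To get a feel for why repetitive XML compresses so well, here is a small sketch that builds a made-up, table-like XML payload and compares its raw size with its gzip-compressed size. The payload shape and the GzipSizeDemo class are purely illustrative assumptions, not the message used in the test.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipSizeDemo
{
    // Returns the size in bytes of `data` after gzip compression.
    public static long CompressedSize(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen = true so we can still read output.Length
            // after the GZipStream has been disposed (and flushed).
            using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.Length;
        }
    }

    // Builds a made-up XML payload where every row repeats the same tags.
    public static byte[] BuildRepetitiveXml(int rows)
    {
        var sb = new StringBuilder("<table>");
        for (int i = 0; i < rows; i++)
            sb.Append("<row><id>").Append(i).Append("</id><name>customer</name></row>");
        sb.Append("</table>");
        return Encoding.UTF8.GetBytes(sb.ToString());
    }
}
```

Comparing BuildRepetitiveXml(1000).Length with CompressedSize(...) of the same bytes gives a rough idea of the ratio. Payloads with little repetition would be expected to shrink far less.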


This is clearly not a conclusive test (it may not even be enough to be called a test), but one thing is clear: it is worth spending some time playing with the different binding options, as the benefits you could reap may be huge.