Splitting Strings With Substrings

The String.Split() method in C# is probably something with which any C# developer is familiar.

string x = "Erik.Dietrich";
var tokens = x.Split('.');

Here, “tokens” will be an array of strings containing “Erik” and “Dietrich”. It’s not exactly earth-shattering to tokenize a string in this fashion, and some incarnation or another of this predates .NET, C# and probably even my time on this planet.

But what about if we want to split over a string instead? What about if we have “..” as a delimiter instead of ‘.’ and I want to split “Erik..Dietrich” in the same way? Probably an overload of String.Split() that takes a string instead of a char, right? Well, actually no. As it turns out, the API for string.Split() is pretty unintuitive.

First of all, that call to x.Split(‘.’) is not actually invoking Split(char), but rather Split(params char[]), notwithstanding the fact that this isn’t advertised in the MSDN page unless you drill into the individual method. So, calling x.Split(‘.’) and x.Split(‘.’, ‘&’, ‘%’, ‘^’) are equally valid, syntax-wise, in the case of “Erik.Dietrich” (and in this case, both will give me back my first and last name).
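To make the params behavior concrete, here is a small sketch. The class and helper names (ParamsSplitDemo, SplitName) are made up for illustration, and the overload remarks assume the .NET Framework that this post targets:

```csharp
using System;

public static class ParamsSplitDemo
{
    // Hypothetical helper: splits on '.' plus a few extra characters.
    // With several char arguments, the compiler packs them into a char[]
    // and calls Split(params char[]).
    public static string[] SplitName(string input)
    {
        return input.Split('.', '&', '%', '^');
    }

    public static void Main()
    {
        string x = "Erik.Dietrich";

        // On the .NET Framework this binds to Split(params char[]).
        // Newer runtimes add other overloads, but the result is the same.
        string[] single = x.Split('.');

        // Passing the array explicitly is what the params call expands to.
        string[] explicitArray = x.Split(new char[] { '.' });

        string[] multi = SplitName(x);

        Console.WriteLine(string.Join("|", single));        // Erik|Dietrich
        Console.WriteLine(string.Join("|", explicitArray)); // Erik|Dietrich
        Console.WriteLine(string.Join("|", multi));         // Erik|Dietrich
    }
}
```

All three calls produce the same two tokens here, since the input only contains a single ‘.’.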

So, what one might expect is that there would be an overload Split(params string[]) to allow the same behavior as splitting over zero or more characters. Nope. Instead you have Split(string[] separator, StringSplitOptions options). Two things suck about this. One, I have to specify some enum that I don’t care about in the first place and that has only two options, one of which is “none”. I mean, really? You can’t just assume “none” and let users specify a different case if they want with another overload? But what sucks even more about this is that a params array has to be the last argument in the parameter list, so that option is out the window. You no longer get that snazzy params syntax that the char version has, and now you have to actually awkwardly create a string array. So, here is the new syntax following the old. Note that the new syntax is pretty hideous:

string x = "Erik.Dietrich";
var tokens = x.Split('.');

string y = "Erik..Dietrich";
var newTokens = y.Split(new string[] { ".." }, StringSplitOptions.None);
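For what it’s worth, the kind of wrapper that would hide this is only a few lines. Here is a minimal sketch; SplitOn is a made-up name, not anything in the framework, and it simply assumes StringSplitOptions.None is what you want:

```csharp
using System;

public static class StringSplitExtensions
{
    // Hypothetical extension method (not part of the framework).
    // Restores the params syntax for string delimiters by hard-coding
    // StringSplitOptions.None, which is an assumption of this sketch.
    public static string[] SplitOn(this string source, params string[] separators)
    {
        return source.Split(separators, StringSplitOptions.None);
    }
}

public static class SplitOnDemo
{
    public static void Main()
    {
        string y = "Erik..Dietrich";

        // One string delimiter, no array or enum at the call site.
        string[] tokens = y.SplitOn("..");
        Console.WriteLine(string.Join("|", tokens)); // Erik|Dietrich

        // Several string delimiters work the same way.
        string[] more = "a..b--c".SplitOn("..", "--");
        Console.WriteLine(string.Join("|", more)); // a|b|c
    }
}
```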

I was getting ready to write something to hide this mess from myself as a client, when I stumbled across a better alternative than rolling my own extension method or string splitting class: Regex.Split(). Here’s how it works:

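string x = "Erik..Dietrich";
var tokens = Regex.Split(x, @"\.\.");

No fuss, no muss, and pretty much what String.Split() should do. One catch: the second argument is a regular expression pattern rather than a literal string, so the dots have to be escaped (or run through Regex.Escape()). An unescaped “..” matches any two characters and hands you back nothing but empty strings. Granted, the arguments to Regex.Split() are both single strings (so if you want to specify multiple delimiters, you’ll have to cook up a regex recipe) and it’s a static method, but it has the advantage of already existing in the framework and being a much, much cleaner API than x.Split().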
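If you do want multiple literal delimiters, one possible regex recipe is to escape each delimiter and join them with ‘|’. This is only a sketch: SplitOnAny is a hypothetical helper name, and it assumes at least one non-empty delimiter is passed in.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Hypothetical helper: treats every delimiter as literal text.
    // Regex.Escape neutralizes metacharacters such as '.', and the
    // escaped pieces are combined into one alternation pattern.
    // Assumes at least one non-empty delimiter. If one delimiter is a
    // prefix of another, list the longer one first, because the regex
    // engine tries alternatives left to right.
    public static string[] SplitOnAny(string input, params string[] delimiters)
    {
        string[] escaped = Array.ConvertAll(delimiters, d => Regex.Escape(d));
        string pattern = string.Join("|", escaped);
        return Regex.Split(input, pattern);
    }

    public static void Main()
    {
        string[] tokens = SplitOnAny("Erik..Dietrich", "..");
        Console.WriteLine(string.Join("|", tokens)); // Erik|Dietrich

        string[] more = SplitOnAny("Erik..Dietrich--Blog", "..", "--");
        Console.WriteLine(string.Join("|", more)); // Erik|Dietrich|Blog
    }
}
```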

Use in good health!

  • http://www.facebook.com/jimwang Jim Wang

    Ha, I miss coding. Oh wait no I don’t. :)

  • http://www.daedtech.com/blog Erik Dietrich

    What you really miss is staying up all night trying to get C code to work using Pico in a telnet session before an 8 AM deadline. Man, those were the days. By the way, love the bargaineering site. It’s like a more practical and much more frequent version of Money magazine.

  • http://www.facebook.com/jimmnowak Jim Nowak

    I love C#. I really do. But when I found out it would not take, very elegantly, multiple delimiters, which I use a lot… WHAT?! Even .js works better than this! This was the nice elegant solution I was looking for. Thank you!

  • http://www.daedtech.com/blog Erik Dietrich

    My thoughts were the same as yours before I found this solution, so I definitely empathize. Glad if it helped!

  • ling maaki

    C# String operations http://csharp.net-informations.com/string/csharp_string_tutorial.htm covering most of the string class methods
    ling