Wednesday, March 16, 2011

Copying Managed Metadata between site collections with code

I was recently writing an event handler on a list that was supposed to copy the contents of a list item from one site collection to another. The code was fairly simple, except that the Managed Metadata columns would not copy.

My first attempt was to simply copy the value of the taxonomy field from one list item to the other. This worked for all the other fields in question, so why not?

//targetItem and sourceItem are SPListItem type
targetItem["metaColumn"] = sourceItem["metaColumn"];

This didn't work. I then did a quick search and found (somewhere that I can't find now to give credit) that I had to use the SetFieldValue method of the TaxonomyField class to assign the value. Ok, so my next attempt was:

TaxonomyFieldValue value = (TaxonomyFieldValue)sourceItem["metaColumn"];
TaxonomyField taxTargetField = (TaxonomyField)targetItem.Fields.GetFieldByInternalName("metaColumn");
taxTargetField.SetFieldValue(targetItem, value);

This didn't work either. I suspect this would have worked if the two list items were in the same site collection.

The next attempt actually worked, and it went like this:

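// Resolve the field and the term store through the *target* site collection.
TaxonomyField taxTargetField = (TaxonomyField)targetItem.Fields.GetFieldByInternalName("metaColumn");
TaxonomySession taxonomySession = new TaxonomySession(targetItem.Web.Site);
TermStore termStore = taxonomySession.TermStores["ManagedMetadataService"];

// Look the term up again by its GUID so the Term object belongs to the target context.
TaxonomyFieldValue sourceValue = (TaxonomyFieldValue)sourceItem["metaColumn"];
taxTargetField.SetFieldValue(targetItem, termStore.GetTerm(new Guid(sourceValue.TermGuid)));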

The main point to notice here is that I had to create a new reference to the term, using the context of the target site. So it seems that setting the value of a taxonomy field requires using a Term or TaxonomyFieldValue that is fetched out of a TaxonomySession from the same site as the field itself.
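If you need this for more than one column, the working approach above can be wrapped in a small helper. This is only a sketch: the class name, method name, and parameters are made up for illustration, it handles single-value columns only, and it assumes both site collections are connected to the same term store so that the term GUID resolves on the target side.

```csharp
using System;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Taxonomy;

// Hypothetical helper; names and signature are illustrative, not part of SharePoint.
public static class TaxonomyCopyHelper
{
    public static void CopyTaxonomyValue(
        SPListItem sourceItem,
        SPListItem targetItem,
        string fieldInternalName,
        string termStoreName)
    {
        // Single-value columns only: a multi-value column is not a
        // TaxonomyFieldValue, so the cast yields null and we skip it.
        TaxonomyFieldValue sourceValue = sourceItem[fieldInternalName] as TaxonomyFieldValue;
        if (sourceValue == null || string.IsNullOrEmpty(sourceValue.TermGuid))
        {
            return; // nothing to copy
        }

        TaxonomyField targetField =
            (TaxonomyField)targetItem.Fields.GetFieldByInternalName(fieldInternalName);

        // The session is created from the target site, so the term we fetch
        // belongs to the same context as the field we are about to set.
        TaxonomySession session = new TaxonomySession(targetItem.Web.Site);
        TermStore termStore = session.TermStores[termStoreName];

        // Assumption: the term exists in the store visible to the target site.
        Term term = termStore.GetTerm(new Guid(sourceValue.TermGuid));
        if (term == null)
        {
            return; // term not found on the target side
        }

        targetField.SetFieldValue(targetItem, term);
    }
}
```

The helper does not save the item, so the caller still has to call targetItem.Update() after copying the fields it needs.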

Thursday, March 10, 2011

Why I think ViewState is EVIL

The infamous ASP.NET ViewState has been with us for a long time now, and yet I often get the feeling that most developers have no idea how it really works. Over the years, I have told the following story to a number of developers, and while there are plenty of articles on the web about this, I still see code that makes me cringe far too often. Perhaps my post can add to the various articles and create a critical mass that will once and for all put an end to ViewState. Or perhaps not. :)

Enough rant, let's dive into the tech stuff. Take the code below. Seems fairly common, right? It should; I copied it from MSDN.

private void Page_Load(object sender, System.EventArgs e)
{
   if (!Page.IsPostBack) 
   {
      // Put user code here to initialize the data source 
      // (such as filling a dataset)
      DataGrid1.DataBind();
   }
}

Let's look at this code in detail. This is the handler for the page Load event, which fires fairly early in the ASP.NET page life cycle and fires every time the page is requested. The handler includes a comment suggesting that we add some code here to initialize data, but since that code can be expensive to run, it is cleverly wrapped in a condition that ensures it only runs the first time the page is loaded. The idea is sound: why go to the database for the same data over and over again, right?

Let's think about this a little more. When the page is loaded after a post back, the data needed for the datagrid is not fetched from the database, and the DataBind method is not called. (It can be called from an event handler, but not necessarily.) So where does the data for the datagrid come from? In most solutions that use this type of code, it comes from the ViewState. It is the most 'out-of-box' way to get a datagrid to work, which is why this code is so common.

Ok, so is there a problem? Yes, and a large one. First off, what is the ViewState really? Despite what many developers believe, it is not a magical property bag that makes your life easier. It is just a hidden input field in the HTML rendered to the client. Do a view-source on any ASP.NET Web Forms site and you'll find something like the following, except that I cut out most of the VALUE:

<input id="__VIEWSTATE" name="__VIEWSTATE" type="hidden" value="/wEPD....2Lk=" />

The VALUE that I cut is actually an encoded version of all the data needed by the various controls that are set to use the ViewState.

Let me repeat something at this point. The ViewState is a hidden input field in the HTML rendered to the client. Those of you who think about performance should already see the problem. For the rest of us, I present a picture:


In this crude diagram you see the web server, the database server, and the client, which for illustration purposes I made a smartphone accessing your website. The connection between the web server and the database is drawn as a fat pipe of wire, since they typically sit in the same data center that you have influence over. The connection between the web server and the phone is a 3G connection.

Here is what the code at the beginning of this article does. The first time the phone requests our site, the web server gets the data for the datagrid from the database server over the big fat pipe, generates the HTML including the ViewState, and sends it down to the phone. That means the data for the datagrid is sent to the client not once but twice: one copy is in plain text in the markup and the other copy is encoded in the ViewState. Worse yet, when the client decides to post back to the page for some reason, the web server needs the client to post the ViewState data back so that it can render the new HTML. This is really important to understand. Instead of getting the data for the datagrid from the database using a big fat pipe of a connection, we are getting it off a phone over a 3G connection. This is the worst place I can think of to be caching data. Again, instead of using a fast wired local connection to a fast database, we are fetching that data over a 3G connection from a slow smartphone.
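To put rough numbers on the round trip described above, here is a back-of-the-envelope sketch. The class and method names are hypothetical, and the model is deliberately crude: it only accounts for the Base64 encoding of the hidden field and ignores the serialization overhead that real ViewState adds on top, so real pages will be worse than this.

```csharp
using System;

// Back-of-the-envelope model only (hypothetical names). Real ViewState is
// serialized before it is Base64 encoded, so actual sizes will be larger.
public static class ViewStateCost
{
    // Base64 emits 4 characters for every 3 input bytes, padding the last group.
    public static int Base64Length(int rawBytes)
    {
        return 4 * ((rawBytes + 2) / 3);
    }

    // Extra bytes crossing the client connection for one initial load plus
    // `postbacks` postbacks: the hidden field goes down with every response
    // and comes back up with every postback.
    public static long ExtraBytesOverClientLink(int rawBytes, int postbacks)
    {
        long field = Base64Length(rawBytes);
        long down = field * (postbacks + 1);
        long up = field * postbacks;
        return down + up;
    }
}
```

For example, ViewStateCost.Base64Length(20000) gives 26668, which is the same length Convert.ToBase64String returns for a 20,000-byte array. With three postbacks, ExtraBytesOverClientLink(20000, 3) comes to 186676 bytes over the 3G link for data the web server could have fetched again from the database next door.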

Now that you understand what is going on, here are some other not so small issues to consider.

1) The ViewState is encoded. This means that every time the page is sent and received, the server has to encode and decode the ViewState. It may not seem like much, but I have seen this bring a server down at a large financial institution.

2) In the current world of mobile internet, users do not appreciate it when we send them large pages because we are using their phone or tablet as a data caching solution. First of all, the pages are slow, and second, not everyone has an unlimited internet contract, so extra kilobytes count!

So what can you do? Turn off the ViewState. Guess what, you can get most of your website to work without it. There are some features of some ASP.NET controls that do need the ViewState but not that many. Personally I don't use those features and find alternatives. Still, most of ASP.NET works just fine without ViewState, but you do need to put a little extra work into it.
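As a rough illustration of that "little extra work", here is a minimal code-behind sketch with ViewState switched off for the grid and the data bound on every request. The class name ProductsPage and the LoadProducts method are invented for the example, and the in-memory table is a stand-in for whatever database call you would really make. Treat it as a starting point under those assumptions, not as the one right way to do it.

```csharp
using System;
using System.Data;
using System.Web.UI;
using System.Web.UI.WebControls;

// Hypothetical code-behind class; DataGrid1 is assumed to be declared in the .aspx markup.
public class ProductsPage : Page
{
    protected DataGrid DataGrid1;

    protected override void OnInit(EventArgs e)
    {
        base.OnInit(e);
        // Stop the grid from writing its rows into the __VIEWSTATE hidden field.
        DataGrid1.EnableViewState = false;
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        // No IsPostBack check: the data comes from the server side on every
        // request instead of being posted back by the client.
        DataGrid1.DataSource = LoadProducts();
        DataGrid1.DataBind();
    }

    // Stand-in for a real query (assumption for this example). In practice this
    // is where you would hit the database or a server-side cache.
    private static DataTable LoadProducts()
    {
        DataTable table = new DataTable("Products");
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Price", typeof(decimal));
        table.Rows.Add("Widget", 9.99m);
        table.Rows.Add("Gadget", 19.50m);
        return table;
    }
}
```

If going back to the database on every request is too expensive, cache the result on the server, which still keeps the data on the right side of the 3G connection. Grid features that lean on ViewState may need their own handling once it is off.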

Start by reading an article like this one detailing how ViewState works, and always question where your data comes from.
http://msdn.microsoft.com/en-us/library/ms972976.aspx#viewstate_topic8