Saturday 27 December 2008

Karen Spärck Jones' homepage gone :(

I'm a little bit sad to have to edit the Wikipedia entry on Karen Spärck Jones and remove the link to her home page on the University of Cambridge Computer Science department's web space, as it appears to have been removed. This would have been a nice snapshot of history to have kept online, even if they have added an obituary and CV to their site.

Enjoying the holiday break - I hope you are too.

Monday 15 December 2008

Quiet Blog

Despite promising to be more vocal on my blog, a very unsmooth move of house means that I have barely seen the internet! I have been searching a lot though: which box is X in, etc. More to come here soon. Max

Friday 5 December 2008

Search Result Layouts

With the release of Yahoo's Search Monkey and Google's Search Wiki, it's been pretty exciting to see big companies (beyond Ask.com, who set off with some novel aspects like adding a thumbnail per result) experimenting heavily with their layouts. For ages I didn't even notice something different that Live Search does: if a Wikipedia entry is a result, it puts the whole first paragraph of the entry in as the text snippet, regardless of how long it is (it seems). Presumably this is because the first paragraph of a Wikipedia entry is perhaps the best overview of that topic you're gonna get. This is contrary to the approach, promoted academically by White et al. in SIGIR02, of building the snippet from the sentences that include your keywords.
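To make the difference between the two snippet policies concrete, here is a minimal sketch. It is not how Live Search or anyone else actually does it: the function names, the naive sentence splitter, and the "is this a Wikipedia URL?" rule are all my own assumptions for illustration.

```python
import re

def split_sentences(text):
    """Naive sentence splitter; good enough for a sketch."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def query_biased_snippet(text, query, max_sentences=2):
    """Build the snippet from the sentences that mention the most query terms."""
    terms = {t.lower() for t in query.split()}
    scored = []
    for position, sentence in enumerate(split_sentences(text)):
        words = set(re.findall(r"\w+", sentence.lower()))
        scored.append((len(terms & words), position, sentence))
    # Keep the best-scoring sentences, drop non-matches, restore document order.
    best = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    best = [item for item in best if item[0] > 0]
    return " ".join(s for _, _, s in sorted(best, key=lambda item: item[1]))

def first_paragraph_snippet(text):
    """Return the whole first paragraph, however long it is."""
    return text.split("\n\n")[0].strip()

def make_snippet(url, text, query):
    # Hypothetical rule: encyclopedia-style pages get their lead paragraph,
    # everything else gets a query-biased snippet.
    if "wikipedia.org" in url:
        return first_paragraph_snippet(text)
    return query_biased_snippet(text, query)
```

Calling `make_snippet` with the same page text but two different URLs shows the two policies side by side: one returns the lead paragraph untouched, the other only the sentences that hit the query.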

Search Monkey is interesting, too, because it allows you, as a user, to control how your web search results come up. If you want a special view of IMDb results, or Wikipedia results, you just include the available extra template. It will be interesting to see if representations stabilise any time soon to produce a new standard after the now familiar name, snippet, link combo.

I'm Back...

I've been totally off radar now for 3 months according to my last post! Where did it all go wrong? I blame the thesis! I've been all over the place recently, interning at Microsoft Research Cambridge and trying to finish off my PhD work, and I put myself so far down a work hole that I'd cut myself off from some of the best resources available to me, like Daniel Tunkelang's blog, The Noisy Channel. Step 1, obviously, was to read the >150 posts of his I was behind on. Always worth a read.

During that time, however, I've been stewing over a lot of cool stuff that I've seen including some pretty interesting events in London. I'll get right back on the horse with the next post!

Tuesday 26 August 2008

Exploratory Search or Cognitive Overload

I've been pretty quiet here on the blog recently. I hear that's what happens when you are writing up your thesis!

Anyway, recently I wrote about how cognitive load theory might help us understand any negative effects of adding more exploratory search features. The example I used was that faceted search might overload users with too much information, given all the metadata that is presented. Well I have written a technical report on it. So give it a read if you like!

Thursday 14 August 2008

Freebase Parallax

The video of the new Parallax interface for Freebase, recently distributed by David Huynh, is brilliant. I totally recommend you watch it. A fellow exploratory search fanatic, Daniel Tunkelang, has blogged about its qualities in terms of linking between different entity types, which is quite special.

What I especially love about it is the simple, clear layout metaphor that it uses. It's a very clean left-to-right system (although this, I suppose, is more familiar for Western countries). It works a little like this: your current results are smack bam in the middle in front of you. Anything to the left is something 'before' and anything to the right is something 'after'. The facets on the left are ones that apply to the current type of object in the results list. So if you select an item from any of the facets to the left (which are 'before' your current results), your results are reduced/filtered. The example in the video is that you are looking at a list of presidents, and by selecting a political party on the left, the presidents are filtered.

The facets on the right, however, are different types of related objects that you could move 'forward' to from your current data. The example they give is the presidents' children. This moves you forward to see a list of people who were the children of (the filtered set of) presidents. They have different facets on the left (although some, like gender, will overlap), which can be used to further reduce the children shown. The facets on the right are new objects that you can move 'forward' to from children.

All the while the left to right progress is mirrored by a left-to-right breadcrumb above the current information so you can see the steps 'forward' you took.
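The filter-then-move-forward idea can be sketched in a few lines. This is only my reading of the video, not Parallax's actual code: the data is made up (placeholder names, not Freebase records), and `filter_by`, `pivot`, and the breadcrumb list are hypothetical helpers.

```python
# Made-up data: names, parties and children are placeholders, not Freebase records.
presidents = [
    {"name": "President A", "party": "Party X", "children": ["Child 1", "Child 2"]},
    {"name": "President B", "party": "Party Y", "children": ["Child 3"]},
    {"name": "President C", "party": "Party X", "children": ["Child 4"]},
]
people = {
    "Child 1": {"name": "Child 1", "gender": "female"},
    "Child 2": {"name": "Child 2", "gender": "male"},
    "Child 3": {"name": "Child 3", "gender": "female"},
    "Child 4": {"name": "Child 4", "gender": "female"},
}

def filter_by(items, facet, value):
    """Left-hand facet: narrow the current results, keeping their type."""
    return [item for item in items if item.get(facet) == value]

def pivot(items, relation, lookup):
    """Right-hand facet: move 'forward' from the current set to related objects."""
    seen, related = set(), []
    for item in items:
        for key in item.get(relation, []):
            if key not in seen:  # one child may be reachable from several parents
                seen.add(key)
                related.append(lookup[key])
    return related

# The breadcrumb mirrors each step taken, left to right.
breadcrumb = ["presidents"]
current = filter_by(presidents, "party", "Party X")
breadcrumb.append("party = Party X")
current = pivot(current, "children", people)
breadcrumb.append("children")
current = filter_by(current, "gender", "female")
breadcrumb.append("gender = female")

print(" > ".join(breadcrumb))
print([person["name"] for person in current])
```

The point of the sketch is that filtering keeps you on the same type of thing, while a pivot swaps the whole result set for a related type, carrying the earlier filters along with it.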

It's all very nice, clean, clear, and still lets you browse endlessly through interconnected heterogeneous data types. Nothing less than you'd expect from David Huynh!

Monday 11 August 2008

can faceted search overload users?

For a while now, it's been a concern for mSpace that we have been classified as good for intermediate/expert users, when lots of our work has focussed on making searching easier for people. We showed that the spatial layout was good for elderly people, whose working memory was less capable of keeping track of things when other browsers change their layouts over time. Having said that, we too have watched participants in user studies experience a moment, when they first see the interface, of not knowing where to start.

An approach taken by other faceted browsers, such as the one provided by Endeca, is to change the layout a lot, but by taking away the decisions that people have already made, so that all they have to do is look at the most important factors remaining. It's the same policy that Google have: keep the options and UI as simple and clear as possible.

I've been seeking a way of finding out once and for all if there is a measurable aspect of UIs that we can use to prove that one way is better than the other, or that there is no difference at all. Certainly in mSpace we have shown that any overload experienced at first soon expires, and users have a really rich experience during search. Can mSpace be redesigned slightly to remove this initial wall and still give them the added richness of interaction that we have been striving to improve over the years?

Excitedly, I read a paper on cognitive load theory this morning on the train, which talks about what aspects of computer interfaces might make it harder for people to find and learn from information. I think this may hold the answer I have been looking for. Cognitive load theory has measures and terms for the effect that duplicate, redundant, or combined sources of information have on people's ability to clearly and easily use a UI. I'm going to try running some numbers to see if there are any significant differences in the approaches taken to faceted browsing that might reveal why mSpace is deemed intermediate/expert, and the approaches used by websites like Walmart and diy.com are deemed simple for novices.

Monday 28 July 2008

The Tetris Model of Information Seeking

The more I've been reading about other models of information seeking (such as Marchionini 1995 and Kuhlthau 1993 and many more), the more I've been annoyed by how limited to a sequential flow they are. In Marchionini's, for example, there's a clear progression from problem identification, to specification, to seeking action, result viewing and resolving the problem. The model has this nice step towards the end that says 'refinement', and the text has a clause to say that people may drop back to almost any previous point. I believe a text clause like that is an indication that there should be a better way to model Information Seeking.

The thing I did like was that each step was a different rectangular shape, with the two dimensions based on how much time and computer involvement it required. These two observations about the model have led me to my Tetris model of search, which I'm going to blog about here for a bit to test the water. You'll probably see followup blogs! I've got a lot to say about it!

Now, in Tetris, different shapes fall from the top of the screen, and success is modelled by organising them so that entire horizontal lines are made, removed from the display and converted to points. Let's first take the analogy that resolving an information seeking problem is like clearing a line of the board and that solving a bigger problem is like clearing multiple lines of the board, and finally that your score is representative of the overall knowledge you have on that topic.

Let us then imagine that the pieces that fall down from the top of the screen are then any one of the stages that are found in models like those mentioned above, where the ideal is that you get a series of simple pieces, representing a simple problem, a simple spec, a simple query, and a simple answer. BAM one line, problem solved.

BUT we all know that life is not like that, and regularly you get a nice simple first block (or you think you have a simple problem to solve) and then you get a + shape answer when you view the results that tells you your problem is a little more complicated than that. What we begin to see is that the complexity of a problem is actually represented not by the pieces, but by the current depth of the board. Each piece, therefore, represents an action, such as realising a problem, performing a query, etc.

So, a simple lookup on Google is represented by a series of easy bits (speccing, querying, viewing, etc.) fitting together nicely and a line clearing. If you have a complex problem, however, the first bit you get is complex, like a +, and then you may need a combination of queries and results to resolve your problem and shift the 3 lines built by the +. Exploratory search can also be modelled with this analogy. If a user starts with a simple problem and starts off by querying for 'classical music', and then the first result says 'well, there are lots of types of classical music: ...', this means the next piece you got was a +, and so getting an answer to your first query broadens the work you have to do to better understand classical music. Then, over time, you can resolve bits of information, find new problems you need to learn about, and get some simple answers to fill in the gaps. You may also find that there are always rows with holes in, and that it might take you years to get back to them and fill them.

That was long, but think about it. I think it's a pretty good analogy. Comments?

Friday 25 July 2008

When should users be made to think?

I've been a little quiet, I know. I've been working hard consulting for an interesting new client on a project that has, yet again, completely consumed my interests with new challenges. In this case, it's amazed me that one of the primary concerns of this project has not been to make the interaction as quick and simple as possible, but to produce software that is a) intuitive and b) coerces users into thinking about the appropriate things at the appropriate times.

I've noticed this has become a recurring theme in many scenarios. The first time I heard something along this line was an argument against automating the jobs of pilots, but for making the jobs easier. If pilots get used to the plane doing most of the jobs for them, they may become less capable when the plane malfunctions. However, if the actions are made easier to perform, then the skill is maintained, but the usability has improved.

Search can be considered in a similar way. While lots of search designs are focused on letting users quickly express the knowledge or known constraints that they do have, this can leave users with problems when they have to choose between their results using facets that they have not considered. In this case, we do not at all want to make assumptions, but at the same time, we do not want to leave them to make a decision with only a bare list of options to go on.

In some of our previous work we have investigated giving users example (ideally multimedia) result items associated with each of the options in their new decision. This means that users become aware of things that they should think about before they buy, AND have the means to understand the effect of their decisions. Another approach mSpace has taken is to be subjunctive, by allowing users to easily change their mind or, considered another way, to rapidly try out different options by minimising the cost of reversing a decision. To do this, mSpace maintains all of the options a user was given at each step, so that the user, with a single click, can switch between different items in the same facet and see the effect it has on the results.
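For the curious, the 'keep every option on screen so one click reverses a decision' idea could be sketched roughly like this. It is a toy, not mSpace's implementation: the `SubjunctiveBrowser` class, its method names, and the three-item dataset are all assumptions I've made for the example.

```python
class SubjunctiveBrowser:
    """Toy column browser: a column's options depend only on columns to its left."""

    def __init__(self, items, facets):
        self.items = items
        self.facets = facets   # left-to-right column order
        self.selected = {}     # facet -> chosen value

    def _filter(self, facets):
        pool = self.items
        for facet in facets:
            if facet in self.selected:
                pool = [i for i in pool if i[facet] == self.selected[facet]]
        return pool

    def options(self, facet):
        # Selections in this column are ignored here, so the siblings of the
        # current choice stay visible and can be swapped to directly.
        index = self.facets.index(facet)
        return sorted({item[facet] for item in self._filter(self.facets[:index])})

    def select(self, facet, value):
        # A single call (one 'click') replaces the choice in this column.
        self.selected[facet] = value
        # Choices further right may no longer apply, so clear them.
        for later in self.facets[self.facets.index(facet) + 1:]:
            self.selected.pop(later, None)

    def results(self):
        return self._filter(self.facets)


pieces = [
    {"era": "Baroque", "composer": "Bach", "piece": "Brandenburg Concerto No. 3"},
    {"era": "Baroque", "composer": "Vivaldi", "piece": "The Four Seasons"},
    {"era": "Classical", "composer": "Mozart", "piece": "Eine kleine Nachtmusik"},
]

browser = SubjunctiveBrowser(pieces, ["era", "composer"])
browser.select("era", "Baroque")
browser.select("composer", "Bach")
print(browser.options("composer"))   # siblings of 'Bach' are still offered
browser.select("composer", "Vivaldi")
print([p["piece"] for p in browser.results()])
```

The design choice that matters is in `options`: because a column never filters itself, changing your mind costs one selection rather than a back-out and a re-search.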

Wednesday 16 July 2008

viacom aint all bad - perhaps i love viacom?

I've been a bit quiet recently, having been travelling to institutions around the UK, and then vacationing a bit. But I was pleased to see the news on the row between YouTube and Viacom. Thankfully they have agreed to let Google anonymise the usage histories that will be sent to Viacom as part of the ongoing copyright legal battle.

This only leads me to think that Viacom has created an even more perfect dataset for us to run user analysis over, like the longitudinal study we recently presented at JCDL2008. I assume each user will get an anonymous ID, so that anonymous individual activity can be followed?

Can we have it after you Viacom? We'd love you!

Saturday 5 July 2008

the world is connected



Actually, from the map it looks like the 1st world is connected at least.

Interesting visualisation, though, of Twitter conversations.

Compared to some of the other network visualisations, this one shows how being grounded in a known 2.5D space makes it a lot more accessible.

Friday 4 July 2008

best search dataset ever?

In a current lawsuit between Google's YouTube and Viacom, Google have been instructed to hand over their logs of user viewing habits. Google are upset about it, but at least the judge overruled the request for Google to hand over their source code for filtering copyrighted material! Google have requested, although it's not been confirmed yet I believe, that they get to anonymise the logs first, to respect the users' privacy. I personally hope this is allowed.

Anyway, despite this interesting privacy issue, I can't help thinking that 12 terabytes of usage logs from YouTube would be an AMAZING research resource for investigating user behaviour. Sadly they don't have facets to chat about, but it could tell us how people have used query refinements, spelling corrections, categories, filters, similar clips, recommended clips and so much more!

Google might as well do something useful with it, if the data is going to be shown to at least one third party.

Saturday 28 June 2008

the best way to provide faceted search?

As research has produced an increasing number of insights into the different ways of providing faceted metadata to users in the form of a faceted browser, the question has become: what actually is the best way to provide faceted search? This same question has not really been seen in typical information retrieval, as each bit of research has (usually) incrementally improved the system performance, and a good keyword-based search system will try to include all the advancements in their UI (not that Google is providing interactive query refinements).

We actually do see faceted search all over the place. iTunes has it in their 'browser' function (3 columns that filter to the right). Google product search lets you refine by facets like brand and price (each facet filters every other facet). Endeca seem to be selling it to everyone these days (right on!), including Walmart and Borders.
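The two filtering styles mentioned above (columns that only filter to the right, versus every facet filtering every other facet) differ in which values a facet still offers after a selection. Here is a small sketch of that difference; the product data and function names are hypothetical and not taken from iTunes, Google, or Endeca.

```python
def matches(item, selections):
    """True if the item agrees with every selected facet value."""
    return all(item[facet] == value for facet, value in selections.items())

def directional_options(items, columns, selections, facet):
    """Column style: only selections in columns to the LEFT narrow this column."""
    left = columns[:columns.index(facet)]
    upstream = {f: v for f, v in selections.items() if f in left}
    return sorted({item[facet] for item in items if matches(item, upstream)})

def cross_filtered_options(items, selections, facet):
    """Cross-filter style: selections in every OTHER facet narrow this facet."""
    others = {f: v for f, v in selections.items() if f != facet}
    return sorted({item[facet] for item in items if matches(item, others)})

# Hypothetical catalogue, for illustration only.
products = [
    {"brand": "Acme", "price": "low"},
    {"brand": "Acme", "price": "high"},
    {"brand": "Bolt", "price": "high"},
]
columns = ["brand", "price"]      # left-to-right order for the column style
selections = {"price": "low"}     # the user picked a price band first

print(directional_options(products, columns, selections, "brand"))
print(cross_filtered_options(products, selections, "brand"))
```

With a price band selected, the column style still lists every brand (price sits to the right of brand), while the cross-filtered style hides the brands with nothing in that band.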

There are actually 2 layers to this question: how best to provide a faceted classification and how best to provide a faceted browser. The former has been well investigated, with advice from Marti Hearst. Endeca certainly seem to ask 'after each click in a facet, what is the best set of facets and values to show the user?'. The second question is less well understood. Even one of Endeca's clients, the NCSU library, are asking: should we have the facets on the left or the right? Should we place a breadcrumb or a list of decisions? How does this affect the user?

Further to these layout questions, I have been trying to work out for a while now whether the structured and consistent iTunes approach is better or worse than the dynamic adaptive approach taken by Endeca, especially with all the additional functionality (e.g. column swapping and backward highlighting) we have been adding to the iTunes-style approach with mSpace. There are even more questions to ask. Maybe it's a case of when one is better than the other? Finally, can we somehow take the best of both worlds, so that we can figure out what to add to our faceted browsers to make them incrementally stronger?

Friday 20 June 2008

privacy and social in-security

It's by no means been the focus of the collaborative search workshop this week, but the issue of privacy, not surprisingly, came up in terms of what your collaborators can see about your actions. There are obvious things to be concerned about, like whether you all have the same clearance, for example. But A LOT, recently, I have heard people talking not about the insecurity of the systems, but the insecurity of the users!

The example we heard here, proposed by Merrie Ringel Morris, was searching for something with someone superior to yourself and doing something stupid. What if you do not want them to see that you're being 'sub-optimal'? Or what if you have to look up something they said, when you are trying to make a good impression?

These are interesting privacy and 'insecurity' issues, where users still want to protect themselves. I've not seen much on it though. Anybody?

Wednesday 18 June 2008

first faceted system?

Daniel Tunkelang, chief scientist at Endeca, has passed on an excellent entry on perhaps the first faceted system. It's actually come at a very timely point during the JCDL08 conference, where faceted browsing has been quite core to a number of discussions. I'll post more about this later. Someone even asked me about Ranganathan's colon classification (research core to the start of faceted classifications) after my talk on a longitudinal study of faceted and keyword use, and now I have a link to research it further - thanks Daniel.

Monday 16 June 2008

collaborative IR workshop

Let me start by apologising for any rubbish you find below - I just landed in Pittsburgh for JCDL08. I keep going from hot sweats to shivers depending on if I'm outdoors or in!

Next week is what looks to be an exciting first international workshop on collaborative information retrieval. To be clear, the main focus is on teams of people trying to achieve a shared goal, either co-located or distributed, and either at the same time or asynchronously.

One of the papers sets itself apart from the crowd, as far as I am concerned, but also worries me slightly. Instead of delving straight into ideas of communication and task-allocation, one author (who shall remain nameless till after the workshop - but attendees have been asked to read each paper before the event) steps back and asks: what is the definition of collaboration, and how does it differ from, or subsume, cooperation, coordination, and many more similar terms? His paper is clearly well researched and well informed, but the level of model detail also worries me: how much detail is too much when designing a model? Interfaces that try to differentiate/support each term individually could be confusing. The discussion will certainly be valuable, and that and other papers will make for a very interesting workshop. Stay tuned to hear more about it.

first time for JCDL08!

Thursday 12 June 2008

exhibiting exploratory behaviour

Next week I am giving a talk on our paper at JCDL08, on the longitudinal real-world usage of a website that has both faceted and keyword search persistently available. One of the aims of this research was to see how people changed behaviour over time, as they grew more familiar with both the data and the website. This is motivated, of course, by the notion of Exploratory Search, which represents users who don't necessarily know what they are looking for or how to find it.

It has only struck me recently how undefined exploratory behaviour really is. Originally, it was suggested that people who are exploring would click around on things such as facets and categories, rather than keyword search, because they do not know what to search for. Then later they would keyword search, because they have learned what's available on the site.

The alternative view is that people who really don't know what to search for, start with the 'vague query', and then use the facets to refine.

What we saw in the study is that people exhibit either pattern of behaviour at any stage, and this idea of order is not the variable that defines exploratory behaviour. For example, some experienced users were using the facets to produce very specific queries, rather than typing boolean queries into the keyword search box. Similarly, we saw experienced users start with a keyword search and then narrow the results down effectively.

So what variables do identify exploratory behaviour? Is it this effective behaviour? If we see a lot of similar queries, or a lot of swapping and changing within one facet, does that make them a learner? Because I can sure think of occasions when one problem involves selecting lots of items in a column, regardless of whether I'm good or bad at it. Where to go on holiday? You could select lots of countries and cities.

One of our earlier papers (a few years ago) suggested that maybe it was the idea of backing out of your decisions.

in the beginning...

...there was space - for blogging. Inspired by the ever interesting blogs produced by Daniel Tunkelang (Chief Scientist at Endeca), on the topics of HCI and IR/IS, I have decided to try and blog some of my own thoughts. I'd put some in here, but then the title would be misleading, and then it would be much harder to find! The interesting stuff should start asap.