Posts Tagged ‘Databases’

Porting Microsoft SQL Server to Linux

January 15, 2012

So why didn’t Microsoft take SQL Server to *nix?  On one occasion a partner commitment that might have made it viable failed to materialize.  On another occasion I initiated the investigation on the basis of a partner request but then decided it was a bad idea.  Here is why:

There are five things you have to consider when evaluating if Microsoft should take SQL Server to *nix:

  1. What exactly is the product offering you intend to bring to *nix, does it have a real market there, and can you position the offering to succeed?
  2. What is the impact of going multi-platform on the product family, engineering methodology, organization, and partner engineering organizations?
  3. What is the business model, including how do you partner, market, and (very importantly) sell into the Enterprise *nix world when you are a company that has no expertise in doing so?
  4. How do you provide Enterprise-class service for SQL Server when it is running on a platform that your services organization has no expertise with?
  5. What is the negative business impact on the entire Windows platform associated with making a key member of the server product family available on *nix?

via Porting Microsoft SQL Server to Linux | Hal’s (Im)Perfect Vision.


Massive Scale Data Mining for Education

November 17, 2011

Let’s say, in the near future, tens of millions of students start learning math using online computer software.  Our logs fill with a massive new data stream, millions of students doing billions of exercises, as the students work.

In these logs, we will see some students struggle with some problems, then overcome them.  Others will struggle with those same problems and fail.  There will be paths of learning in the data, some of which quickly reach mastery, others of which go off in the weeds.

via Massive Scale Data Mining for Education | blog@CACM | Communications of the ACM

Parallel Analysis with Sawzall

October 17, 2011

Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on.

We present a system for automating such analyses. A filtering phase, in which a query is expressed using a new programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design — including the separation into two phases, the form of the programming language, and the properties of the aggregators — exploits the parallelism inherent in having data and computation distributed across many machines.

Source: Google Research Publication: Sawzall.
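To make the two-phase idea in the abstract concrete, here is a minimal sketch in plain Python (not Sawzall itself, and not Google's implementation). It assumes a hypothetical record format of "caller,callee,seconds" for telephone call records; the function names and the "calls" / "total_seconds" keys are made up for illustration. The filter looks at one record at a time and emits values; the aggregator sums them. Because summing is commutative and associative, each shard can be aggregated on its own and the partial results merged afterwards.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Emit = Tuple[str, int]  # (aggregator name, value) -- hypothetical shape


def filter_record(record: str) -> Iterator[Emit]:
    """Filtering phase: inspect a single record in isolation and emit values."""
    fields = record.split(",")
    if len(fields) != 3:
        return  # skip malformed records
    try:
        seconds = int(fields[2])
    except ValueError:
        return  # skip records whose duration is not a number
    yield ("calls", 1)
    yield ("total_seconds", seconds)


def aggregate(emitted: Iterable[Emit]) -> Counter:
    """Aggregation phase: a sum aggregator keyed by name."""
    totals: Counter = Counter()
    for key, value in emitted:
        totals[key] += value
    return totals


def run(shards: List[List[str]]) -> Counter:
    """Process each shard independently, then merge the partial sums.

    In a real deployment each shard would live on a different machine;
    here the shards are just lists processed one after another.
    """
    partials = [
        aggregate(e for record in shard for e in filter_record(record))
        for shard in shards
    ]
    result: Counter = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds counts together
    return result


if __name__ == "__main__":
    shards = [["a,b,10", "c,d,5"], ["bad line", "e,f,7"]]
    print(dict(run(shards)))
```

The point of the split is that `filter_record` never needs to see more than one record, so it can run anywhere, and only the small emitted values travel to the aggregator.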

Now Microsoft comes up with a road map for Hadoop efforts

October 14, 2011

Here is the post: Microsoft’s Big Data Roadmap & Approach – SQL Server Team Blog – Site Home – TechNet Blogs.

As we have noted in the past, in the data deluge faced by businesses, there is an increasing need to store and analyze vast amounts of unstructured data including data from sensors, devices, bots and crawlers and this volume is predicted to grow exponentially over the next decade. Our customers have been asking us to help store, manage, and analyze these new types of data – in particular, data stored in Hadoop environments.

I am not sure about IBM, but that makes two of them, Microsoft and Oracle, finally realizing the power of Hadoop and acknowledging that they have to do something about it. Pretty late already, I would say. Oracle had something similar to say earlier, at OpenWorld 2011.

My Hadoop experience so far has been limited, but nevertheless amazing in every sense. I was blown away by the simplicity more than anything else. It takes very little effort (okay, you do need the understanding) to set up a Hadoop cluster and run a task distributed across several machines. We were using 5 laptops, and although the task we wrote was only a simple search, for the amount of effort we put in, I never expected it to work. I was surprised when it sailed through.
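For readers who have not seen what a "simple search" looks like in the map/reduce style, here is a rough sketch in plain Python. It is not the task we actually ran and it does not use any Hadoop API; the function names and the (document id, text) input shape are assumptions made up for illustration. The map step finds matching lines in one document, the shuffle groups results by document, and the reduce step tidies each group. On a real cluster, each split would be handed to a different machine.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Document = Tuple[str, str]  # (document id, full text) -- hypothetical shape


def map_search(doc_id: str, text: str, term: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (doc_id, line number) for every line containing the term."""
    needle = term.lower()
    for line_no, line in enumerate(text.splitlines(), start=1):
        if needle in line.lower():
            yield doc_id, line_no


def reduce_search(doc_id: str, line_numbers: List[int]) -> Tuple[str, List[int]]:
    """Reduce step: collect all hits for one document in sorted order."""
    return doc_id, sorted(line_numbers)


def search(splits: List[List[Document]], term: str) -> Dict[str, List[int]]:
    """Simulate map, shuffle, and reduce locally over a list of input splits."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for split in splits:  # each split stands in for one machine's share
        for doc_id, text in split:
            for key, value in map_search(doc_id, text, term):
                grouped[key].append(value)  # shuffle: group values by key
    return dict(reduce_search(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = [
        [("a.txt", "hadoop is simple\nnothing here")],
        [("b.txt", "no match"), ("c.txt", "first line\nHadoop again")],
    ]
    print(search(splits, "hadoop"))
```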