For my first C# code post I will start off with an easy one: calculating the mean, the variance, and the standard deviation of a set of data using C# math. These three functions were the first I had to write when I started on the SoapSynergy project, which is written in C# and analyzes the stability and flexibility of the human motor system.
- The mean is just the average: the sum of all values divided by the number of values.
- The variance measures how far a set of numbers is spread out. It is the average of the squared differences between each value and the mean.
- The standard deviation (SD) is the square root of the variance, so it also measures how far the values are spread out from the average. Because the SD uses the same units as the mean, it is easier to interpret.
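Written as formulas for values $x_1, \dots, x_N$, the three definitions above look like this. The notation is my own, and the variance is shown in its population form (dividing by $N$), which is an assumption about which variant is meant here:

```latex
\begin{align}
  \mu      &= \frac{1}{N} \sum_{i=1}^{N} x_i \\
  \sigma^2 &= \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \\
  \sigma   &= \sqrt{\sigma^2}
\end{align}
```

For the values 1, 2, 3, 4, 5, 6 this gives $\mu = 3.5$ and $\sigma^2 = 17.5 / 6 \approx 2.92$.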
Now to show you a quick code implementation of these three routines using C# math. Because I also needed the mean of subsets of n samples, I used overloaded methods that support computing over the entire set as well as over a subrange. These methods also demonstrate extension methods in C#, in this case extending a generic list of doubles (List<double>).
using System;
using System.Collections.Generic;

namespace SampleApp {
    internal class Program {
        private static void Main() {
            List<double> data = new List<double> {1, 2, 3, 4, 5, 6};
            double mean = data.Mean();
            double variance = data.Variance();
            double sd = data.StandardDeviation();
            Console.WriteLine("Mean: {0}, Variance: {1}, SD: {2}", mean, variance, sd);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }

    public static class MyListExtensions {
        // Mean of the whole list; returns 0 for an empty list.
        public static double Mean(this List<double> values) {
            return values.Count == 0 ? 0 : values.Mean(0, values.Count);
        }

        // Mean of the elements from index start (inclusive) to end (exclusive).
        public static double Mean(this List<double> values, int start, int end) {
            double s = 0;
            for (int i = start; i < end; i++) {
                s += values[i];
            }
            return s / (end - start);
        }

        // Variance of the whole list around its own mean.
        public static double Variance(this List<double> values) {
            return values.Variance(values.Mean(), 0, values.Count);
        }

        // Variance of the whole list around a precomputed mean.
        public static double Variance(this List<double> values, double mean) {
            return values.Variance(mean, 0, values.Count);
        }

        // Variance of the elements from start (inclusive) to end (exclusive).
        public static double Variance(this List<double> values, double mean, int start, int end) {
            double variance = 0;
            for (int i = start; i < end; i++) {
                variance += Math.Pow((values[i] - mean), 2);
            }
            // Divides by n for a range starting at index 0,
            // and by n - 1 for a range starting later.
            int n = end - start;
            if (start > 0) n -= 1;
            return variance / (n);
        }

        // Standard deviation of the whole list; returns 0 for an empty list.
        public static double StandardDeviation(this List<double> values) {
            return values.Count == 0 ? 0 : values.StandardDeviation(0, values.Count);
        }

        // Standard deviation of the elements from start (inclusive) to end (exclusive).
        public static double StandardDeviation(this List<double> values, int start, int end) {
            double mean = values.Mean(start, end);
            double variance = values.Variance(mean, start, end);
            return Math.Sqrt(variance);
        }
    }
}
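The post mentions needing means of n samples of data. One way to read that is as the mean of each consecutive block of n samples, and the following is a minimal, self-contained sketch under that assumption. `BlockMeanSketch` and `BlockMeans` are hypothetical names that do not appear in the original code, and the range-based `Mean` is repeated here only so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public static class BlockMeanSketch
{
    // Mean of values[start] .. values[end - 1], mirroring the range overload above.
    public static double Mean(List<double> values, int start, int end)
    {
        double sum = 0;
        for (int i = start; i < end; i++)
        {
            sum += values[i];
        }
        return sum / (end - start);
    }

    // Hypothetical helper: splits the list into consecutive blocks of blockSize
    // samples and returns the mean of each block. A shorter trailing block is
    // averaged over its own length.
    public static List<double> BlockMeans(List<double> values, int blockSize)
    {
        if (blockSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(blockSize));
        }

        var means = new List<double>();
        for (int start = 0; start < values.Count; start += blockSize)
        {
            int end = Math.Min(start + blockSize, values.Count);
            means.Add(Mean(values, start, end));
        }
        return means;
    }
}
```

For the list 1, 2, 3, 4, 5, 6 and a block size of 2, `BlockMeans` returns 1.5, 3.5 and 5.5.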
Hi Martin,
I found your Standard Deviation implementation
http://www.martijnkooij.nl/2013/04/csharp-math-mean-variance-and-standard-deviation/
and used it for a while.
Lately, I compared the StDev result of your implementation against the same data in a SQL Server stored procedure and found a different result. The same data in Excel returned the same result as SQL Server.
Finally I used (C#) dataTable.Compute which also returned the same result.
Do you have any idea what causes this difference? I pasted some sample code below.
Regards,
Goos van Beek.
using System;
using System.Collections.Generic;
using System.Data;
namespace ConsoleApplicationStDev {
    internal static class Program {
        static void Main(string[] args) {
            List<double> data = new List<double> { 4201, 4210, 4218, 4218, 4221, 4223, 4223, 4228, 4231, 4238 };
            Console.WriteLine("Via Class \t{0}", StDevFromClass(data));
            Console.WriteLine("Via DataTable \t{0}", StDevFromDataTable(data));
            Console.ReadKey();
        }
        internal static string StDevFromClass(List<double> data) {
            return data.StandardDeviation().ToString();
        }
        internal static string StDevFromDataTable(List<double> data) {
            var dataTable = new DataTable();
            dataTable.Columns.Add("Width", typeof(double));
            foreach (var item in data) {
                dataTable.Rows.Add(new object[] { double.Parse(item.ToString()) });
            }
            return dataTable.Compute("StDev(Width)", String.Empty).ToString();
        }
    }
    public static class MyListExtensions {
        public static double Mean(this List<double> values) {
            return values.Count == 0 ? 0 : values.Mean(0, values.Count);
        }
        public static double Mean(this List<double> values, int start, int end) {
            double s = 0;
            for (int i = start; i < end; i++) {
                s += values[i];
            }
            return s / (end - start);
        }
        public static double Variance(this List<double> values) {
            return values.Variance(values.Mean(), 0, values.Count);
        }
        public static double Variance(this List<double> values, double mean) {
            return values.Variance(mean, 0, values.Count);
        }
        public static double Variance(this List<double> values, double mean, int start, int end) {
            double variance = 0;
            for (int i = start; i < end; i++) {
                variance += Math.Pow((values[i] - mean), 2);
            }
            int n = end - start;
            if (start > 0) n -= 1;
            return variance / (n);
        }
        public static double StandardDeviation(this List<double> values) {
            return values.Count == 0 ? 0 : values.StandardDeviation(0, values.Count);
        }
        public static double StandardDeviation(this List<double> values, int start, int end) {
            double mean = values.Mean(start, end);
            double variance = values.Variance(mean, start, end);
            return Math.Sqrt(variance);
        }
    }
}
The difference is caused by the fact that I used the population variance in this example, which divides the sum of squared differences by n, whereas Excel, SQL Server and dataTable.Compute by default use the sample variance, which divides by n - 1.
Depending on your goal you could choose to use either the sample or population variance. Sample variance can be computed by dividing by n - 1 instead of n in the Variance method.
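A minimal sketch of the sample variant, written as standalone extension methods. The names `SampleStatisticsSketch`, `SampleVariance` and `SampleStandardDeviation` are my own for illustration, and returning 0 for fewer than two values is an assumption chosen to match the empty-list behaviour of the methods in the post.

```csharp
using System;
using System.Collections.Generic;

public static class SampleStatisticsSketch
{
    // Sample variance: sum of squared differences from the mean, divided by n - 1.
    public static double SampleVariance(this List<double> values)
    {
        int n = values.Count;
        if (n < 2)
        {
            return 0; // assumption: n - 1 would be zero or negative, so report 0
        }

        double mean = 0;
        foreach (double v in values)
        {
            mean += v;
        }
        mean /= n;

        double sumOfSquares = 0;
        foreach (double v in values)
        {
            double diff = v - mean;
            sumOfSquares += diff * diff;
        }
        return sumOfSquares / (n - 1);
    }

    // Sample standard deviation: square root of the sample variance.
    public static double SampleStandardDeviation(this List<double> values)
    {
        return Math.Sqrt(values.SampleVariance());
    }
}
```

For the ten values in the sample code above, the sum of squared differences is 984.9. Dividing by n - 1 = 9 gives a standard deviation of about 10.46, while dividing by n = 10 gives about 9.92, which accounts for the two different results.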
Hi Martin,
Thanks for your quick and clear explanation.
Best regards,
Goos.