A blog on math

October 28, 2009

Reasons to Keep a Blog and How to Get Beta Users

Filed under: Uncategorized — Bryan Bell @ 9:27 pm

Number one, if you like to write, and I do, then it’s fun. But as Steve Yegge outlines in his post “You Should Write Blogs”, it’s also good for you. I won’t repeat Steve’s points, but I do want to add one: a blog is a very easy way to keep track of little tidbits of information that could otherwise be hard to find again. As an example, all my posts on mono, nginx, asp.net, and asp.net mvc were written purely because I did not want to have to find the information again or dredge it up from my poor memory. Instead I wrote the posts, and now whenever I need to reconfigure a server to use nginx with mono and asp.net I can easily find the information on my blog. In a way it’s an extension of your memory: a permanent, easily searchable record of what you have done.

In that spirit I’m linking to Patrick Swieskowski and Sascha Kuzins’ post, How we got 18,000 beta users in 4 weeks. Now when I need to look up how to get beta users I can simply search my blog and find this article.

Statistically Accurate Ratings

In a previous post on ratings I noted some issues with using the mean vs. the median for the rating. A few days ago Jeff Atwood posted on user ratings, specifically on how to sort a set of items that are rated by users. Jeff’s post included material from Evan Miller’s article on the same subject, How Not To Sort By Average Rating.

Below is the relevant portion of Evan Miller’s article:

CORRECT SOLUTION: Score = Lower bound of Wilson score confidence interval for a Bernoulli parameter

Say what: We need to balance the proportion of positive ratings with the uncertainty of a small number of observations. Fortunately, the math for this was worked out in 1927 by Edwin B. Wilson. What we want to ask is: Given the ratings I have, there is a 95% chance that the “real” fraction of positive ratings is at least what? Wilson gives the answer. Considering only positive and negative ratings (i.e. not a 5-star scale), the lower bound on the proportion of positive ratings is given by:

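\frac{\hat{p} + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2} \sqrt{\left[\hat{p}(1-\hat{p}) + \frac{z_{\alpha/2}^2}{4n}\right] / n}}{1 + \frac{z_{\alpha/2}^2}{n}}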

(For a lower bound use minus where it says plus/minus.) Here \hat{p} is the observed fraction of positive ratings, z_{\alpha/2} is the (1-\alpha/2) quantile of the standard normal distribution, and n is the total number of ratings. The same formula implemented in Ruby:

require 'statistics2'

def ci_lower_bound(pos, n, power)
  if n == 0
    return 0
  end
  z = Statistics2.pnormaldist(1-power/2)
  phat = 1.0*pos/n
  (phat + z*z/(2*n) - z * Math.sqrt((phat*(1-phat)+z*z/(4*n))/n))/(1+z*z/n)
end

pos is the number of positive ratings, n is the total number of ratings, and power refers to the statistical power: pick 0.10 to have a 95% chance that your lower bound is correct, 0.05 to have a 97.5% chance, etc.

Now for any item that has a bunch of positive and negative ratings, use that function to arrive at a score appropriate for sorting on, and be confident that you are using a good algorithm for doing so.
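For anyone not using Ruby, here is a minimal JavaScript sketch of the same lower bound. The name wilsonLowerBound and the z parameter are my own choices for illustration, not part of Evan Miller’s article. Instead of computing the normal quantile with a statistics library, the caller passes z directly: roughly 1.645 corresponds to the 0.10 setting in the Ruby version and roughly 1.96 to the 0.05 setting.

```javascript
// Lower bound of the Wilson score interval for `pos` positive ratings out of `n`.
// `z` is the standard normal quantile; passing it in directly (rather than
// deriving it from a confidence level) is an assumption of this sketch.
function wilsonLowerBound(pos, n, z = 1.6448536269514722) {
  if (n === 0) {
    return 0;
  }
  const phat = pos / n; // observed fraction of positive ratings
  const z2 = z * z;
  const centre = phat + z2 / (2 * n);
  const spread = z * Math.sqrt((phat * (1 - phat) + z2 / (4 * n)) / n);
  return (centre - spread) / (1 + z2 / n);
}

// Hypothetical items: one perfect rating versus 90 positive out of 100.
const items = [
  { name: "A", pos: 1, n: 1 },
  { name: "B", pos: 90, n: 100 },
];
// Sort by descending lower bound.
items.sort((a, b) => wilsonLowerBound(b.pos, b.n) - wilsonLowerBound(a.pos, a.n));
console.log(items.map((item) => item.name)); // B sorts ahead of A
```

Sorting by the plain average would put A (100% positive) first; the lower bound penalises A for having only a single rating.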

Sadly everybody has simply quoted the mathematical formula without even linking to material on how to derive it. I was able to find an article by Keith Dunnigan here that outlines how it is derived, along with several other confidence intervals. Hopefully later I can take a look at my textbooks and do a full derivation.
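In the meantime, here is a sketch of the standard textbook argument, assuming the normal approximation to the binomial. It is not necessarily the route Dunnigan’s article takes, and z is shorthand for z_{\alpha/2}.

```latex
% With the chosen confidence, the true proportion p satisfies
\left| \hat{p} - p \right| \le z \sqrt{\frac{p(1-p)}{n}}

% At the boundary, square both sides and collect powers of p:
\left(1 + \frac{z^2}{n}\right) p^2 - \left(2\hat{p} + \frac{z^2}{n}\right) p + \hat{p}^2 = 0

% The quadratic formula gives the two end points of the interval:
p = \frac{\hat{p} + \frac{z^2}{2n} \pm z \sqrt{\left[\hat{p}(1-\hat{p}) + \frac{z^2}{4n}\right] / n}}{1 + \frac{z^2}{n}}
```

The key difference from the simpler textbook interval is that the standard error is written in terms of the unknown p rather than \hat{p}, which is what produces the quadratic.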

October 22, 2009

How To Auto-login Using ASP.Net FormsAuthentication

Filed under: Uncategorized — Bryan Bell @ 5:57 pm

For our demo website, demo.quickpm.net, I need to have a default automatic login. Using ASP.Net Membership and FormsAuthentication, an easy way to do auto-login is the following code:

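// Hard-coded demo credentials; the account must already exist in the membership store.
string userEmail = "john.doe@gmail.com";
string password = "thepassword";
if (Membership.ValidateUser(userEmail, password))
{
    // The second argument (true) asks for a persistent cookie.
    FormsAuthentication.SetAuthCookie(userEmail, true);
    Response.Redirect("~/");
}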

The above code assumes that “john.doe@gmail.com” is a user and “thepassword” is their password.

October 17, 2009

Mono Asp.Net MVC and Nginx

Filed under: Uncategorized — Bryan Bell @ 10:11 am

I’ve been using Mono ASP.Net for a while with Nginx as the server. I recently started playing around with ASP.Net MVC and discovered that it doesn’t play nicely with Nginx. To make ASP.Net MVC work with Mono and Nginx you need to download the Mono source from here and then modify the file mcs/class/System.Web.Routing/System.Web.Routing/Route.cs. In particular, comment out the following lines in the GetRouteData function:

if (pathInfo != String.Empty)
    throw new NotImplementedException();

If you comment out the above two lines, your MVC website should work just fine when hosted using Mono and Nginx. I don’t know if there are any negative consequences of commenting out these two lines, but I have not encountered any.

August 30, 2009

Quick Heuristic For Tangram Polygon Intersection

Filed under: Computational Geometry — Bryan Bell @ 8:39 am

For the combinatorial approach to solving Tangram puzzles one frequent operation is to test if a puzzle piece, p, is contained in the silhouette s. Before this test is made the puzzle piece, p, is translated to a vertex, v, of s and one of the edges of p is aligned with an edge, e, coming out of vertex v.

For clarity I’ve included a figure below illustrating this.

puzzle piece and silhouette

In the above figure it’s obvious that the puzzle piece, p, does not fit in the silhouette. One way to see this computationally is to look at the edge, e, and notice that at vertex v_1 the next edge of the silhouette, s, makes a left turn with respect to the edge e. This means that if the length of e is less than the length of every edge of p, then there is no possible way to fit p in s, as long as we keep the condition that one of the edges of p is aligned with the edge e of s.

The above statement provides a quick way to skip vertices of the silhouette s when trying to fit the puzzle piece p in the silhouette s. Of course before skipping the vertex the same test needs to be done on the other edge coming out of v. Also we note that the above test only works if the next edge coming out of v_1 makes a left turn with respect to the edge e, otherwise the full test for polygon intersection has to be run. Also if one of the edges of the puzzle piece p has length less than or equal to the length of e then the full test for polygon intersection has to be run.
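The test above can be sketched in JavaScript as follows. This is only an illustration: the function names, the shape of the edge objects, and the small tolerance eps are my own assumptions, and the caller is responsible for deciding what counts as a left turn for each of the two edges leaving v.

```javascript
// Returns true when the piece cannot be placed along silhouette edge e,
// so the full polygon test can be skipped for that edge.
//   edgeLength       - length of the silhouette edge e
//   nextTurnIsLeft   - whether the silhouette turns left at v_1, the far end of e
//   pieceEdgeLengths - lengths of all edges of the puzzle piece p
//   eps              - hypothetical tolerance; near-equal lengths fall back to the full test
function canSkipEdge(edgeLength, nextTurnIsLeft, pieceEdgeLengths, eps = 1e-9) {
  if (!nextTurnIsLeft) {
    return false; // heuristic does not apply, run the full test
  }
  return pieceEdgeLengths.every((len) => len > edgeLength + eps);
}

// A vertex is skipped only if both edges coming out of it fail the quick test.
function canSkipVertex(edgeA, edgeB, pieceEdgeLengths) {
  return (
    canSkipEdge(edgeA.length, edgeA.nextTurnIsLeft, pieceEdgeLengths) &&
    canSkipEdge(edgeB.length, edgeB.nextTurnIsLeft, pieceEdgeLengths)
  );
}
```

The tolerance only ever makes the heuristic more conservative: when lengths are nearly equal it declines to skip and leaves the decision to the full intersection test.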

In summary this is a very specific heuristic that in general isn’t very useful but if you ever find yourself needing it, it can be just what the doctor ordered.

August 20, 2009

A Heuristic Solution to the Tangram Puzzle

Filed under: Computational Geometry — Bryan Bell @ 9:44 pm

In this post I present a short summary of the paper “A Heuristic Solution to the Tangram Puzzle” by E. S. Deutsch and K. C. Hayes Jr. I’ve uploaded a scanned copy of the paper to Scribd here. Uploading the scanned copy probably violates some copyright law, so if they or anyone else contacts me I’ll remove the document. Ironically it’s taken me over two years since my post here to finally read the paper.

In their paper they talk about two possible approaches to solving tangrams. One is the combinatorial approach that I’ve taken and the other is the heuristic method that they describe. The heuristic method is a little clever: they take the outline of the tangram puzzle and try to break it up into sub-puzzles. The following figure shows two puzzles that have each been separated into sub-puzzles. In both cases one of the sub-puzzles exactly matches a tangram piece, thus allowing the algorithm to solve that sub-puzzle.

Puzzle Illustration

The dashed lines in the figure show where the algorithm is splitting the sub-puzzles out from the original puzzle. First note that the dashed lines are called “extension lines” in the paper. They are generated to allow the extraction of the sub-puzzles. To create the extension lines the algorithm goes along the outline of the puzzle and at convex corners it generates an “extension line” that allows the possible separation of the puzzle into sub-puzzles. The following figure illustrates this. In the paper they outline various ways for eliminating extension lines that are unlikely to be of use.

Extension Lines

Once the above extension lines are created, a set of heuristics is used for splitting out sub-puzzles. The heuristics are applied in the following order.


1. direct-match rule

The algorithm attempts to locate puzzle pieces fully described by edges, rather than by extension lines or by combinations of edges and extension lines.

2. 2 1/2 – 3 1/2 edge-match rule

The direct-match rule, while very reliable in finding matches, cannot be used in most cases, so a more relaxed rule is needed. The 2 1/2 – 3 1/2 edge-match rule allows the location of puzzle pieces where most of the periphery is described by edges and a small portion is described by extension lines. Specifically the “2 1/2 – 3 1/2 edge-match rule requires that in the case of triangular shapes two complete sides must be defined by edges and the remaining side can be defined by a combination of collinear edges and extension lines. Moreover, the combination must include at least one portion of edge. For four sided puzzle pieces the rule requires that the additional side also be fully described by an edge.” [1]

There are an additional 8 extraction rules that are used but the above two give you their general flavor.

The organization of the extraction rules is to run them recursively and at each step try to apply the most specific one possible first. If the algorithm gets to a step where there are no possible extractions it backtracks and tries another extraction rule. This process is not guaranteed to find a solution and in fact the paper gives an example of a puzzle on page 239 for which no solution is found.
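The control flow described above (most specific rule first, backtrack on a dead end) can be sketched as a generic recursive driver. Everything here is illustrative rather than taken from the paper: the solve function, the rule interface, and especially the toy "puzzle", which is just a number to be broken into pieces of size 5 or 3 so that the example stays runnable.

```javascript
// Generic backtracking driver. `rules` is ordered from most to least specific;
// each rule maps a puzzle state to a list of candidate extractions
// { piece, rest }. These names are hypothetical, not from the paper.
function solve(puzzle, rules, isSolved) {
  if (isSolved(puzzle)) {
    return [];
  }
  for (const rule of rules) {
    for (const { piece, rest } of rule(puzzle)) {
      const remaining = solve(rest, rules, isSolved);
      if (remaining !== null) {
        return [piece, ...remaining];
      }
      // This extraction led to a dead end: fall through and try the
      // next candidate or the next rule (backtracking).
    }
  }
  return null; // no rule applies here, report failure to the caller
}

// Toy stand-in for a puzzle: a number to be split into "pieces" of size 5 or 3.
const take = (size) => (n) => (n >= size ? [{ piece: size, rest: n - size }] : []);
const rules = [take(5), take(3)];
const isSolved = (n) => n === 0;

console.log(solve(6, rules, isSolved)); // [ 3, 3 ], found after backing out of 5
console.log(solve(7, rules, isSolved)); // null, no solution is found
```

The second call mirrors the situation in the paper where the rules simply run out: the search terminates, but without a solution.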

The paper is a nice example of where a little clever thinking and some very specific heuristics can solve tangram puzzles.

References

[1] Deutsch, E. S., and K. C. Hayes Jr. “A Heuristic Solution to the Tangram Puzzle”, Machine Intelligence vol. 7, 1972, pp. 205–240.

June 4, 2009

Convex Hulls

The algorithm I’m going to showcase is taken from the book “Computational Geometry: Algorithms and Applications” by de Berg, Cheong, van Kreveld, and Overmars. They give an excellent analysis of the algorithm in the book, so I’ll only go over the highlights and show my implementation in Javascript using the html5 canvas.

The problem statement is: given a set of points S = \{p_1, p_2, \ldots, p_n\} in R^2, compute the convex hull of S. The input of the algorithm is the set, S, of points and the output is the set of points that are the vertices of the convex hull of S.

Our algorithm uses a standard design technique that yields what’s called an incremental algorithm: we compute the solution for the first i points p_1, \ldots, p_i, then add the point p_{i+1} and compute the new solution from the previous one.

Algorithm ConvexHull(P) (the pseudocode is borrowed from “Computational Geometry: Algorithms and Applications”)
Input. A set P of points in the plane.
Output. A list containing the vertices of \mathcal{CH}(P) in clockwise order.
1. Sort the points by x-coordinate, resulting in a sequence p_1,\ldots, p_n.
2. Put the points p_1 and p_2 in a list \mathcal{L}_\textrm{upper}, with p_1 as the first point.
3. for i \leftarrow 3 to n
4.     do Append p_i to \mathcal{L}_\textrm{upper}.
5.        while \mathcal{L}_\textrm{upper} contains more than two points and the last three points in \mathcal{L}_\textrm{upper} do not make a right turn
6.            do Delete the middle of the last three points from \mathcal{L}_\textrm{upper}.
7. Put the points p_n and p_{n-1} in a list \mathcal{L}_\textrm{lower}, with p_n as the first point.
8. for i \leftarrow n-2 downto 1
9.     do Append p_i to \mathcal{L}_\textrm{lower}.
10.       while \mathcal{L}_\textrm{lower} contains more than two points and the last three points in \mathcal{L}_\textrm{lower} do not make a right turn
11.           do Delete the middle of the last three points from \mathcal{L}_\textrm{lower}.
12. Remove the first and last point from \mathcal{L}_\textrm{lower} to avoid duplication of the points where the upper and lower hull meet.
13. Append \mathcal{L}_\textrm{lower} to \mathcal{L}_\textrm{upper}, and call the resulting list \mathcal{L}.
14. return \mathcal{L}.
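The pseudocode translates almost line for line into JavaScript. The following is a minimal sketch, not the code behind the demo page: the helper names (isRightTurn, halfHull) are mine, points are assumed to be plain { x, y } objects, ties in x are broken by y, and the y axis is assumed to point up (on a canvas, where y points down, the resulting order appears counter-clockwise).

```javascript
// True when a -> b -> c turns clockwise (a right turn with the y axis pointing up).
function isRightTurn(a, b, c) {
  return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x) < 0;
}

// Steps 2 to 6 (and 7 to 11 when fed the reversed sequence).
function halfHull(sorted) {
  const hull = [];
  for (const p of sorted) {
    hull.push(p);
    while (
      hull.length > 2 &&
      !isRightTurn(hull[hull.length - 3], hull[hull.length - 2], hull[hull.length - 1])
    ) {
      hull.splice(hull.length - 2, 1); // delete the middle of the last three points
    }
  }
  return hull;
}

function convexHull(points) {
  // Step 1: sort by x-coordinate (ties broken by y, an assumption of this sketch).
  const sorted = [...points].sort((a, b) => a.x - b.x || a.y - b.y);
  if (sorted.length < 3) {
    return sorted;
  }
  const upper = halfHull(sorted);
  const lower = halfHull([...sorted].reverse());
  // Steps 12 and 13: drop the shared end points of the lower hull, then append.
  return upper.concat(lower.slice(1, -1));
}

// Unit square plus one interior point.
const hull = convexHull([
  { x: 0, y: 0 }, { x: 1, y: 1 }, { x: 0.5, y: 0.5 }, { x: 1, y: 0 }, { x: 0, y: 1 },
]);
console.log(hull); // (0,0), (0,1), (1,1), (1,0): the interior point is dropped
```

Sorting dominates the running time, since each point is appended once and deleted at most once from each half hull.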

You can view an implementation using Javascript and the html5 canvas element, here. To add a new point simply click the left mouse button and it’ll add a point where the mouse is.

June 3, 2009

Right and Left Turns

In my previous post on line segment intersection I introduced the two dimensional cross product as v1 \times v2 = {v1}_x \cdot {v2}_y - {v2}_x \cdot {v1}_y. The cross product can also be used to determine if a set of three points p_1, p_2, \textrm{ and } p_3 make a right turn.

First note that if v_1 \times v_2 > 0 then the counter-clockwise angle from v_1 to v_2 is strictly between 0 and \pi. For example, (1, 0) \times (0, 1) = 1 \cdot 1 - 0 \cdot 0 = 1 > 0. Similarly if v_1 \times v_2 < 0 then that angle is strictly greater than \pi. In the image below I've shown an example where the three points p_1, p_2, \textrm{ and } p_3 make a right turn.

Points p1, p2, p3 making a right turn

To mathematically determine if the points make a right turn we let v_1 = p_1 - p_2 \textrm{ and } v_2 = p_3 - p_2. If the cross product v_1 \times v_2 > 0, then the angle between them is less than \pi, that is, the points make a right turn. If it is negative they make a left turn, and if it is zero the three points are collinear.
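As a small illustration, the test can be written directly from these definitions. This is a sketch with names of my own choosing (cross, isRightTurn) and points as plain { x, y } objects. It assumes the usual mathematical orientation with the y axis pointing up; on a canvas the y axis points down, so the sign of the test is flipped on screen.

```javascript
// Two dimensional cross product: v1_x * v2_y - v2_x * v1_y.
function cross(v1, v2) {
  return v1.x * v2.y - v2.x * v1.y;
}

// True when p1 -> p2 -> p3 makes a right turn (y axis pointing up).
function isRightTurn(p1, p2, p3) {
  const v1 = { x: p1.x - p2.x, y: p1.y - p2.y }; // v_1 = p_1 - p_2
  const v2 = { x: p3.x - p2.x, y: p3.y - p2.y }; // v_2 = p_3 - p_2
  return cross(v1, v2) > 0;
}

// Heading east, then turning south: a right turn.
console.log(isRightTurn({ x: 0, y: 0 }, { x: 1, y: 0 }, { x: 1, y: -1 })); // true
// Heading east, then turning north: a left turn.
console.log(isRightTurn({ x: 0, y: 0 }, { x: 1, y: 0 }, { x: 1, y: 1 })); // false
```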

This method for determining whether the points make a right or left turn is very useful for determining the convex hull of a set of points.

I’ve posted an example webpage, here, where the lines change color (using javascript) depending on whether the turn is to the right or the left. Please note that the page uses the html5 canvas element, so it will not work in Internet Explorer 6 or 7.

May 29, 2009

Line Segment Intersection

We want to determine if two line segments s_1 and s_2 intersect and if they do intersect, the point of intersection. First we want to analyze the problem. There is an edge case that is difficult to analyze (case 5 in the below list).

  1. The segments are not parallel (easy)
  2. The segments are parallel and do not intersect (easy)
  3. The segments are collinear and do not intersect (easy)
  4. The segments are collinear and do intersect (easy)
  5. The segments are nearly parallel or nearly collinear (hard)

In this post I handle cases 1 through 4. I’ll leave the handling of case 5 for later posts.

For the problem setup let segment 1 be given by

s_1 = p + t \cdot r where 0 <= t <= 1 and p, r are 2d vectors, similarly let segment 2 be given by

s_2 = q + u \cdot s where 0 <= u <= 1 and q, s are 2d vectors.

This representation of the line segments yields a very natural method for computing their intersection.

Computing cross products is the heart of the algorithm for determining intersections. The two dimensional cross product of p_1 = (x_1, y_1) and p_2 = (x_2, y_2) is given by

p_1 \times p_2 = \textrm{det} \left( \begin{array} {cc} x_1 & x_2 \\ y_1 & y_2\end{array} \right) = x_1y_2 - x_2y_1 = -p_2 \times p_1

If the cross product is zero then the two vectors are collinear, that is, they point in either the same direction or opposite directions.

The two lines intersect when p + t \cdot r = q + u \cdot s. Crossing both sides with s, and using s \times s = 0, we get

(p + t \cdot r) \times s = (q + u \cdot s) \times s = q \times s + u \cdot s \times s = q \times s.

From this we solve for t obtaining

t = \frac{(q - p) \times s}{r \times s}.

We can similarly solve for u by crossing both sides with r, obtaining

u = \frac{(q -p) \times r}{r \times s}.

If both t and u are between 0 and 1, then the two line segments intersect and the intersection point is given by p + t \cdot r, where the value of t is the one we solved for. If either t or u is not between 0 and 1 then the line segments do not intersect.

But suppose r \times s = 0; then we can’t solve for t and u. This is because r \times s = 0 means that r and s are parallel, which means the two line segments are parallel. Please also note that r \times s = 0 implies that r and s are scalar multiples of each other. If (p - q) \times r \not= 0 then the segments are parallel but not collinear, hence they do not intersect. If (p - q) \times r = 0 then the segments are collinear and we can simply project both of them onto the x-axis (or onto the y-axis if the segments are vertical) and determine if their projections intersect.
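Putting cases 1 through 4 together gives the following JavaScript sketch. It is not the code from my sample implementation: the function names and the { intersects, point } return shape are chosen for illustration, each segment is given by its two end points and assumed to have non-zero length, and the exact comparisons with zero are precisely what case 5 would need to replace with something more careful.

```javascript
// 2D cross product: x_1 * y_2 - x_2 * y_1.
function cross(a, b) {
  return a.x * b.y - b.x * a.y;
}

function sub(a, b) {
  return { x: a.x - b.x, y: a.y - b.y };
}

// Segment 1 runs from p to pEnd, segment 2 from q to qEnd.
// `point` is null when there is no intersection, or when collinear
// segments overlap (the overlap may be a whole interval, not one point).
function segmentIntersection(p, pEnd, q, qEnd) {
  const r = sub(pEnd, p);
  const s = sub(qEnd, q);
  const qp = sub(q, p);
  const rxs = cross(r, s);

  if (rxs === 0) {
    if (cross(qp, r) !== 0) {
      return { intersects: false, point: null }; // parallel, not collinear
    }
    // Collinear: compare projections on x, or on y for vertical segments.
    const axis = r.x !== 0 ? "x" : "y";
    const lo = Math.max(Math.min(p[axis], pEnd[axis]), Math.min(q[axis], qEnd[axis]));
    const hi = Math.min(Math.max(p[axis], pEnd[axis]), Math.max(q[axis], qEnd[axis]));
    return { intersects: lo <= hi, point: null };
  }

  const t = cross(qp, s) / rxs; // t = (q - p) x s / (r x s)
  const u = cross(qp, r) / rxs; // u = (q - p) x r / (r x s)
  if (t < 0 || t > 1 || u < 0 || u > 1) {
    return { intersects: false, point: null };
  }
  return { intersects: true, point: { x: p.x + t * r.x, y: p.y + t * r.y } };
}

// The two diagonals of a 2 x 2 square cross at (1, 1).
console.log(segmentIntersection({ x: 0, y: 0 }, { x: 2, y: 2 }, { x: 0, y: 2 }, { x: 2, y: 0 }));
```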

For case 5 when r \times s is close to zero the analysis needs to be well thought out and I’ll leave it to future posts.

I’ve posted a sample implementation in Javascript using the html5 canvas at http://cloud.github.com/downloads/bjwbell/canvas-geolib/main.html (please note it won’t work in internet explorer since it doesn’t support the canvas element).

Credit goes to Gareth Rees for his post at stackoverflow.com, which is based on the 3D line intersection algorithm from the article “Intersection of two lines in three-space” by Ronald Goldman, published in Graphics Gems, page 304.

May 18, 2009

Book Memes

Filed under: Uncategorized — Bryan Bell @ 10:53 pm

Book memes
* Grab the nearest book.
* Open it to page 56.
* Find the fifth sentence.
* Post the text of the sentence in your journal along with these instructions.
* Don’t dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.

“Fibonacci numbers are related to the golden ratio phi and to its conjugate, which are given by the following formulas: phi = (1 + sqrt(5))/2, conjugate phi = (1 - sqrt(5))/2”

I’m way too geeky :) most of my books are mathematics and computer science.
