Hi Friends,
Okay let’s be honest, “wrap around” sounds dubious at best. Just saying it conjures images of food items thickely coated in cellophane. Maybe its just me. In this post, I wanna get back to this KD-Tree and what was so goddamn hard about making it work with for our chemical Monte Carlo. Check out this page, for a quick refresher for why we want to use this data structure.
The Intro:
So lots of scientists use lots of simulations. One puts a few “basic” rules into the ‘puter and lets it crunch numbers and then sees how well the end results match observations. If it doesn’t match the observations then your input rules are probably wrong. If it does match the observations….hahaha well technically speaking you still don’t know much. C’est la science.
Anyway, our simulation involves particles in a box. Why a box? Because finite volumes are much easier to simulate than infinite volumes. And, what happens when a particle hits the “edge” of the box? Sometimes you want the particles to “bounce” off that edge like tennis ball off of asphalt. Other times, and this includes our chemical problem, you want the particles to not see the wall at all. Now thats quite the trick, right? When a particle gets all the way to the left edge, and keeps going left, you just put it on the other side of the box (ie the right side) and let it keep moving left! That’s wrap, baby!
Wrapping at the edges makes our system look like a small subsample out of “the bulk”–as if you had a microscope that could see a 10x10x10 lattice of iron atoms inside a giant metal block (which has 10^crazy_big_number total atoms)
The Problem:
Alright, well that was a … long explanation. So what? Well wrap breaks a fundemental assumption of the KD-Tree search algorithm. In fact its worse than that, almost all divide-and-conquer algorithms would break with wrap because they depend on the ::cue dramatic timpani::
THE TRIANGLE INEQUALITY!!
Dun dun dun dun dun…. Yea, I’m a math-drama-queen. Can’t stop; won’t stop. Lemme try give you a cute example, and check out Wikipedia for an in-depth explanation.
Imagine you’re looking at three particles standing in a line called: Josh, Ray and Monica. If Josh is to the left of Ray who is to the left of Monica, then Josh is obviously further away from Monica than from Ray, right?
You’d think. And you’d be right if the line were just like the real number line: it extends from -infinity to infinity. But what if the line is more like a circle: go two meters to the left and you’ve gone all the way around and ended up back where you started. Ru roh.
Let’s use two smart-person words. Our simulated space, which is like a circle in three dimensions, is non-metrizable, but it is a manifold. Phew…uhhhh…say what?? Well the non-metrizability arises from the loss of the triangle inequality, and implies that divide-and-conquer algorithms need to be modified. We are saved by our space being a manifold, because we can always pick a small neighborhood of around a point of interest and map it onto n-dimensional Euclidean space (which does have the triangle inequality).
The Solution:
The problem is easy to fix in 1D, so why don’t we start there? Lets say you’re looking for the nearest neighbor to x1=0.1, run a standard vanilla KD-Tree search. If the nearest neighbor is @ x2=0.1004 you’re done, because even if there was a point x3=0.99999, x3 is still too far away. But if the nearest neighbor to x1 is at x4=0.3 then you need to search the other side of the box. So you do two searches, one for all points close to x1=0.1 and one for all points close to x=1.0 (the far side). Far out, man.
Does that make any sense? I’ll assume it does, since this is more of a monologue anyway. Now lets move to k-dimensions, where shit gets really complicated. Imagine repeating our above procedure in in 2-d, i.e. on a square. Instead of 1 boundary you have to solve 8. Why 8? Because corners have to be treated differently than sides. No, geometry doesn’t want you to sleep on Saturday nights. And a cube, our actual system, has 26 edges, faces and corners.
Crap, coding up 26 cases sounds miserable. Besides, even if I did write down all 26 cases, I’d still be stuck with exactly 3-dimensions. Now, we know that the critical question is, “do I need to search that far part of the box too?”, how do we ask for each “far part”. Sounds pretty easy, right? I found this to be a surprisingly tricky math problem. Lemme state it summarily:
Write a concise snippet of C, that enumerates all the faces, edges and corners of a k-dimensional cube. And by concise, I mean finite and unchanging with k. If you had k loops then its easy, but I haven’t told you (even at compile time), what k is.
Leave questions/comments below! Thanks for reading folks!
Leave a Reply