<!-- Feed metadata: Aseem Raj Baranwal, Personal webpage, https://aseemrb.me/feed.xml, generated by Jekyll, 2019-04-29T21:45:36+00:00 -->
<h2 id="lunch-with-knuth">Lunch with Donald Knuth</h2>
<p><em>2018-10-31, <a href="https://aseemrb.me/lunch-with-knuth">https://aseemrb.me/lunch-with-knuth</a></em></p>
<p>On Halloween, <a href="https://www-cs-faculty.stanford.edu/~knuth/">Donald E. Knuth</a> (<a href="https://en.wikipedia.org/wiki/Donald_Knuth">wiki</a>)
visited the University of Waterloo for a
<a href="https://uwaterloo.ca/computer-science/events/dls-donald-knuth-all-questions-answered">distinguished lecture</a>,
and thanks to my advisor <a href="https://cs.uwaterloo.ca/~shallit/">Prof. Jeff Shallit</a>, who arranged a lunch with him
for a smaller group of people, I was able to meet him in person. My first thought when he appeared was
<em>‘he is much taller than he looks in the photos’</em>.</p>
<p>Whenever Knuth gives a talk, almost all of it is interactive and Q/A based. He takes <em>all kinds of</em> questions
(except politics and religion) from the audience. His answers carried a wisdom I could only admire; almost every opinion
was accompanied by an anecdote from his own life. Here is a selection of things that were
brought up during the lunch and the talk (unfortunately, I have forgotten most of the conversations).<br />
<em>The public talk is now available <a href="https://youtu.be/XWR5Y3Wf8Fo">on YouTube</a>.</em></p>
<h3 id="on-p-vs-np">On P vs NP</h3>
<p>Knuth believes that <a href="https://en.wikipedia.org/wiki/P_versus_NP_problem">P = NP</a>. This obviously needs an explanation, and I will
try to present the reason based on my understanding of his argument. First let’s be clear on
<strong>what it means when we say <em>P = NP</em></strong>. It means that there exists an integer <script type="math/tex">k</script> and an algorithm <script type="math/tex">A</script> which
solves every problem in the class <em>NP</em> of size <script type="math/tex">m</script> bits in <script type="math/tex">m^k</script> elementary steps. Now Knuth has two points:</p>
<ul>
<li>Imagine a number <script type="math/tex">k</script> which is finite but excessively humongous (to have a sense of humongous numbers, one might look at
<a href="https://en.wikipedia.org/wiki/Ackermann_function">Ackermann Function</a> or even
<a href="https://en.wikipedia.org/wiki/Graham%27s_number">Graham’s Number</a>). But no matter how big a finite number is, it is
always 0% as big as infinity. Now there exists an incredibly huge number of algorithms that perform <script type="math/tex">m^k</script> elementary operations
on the <script type="math/tex">m</script> bits, and it is extremely hard to believe that none of those algorithms can do what we want.</li>
<li>The resolution of the <em>P vs NP</em> problem will not be a helpful result because the proof almost certainly will be
<a href="https://en.wikipedia.org/wiki/Constructive_proof#Non-constructive_proofs"><strong>non-constructive</strong></a>. Mathematics has a lot
of examples where we have proof for the existence of something, but that proof does not help us <em>find</em> the actual thing
<em>(I’m thinking cryptography here)</em>. So proving the existence of <script type="math/tex">A</script> is different from actually finding <script type="math/tex">A</script>.</li>
</ul>
<h3 id="honeymoon-advice">Honeymoon advice</h3>
<p>On his honeymoon in 1961, Knuth was reading <a href="https://en.wikipedia.org/wiki/Noam_Chomsky">Noam Chomsky</a>’s book
<strong><em>Syntactic Structures</em></strong>, and he thinks that was a bad idea (reading it <em>on the honeymoon</em>, that is, not reading the book in general).
However, it was while reading this book that he discovered an <strong>intersection between <em>Mathematics</em> and <em>Computer Programming</em></strong>
(compiler design).</p>
<h3 id="boredom-of-the-young-generation">Boredom of the young generation</h3>
<p>Knuth is appalled by statements from people of the younger generation that go along the lines of:<br />
<strong>“X is quite boring, which is why I study / work on Y instead.”</strong><br />
He says that it is not the world’s job to entertain you. Boredom lives inside you, not in the material
that you work with. It is true that some people find certain things more interesting than others, and hence are more
curious about them. The right thing to say would be that you are more curious about <em>Y</em> than about <em>X</em>.</p>
<h3 id="on-recent-areas-of-interest">On recent areas of interest</h3>
<p>As Knuth says, he was a mathematician who got curious about Computer Science, but now his interests are again
inclined toward pure Mathematics. The problem space concerned with <a href="https://en.wikipedia.org/wiki/Family_of_sets">families of sets</a>
seems very interesting to him currently because we still haven’t found many ways to represent and work with them in a way
that might help us analyze the numerous applications covered by this construct. I am not sure, but perhaps he started
thinking in this direction when the data structure <a href="https://en.wikipedia.org/wiki/Zero-suppressed_decision_diagram#Representing_a_family_of_sets">Zero-suppressed decision diagram (ZDD)</a>
(introduced by <strong><em>Shin-ichi Minato</em></strong>) came to light, which in Knuth’s words is
<strong>“the best way that I know of to represent families of sets”</strong>.</p>
<h2 id="doppler-effect">Explaining the Doppler effect to my mom</h2>
<p><em>2018-05-29, <a href="https://aseemrb.me/doppler-effect">https://aseemrb.me/doppler-effect</a></em></p>
<p>Some time back I was at home reading <em><a href="https://archive.org/details/TheClassicalTheoryOfFields">The Classical Theory of Fields</a></em> by L.D. Landau &amp; E.M. Lifshitz, where I came across the <em>relativistic Doppler effect</em> (the classical Doppler effect with the laws of special relativity taken into consideration). Mom was sitting beside me when she suddenly peeked at what I was reading and asked <em>“What is the Doppler effect?”</em>.</p>
<p>I remember when I first studied the Doppler effect in secondary school; my physics teacher presented it as just an equation to be memorized. When I asked where it came from, I was told that it was beyond the scope of the current syllabus and that I would learn it later during higher studies. Because I hate memorizing formulas and equations, this bugged me. The school library did not have any physics books above the level of our syllabus, as the school only went up to the secondary level, but I did find some applications of the Doppler effect listed in one of the books:</p>
<ul>
<li>Used in some types of <a href="https://en.wikipedia.org/wiki/Doppler_radar">radars</a> to measure velocities of discovered objects</li>
<li>Used by astronomers to compute the velocities at which stars and galaxies move relative to each other (<a href="https://en.wikipedia.org/wiki/Redshift">redshift</a>/<a href="https://en.wikipedia.org/wiki/Blueshift">blueshift</a>)</li>
<li>Used for taking a surface’s <a href="https://en.wikipedia.org/wiki/Laser_Doppler_vibrometer">vibration measurements</a></li>
</ul>
<p>It was only much later that I realized why the effect exists. So I wanted my mom to see it in the first go. To do that, I did not introduce any equations or tell her about any applications whatsoever. She is not familiar with terms like <strong><em>frequency</em></strong>, <strong><em>relative motion</em></strong>, <em>etc.</em>, so I used a simple visual analogy of a person throwing balls towards another person.</p>
<hr />
<h3 id="the-analogy">The analogy</h3>
<p>Two friends Alice and Bob are playing with a ball. Alice throws balls straight towards Bob with a <strong>constant velocity</strong> and at a <strong>constant rate</strong> <em>(I had to tell my mom that the rate at which something happens is fancily called the frequency, so practically, in this case, frequency is the answer to the question <strong>“How many in a second?”</strong>)</em>. Now let’s take the following setup, where <strong>all speeds are in m/s, time is in seconds, and distance is in meters</strong>, and derive the classical Doppler effect from there:</p>
<ul>
<li>The speed at which Alice throws the ball = <script type="math/tex">c</script> m/s</li>
<li>The rate of throwing the balls = <script type="math/tex">f</script> balls per second (frequency f Hz)</li>
<li>Time instant at which Alice throws the first ball: <script type="math/tex">t = 0s</script></li>
<li>Initial displacement between Alice and Bob = <script type="math/tex">d</script> meters</li>
<li>If Alice throws <script type="math/tex">f</script> balls in one second, the time gap between two consecutive throws is <script type="math/tex">1/f</script> seconds</li>
<li>Speed = Distance covered per unit time</li>
</ul>
<p>Initially both Alice and Bob are <strong>not moving</strong>. <em>What is the rate at which Bob receives the balls?</em> To see this arithmetically:</p>
<table>
<thead>
<tr>
<th><strong>Time instant (seconds)</strong></th>
<th><strong>Event</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td><script type="math/tex">t_1 = 0</script></td>
<td>Alice throws the first ball</td>
</tr>
<tr>
<td><strong><script type="math/tex">t_2 = \frac{d}{c}</script></strong></td>
<td><strong>Bob receives the first ball</strong></td>
</tr>
<tr>
<td><script type="math/tex">t_3 = \frac{1}{f}</script></td>
<td>Alice throws the second ball</td>
</tr>
<tr>
<td><strong><script type="math/tex">t_4 = \frac{1}{f} + \frac{d}{c}</script></strong></td>
<td><strong>Bob receives the second ball</strong></td>
</tr>
</tbody>
</table>
<p>The time gap between two consecutive receive events (<script type="math/tex">t_4 - t_2</script>) is <script type="math/tex">\frac{1}{f}</script> seconds which is the same as the time gap between two consecutive send events. Hence the rate at which Alice throws and the rate at which Bob receives are the same which is <strong><script type="math/tex">f</script> balls per second</strong>.</p>
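<p>The table above can be checked with a few lines of code. A minimal sketch of the stationary case (the function names <code class="highlighter-rouge">throw_times</code> and <code class="highlighter-rouge">receive_times</code> are mine, just for illustration):</p>

```python
def throw_times(f, n):
    # Alice throws one ball every 1/f seconds, starting at t = 0.
    return [k / f for k in range(n)]

def receive_times(f, n, d, c):
    # Each ball travels d meters at c m/s, so it arrives d/c seconds
    # after it was thrown; neither Alice nor Bob is moving here.
    return [t + d / c for t in throw_times(f, n)]

# Example: f = 2 balls/s, d = 10 m, c = 5 m/s.
sends = throw_times(2, 3)            # [0.0, 0.5, 1.0]
recvs = receive_times(2, 3, 10, 5)   # [2.0, 2.5, 3.0]
# The gap between consecutive receives is 1/f = 0.5 s, same as the sends.
```

<p>Every receive time is just the send time shifted by the constant <script type="math/tex">d/c</script>, so the gaps, and hence the rates, are identical.</p>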
<p><strong><em><u>Now what happens when things start moving?</u></em></strong>
<br />
When I asked my mom this question, her face immediately lit up. Intuition kicked in and she blurted out <em>“If Alice is moving closer to Bob, then Bob will receive the balls at a faster rate than Alice is throwing them.”</em> My job was done here: she was happy and convinced about how and why the Doppler effect works. But I went ahead to tell her about relative velocities, applications of the Doppler effect in astronomy, <em>etc.</em>, and how an approaching train’s whistle sounds higher-pitched while a receding train’s sounds lower-pitched, so essentially the brain also applies the Doppler effect to recognize approaching and receding objects.</p>
<hr />
<h3 id="closure">Closure</h3>
<p>For closure let’s look at what happens when</p>
<ul>
<li><strong>Alice (in red) is moving towards Bob (in blue) with speed <code class="highlighter-rouge">a m/s</code></strong> and</li>
<li><strong>Bob is moving away from Alice with speed <code class="highlighter-rouge">b m/s</code></strong>.</li>
</ul>
<svg height="138" width="490">
<defs>
<marker id="triangle" viewBox="0 0 14 14" refX="0" refY="5" markerUnits="strokeWidth" markerWidth="10" markerHeight="10" orient="auto">
<path d="M 0 0 L 10 5 L 0 10 z" />
</marker>
</defs>
<path d="M 24 8 A 16 16 0 0 0 12 16" stroke="red" stroke-width="2" fill="transparent" />
<path d="M 36 16 A 16 16 0 0 0 24 8" stroke="red" stroke-width="2" fill="transparent" />
<path d="M 400 8 A 16 16 0 0 0 388 16" stroke="blue" stroke-width="2" fill="transparent" />
<path d="M 412 16 A 16 16 0 0 0 400 8" stroke="blue" stroke-width="2" fill="transparent" />
<path d="M 12 16 A 16 16 0 0 0 12 32" stroke="red" stroke-width="2" fill="transparent" />
<path d="M 36 32 A 16 16 0 0 0 36 16" stroke="red" stroke-width="2" fill="transparent" />
<path d="M 388 16 A 16 16 0 0 0 388 32" stroke="blue" stroke-width="2" fill="transparent" />
<path d="M 412 32 A 16 16 0 0 0 412 16" stroke="blue" stroke-width="2" fill="transparent" />
<path d="M 12 32 A 16 16 0 0 0 24 40" stroke="red" stroke-width="2" fill="transparent" />
<path d="M 24 40 A 16 16 0 0 0 36 32" stroke="red" stroke-width="2" fill="transparent" />
<path d="M 90 40 A 5 5 0 0 0 85 45" stroke="gray" stroke-width="10" fill="transparent"></path>
<path d="M 94 45 A 5 5 0 0 0 89 40" stroke="gray" stroke-width="10" fill="transparent" />
<path d="M 85 44 A 5 5 0 0 0 90 49" stroke="gray" stroke-width="10" fill="transparent" />
<path d="M 89 49 A 5 5 0 0 0 94 44" stroke="gray" stroke-width="10" fill="transparent" />
<text x="110" y="44" style="font-size:14px;font-family:monospace">c m/s</text>
<path d="M 388 32 A 16 16 0 0 0 400 40" stroke="blue" stroke-width="2" fill="transparent" />
<path d="M 400 40 A 16 16 0 0 0 404 48" stroke="blue" stroke-width="2" fill="transparent" />
<path d="M 400 48 A 16 16 0 0 0 408 48" stroke="blue" stroke-width="2" fill="transparent" />
<path d="M 400 40 A 16 16 0 0 0 412 32" stroke="blue" stroke-width="2" fill="transparent" />
<line x1="16" x2="24" y1="64" y2="48" stroke="rgb(250,0,0)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<path d="M 24 48 A 16 16 0 0 0 32 48" stroke="red" stroke-width="2" fill="transparent" />
<path d="M 24 40 A 16 16 0 0 0 28 48" stroke="red" stroke-width="2" fill="transparent" />
<line x1="28" x2="28" y1="48" y2="64" stroke="rgb(250,0,0)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="32" x2="40" y1="48" y2="64" stroke="rgb(250,0,0)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="112" x2="120" y1="56" y2="56" stroke="rgb(0,0,0)" stroke-width="1" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="120" x2="128" y1="56" y2="56" stroke="rgb(0,0,0)" stroke-width="1" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="128" x2="132" y1="56" y2="56" style="stroke: rgb(0,0,0);stroke-width:1" marker-end="url(#triangle)" />
<line x1="392" x2="400" y1="64" y2="48" stroke="rgb(0,0,250)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="404" x2="404" y1="48" y2="64" stroke="rgb(0,0,250)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="408" x2="416" y1="48" y2="64" stroke="rgb(0,0,250)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<path d="M 28 68 A 16 16 0 0 0 30 76" stroke="red" stroke-width="2" fill="transparent" />
<line x1="28" x2="28" y1="64" y2="68" stroke="rgb(250,0,0)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="32" x2="30" y1="80" y2="76" stroke="rgb(250,0,0)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<text x="200" y="76" style="font-size:14px;font-family:monospace">d meters</text>
<path d="M 402 76 A 16 16 0 0 0 404 68" stroke="blue" stroke-width="2" fill="transparent" />
<line x1="404" x2="404" y1="64" y2="68" stroke="rgb(0,0,250)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="400" x2="402" y1="80" y2="76" stroke="rgb(0,0,250)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="24" x2="32" y1="96" y2="80" stroke="rgb(250,0,0)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="32" x2="40" y1="80" y2="96" stroke="rgb(250,0,0)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="56" x2="52" y1="88" y2="88" style="stroke: rgb(0,0,0);stroke-width:1" marker-end="url(#triangle)"></line>
<line x1="52" x2="378" y1="88" y2="88" stroke="rgb(0,0,0)" stroke-width="1" stroke-linecap="round" stroke-linejoin="mitter"></line>
<line x1="378" x2="382" y1="88" y2="88" style="stroke: rgb(0,0,0);stroke-width:1" marker-end="url(#triangle)"></line>
<line x1="392" x2="400" y1="96" y2="80" stroke="rgb(0,0,250)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="400" x2="408" y1="80" y2="96" stroke="rgb(0,0,250)" stroke-width="2" stroke-linecap="round" stroke-linejoin="mitter" />
<text y="124" style="font-size:14px;font-family:monospace">a m/s</text>
<line x1="48" x2="56" y1="120" y2="120" stroke="rgb(0,0,0)" stroke-width="1" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="56" x2="64" y1="120" y2="120" stroke="rgb(0,0,0)" stroke-width="1" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="64" x2="68" y1="120" y2="120" style="stroke: rgb(0,0,0);stroke-width:1" marker-end="url(#triangle)" />
<text x="376" y="124" style="font-size:14px;font-family:monospace">b m/s</text>
<line x1="426" x2="434" y1="120" y2="120" stroke="rgb(0,0,0)" stroke-width="1" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="434" x2="442" y1="120" y2="120" stroke="rgb(0,0,0)" stroke-width="1" stroke-linecap="round" stroke-linejoin="mitter" />
<line x1="442" x2="446" y1="120" y2="120" style="stroke: rgb(0,0,0);stroke-width:1" marker-end="url(#triangle)" />
</svg>
<h4 id="observations">Observations</h4>
<ul>
<li>If Alice throws the first ball at <script type="math/tex">t_1 = 0</script>, then to compute the time instant at which the ball reaches Bob we also need to account for how much farther Bob has moved in the meantime. Hence if Bob receives the first ball at <script type="math/tex">t_2</script>, then:<br />
<script type="math/tex">ct_2 = d + bt_2</script>, where <script type="math/tex">bt_2</script> is the extra distance covered by Bob, which gives <script type="math/tex">t_2 = \frac{d}{c-b}</script></li>
<li>When Alice throws the second ball, she has already moved some distance which needs to be taken into account.</li>
</ul>
<p>Now the events table looks like this (work it out yourself if you want)</p>
<table>
<thead>
<tr>
<th><strong>Time instant (seconds)</strong></th>
<th><strong>Event</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td><script type="math/tex">t_1 = 0</script></td>
<td>Alice throws the first ball</td>
</tr>
<tr>
<td><strong><script type="math/tex">t_2 = \frac{d}{c-b}</script></strong></td>
<td><strong>Bob receives the first ball</strong></td>
</tr>
<tr>
<td><script type="math/tex">t_3 = \frac{1}{f}</script></td>
<td>Alice throws the second ball</td>
</tr>
<tr>
<td><strong><script type="math/tex">t_4 = \frac{d}{c-b} + \frac{1}{f}\left(\frac{c-a}{c-b}\right)</script></strong></td>
<td><strong>Bob receives the second ball</strong></td>
</tr>
</tbody>
</table>
<p>Here the time gap between two consecutive receive events is <script type="math/tex">t_4 - t_2 = \frac{1}{f}\left(\frac{c-a}{c-b}\right)</script> seconds, which is not the same as the time gap between two consecutive send events. Alice throws <script type="math/tex">f</script> balls per second, but Bob receives <script type="math/tex">f\left(\frac{c-b}{c-a}\right)</script> balls per second.</p>
<p>As the equation clearly shows, if Alice’s velocity towards Bob increases, then Bob will receive more balls per second, whereas if Bob’s speed away from Alice increases, then Bob will receive fewer balls per second. The same thing happens with sound waves when we hear a moving train’s horn: we are Bob and the train is Alice.</p>
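<p>The final formula is easy to play with numerically. A small sketch (the helper name <code class="highlighter-rouge">observed_frequency</code> is mine, not from the derivation above):</p>

```python
def observed_frequency(f, c, a, b):
    # Alice throws f balls/s at speed c m/s, moving towards Bob at a m/s,
    # while Bob moves away from Alice at b m/s.
    # Bob then receives f * (c - b) / (c - a) balls per second.
    return f * (c - b) / (c - a)

# Alice approaching faster (larger a) raises the receive rate;
# Bob receding faster (larger b) lowers it.
rate = observed_frequency(2, 10, 2, 1)  # 2 * (9/8) = 2.25 balls/s
```

<p>With <script type="math/tex">a = b = 0</script> the formula collapses back to <script type="math/tex">f</script>, recovering the stationary case.</p>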
<hr />
<h2 id="st-petersburg-paradox">How much will you pay to play this game?</h2>
<p><em>2017-08-26, <a href="https://aseemrb.me/st-petersburg-paradox">https://aseemrb.me/st-petersburg-paradox</a></em></p>
<p>Recently, while revisiting probability, I came across an interesting problem, commonly known as the <strong><em>St. Petersburg paradox</em></strong>. The problem is about a coin-toss game in which you always win. What matters is the amount of money you win. To play it, you must first pay an amount <strong><code class="highlighter-rouge">C</code></strong>. Let’s see what the game is, and then we can think about the right value of <strong><code class="highlighter-rouge">C</code></strong>.</p>
<h3 id="the-game">The game</h3>
<p>This is a single-player game where at each stage a <a href="https://en.wikipedia.org/wiki/Fair_coin">fair coin</a> is tossed. The stakes start with <strong><code class="highlighter-rouge">$2</code></strong> on the table. If the first toss comes up heads, the player wins two dollars and the game ends; if it comes up tails, the stakes are doubled, i.e. <strong><code class="highlighter-rouge">$4</code></strong>. So the stakes keep doubling as long as the tosses come up tails, and the game ends when a toss comes up heads, at which point the player takes away all the stakes as winnings.</p>
<p>Clearly, the <strong>minimum amount</strong> you can win is <strong>$2</strong> (the first toss gives heads), while the <strong>maximum amount</strong> is <script type="math/tex">\infty</script> (an arbitrarily long run of tails before the final heads [hard to imagine, I know]). Now let the amount one is willing to pay to play this game be <script type="math/tex">C</script>, and let the amount one wins in this game (the stakes) be <script type="math/tex">W</script>. Assuming the player is completely rational, they would want to play the game only if <script type="math/tex">C \leq E(W)</script>, where <script type="math/tex">E(W)</script> is the <strong>expected value of <script type="math/tex">W</script></strong>. The expected value is simply what we expect the value of <script type="math/tex">W</script> to be. This is a common notion in probability and statistics and is computed as follows:
<script type="math/tex">E(W) = \sum_{w}{w \times P(W = w)}</script> where <script type="math/tex">P(W = w)</script> is the probability that the player wins <script type="math/tex">w</script> dollars. Notice that this is a summation over all possible values of <script type="math/tex">w</script>, which are all powers of 2.</p>
<script type="math/tex; mode=display">E(W) = (\frac{1}{2} \times 2) + (\frac{1}{4} \times 4) + (\frac{1}{8} \times 8) + \space \ldots</script>
<p>This is because the probability that:</p>
<ul>
<li>the first flip is a head = <script type="math/tex">\frac{1}{2}</script></li>
<li>the first flip is a tail, and the second is a head = <script type="math/tex">\frac{1}{4}</script></li>
<li>the first two flips are tails, and the third is a head = <script type="math/tex">\frac{1}{8}</script></li>
<li>and so on …</li>
</ul>
<script type="math/tex; mode=display">E(W) = 1 + 1 + 1 + \space \ldots = \infty</script>
<h3 id="what-does-this-mean">What does this mean?</h3>
<p>According to what we have calculated here, <strong>a player should be willing to pay any amount <script type="math/tex">C</script> to play this game</strong>, because the <strong>expected amount of the winnings is <script type="math/tex">\infty</script></strong>.</p>
<p>But <strong>does your intuition agree with this result?</strong> I guess not. What should we trust then? This computation or our intuition? Personally, I would not be willing to pay more than $10 to play this game. So it seems we are missing something important here.</p>
<p>To find out what’s missing, consider a real scenario in which you are playing this game. It’s true that the winning amount increases exponentially with each toss, so we would want only tails. But notice that to win the game we need a heads at the end, which marks the finish of the game; otherwise it can go on forever. Playing this game forever is futile, because the player is stuck in an infinite game. In game theory, this paradox (between the computation and our intuition) is resolved by introducing the concept of <strong>utility</strong>, which takes into account everything that is important to the player. Time then becomes a concern and the game becomes more practical; for example, we will have an upper bound on the winnings, because surely one cannot have an infinite amount of money.</p>
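<p>The gap between the infinite expectation and our intuition also shows up if you simply simulate the game. A quick Monte Carlo sketch (the function names are mine; it assumes a fair coin via Python’s <code class="highlighter-rouge">random</code> module):</p>

```python
import random

def play_once(rng):
    # Stakes start at $2 and double on every tails; a heads ends the game.
    stake = 2
    while rng.random() < 0.5:  # tails with probability 1/2
        stake *= 2
    return stake

def average_winnings(n, seed=0):
    # Sample mean of n independent plays of the game.
    rng = random.Random(seed)
    return sum(play_once(rng) for _ in range(n)) / n
```

<p>Even over many thousands of plays, the sample average usually comes out to a few dollars to a few tens of dollars, nowhere near infinity, which is exactly why intuition balks at paying a large <script type="math/tex">C</script>.</p>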
<h3 id="getting-practical">Getting practical</h3>
<p>Consider a real scenario for this gameplay. The game host will not be able to give you an <em>infinite amount of money</em>, and the game cannot continue for an <em>infinite amount of time</em>, so they will <strong>put an upper cap</strong>, say <strong>a billion dollars</strong>. If there are <script type="math/tex">29</script> consecutive flips resulting in tails, then you win a billion dollars, because <script type="math/tex">2^{30} > 1 \space billion</script>. So <strong>any game going up to 29 flips, all tails, gives you a winning of 1 billion dollars</strong>. Let’s see how much we would want to pay to play this game now, given that <strong>it ends not only on getting a heads, but also after 29 tails</strong>.</p>
<p>In the previous hypothetical scenario we had <script type="math/tex">w \in \{2^k : k \in \mathbb{N}\}</script>, <em>i.e.</em> the winnings could be any power of 2. But in this practical case, we have <script type="math/tex">w \in \{2^k : k \in \{1, 2, ..., 29\}\} \cup \{1,000,000,000\}</script> as any game with 29 flips giving tails lets us win a billion dollars. Now let’s compute that expectation again, recalling that the expectation is the summation over all possible values that <script type="math/tex">w</script> can take.</p>
<script type="math/tex; mode=display">E(W) = (\frac{1}{2} \times 2) + (\frac{1}{4} \times 4) + (\frac{1}{8} \times 8) + \space \ldots + (\frac{1}{2^{29}} \times (2^{29} + 1,000,000,000)) \approx 30</script>
<p>Notice the drastic change: by our earlier computation, we should have been willing to pay an infinite amount to play this game. But when we bring the practical situation into our mathematics, we are not willing to pay even more than $30, even though the maximum winnings can be a billion dollars.</p>
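<p>The capped expectation above is easy to verify numerically. A minimal sketch (the function name is mine):</p>

```python
def capped_expectation(cap_flips=29, cap_amount=1_000_000_000):
    # A heads on toss k (after k - 1 tails) pays 2**k dollars with
    # probability 1/2**k, so each of these terms contributes exactly $1.
    ev = sum((2 ** k) * (1 / 2 ** k) for k in range(1, cap_flips + 1))
    # A run of 29 straight tails (probability 1/2**29) pays the capped billion.
    ev += cap_amount * (1 / 2 ** cap_flips)
    return ev

print(capped_expectation())  # about 30.86
```

<p>The sum contributes $29 and the capped-billion term adds under $2 more, which is where the roughly $30 figure comes from.</p>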
<hr />
<h2 id="join-algorithms">Common join algorithms</h2>
<p><em>2017-03-18, <a href="https://aseemrb.me/join-algorithms">https://aseemrb.me/join-algorithms</a></em></p>
<p>Having worked in the <a href="https://docs.microsoft.com/en-us/azure/site-recovery/">Azure Disaster Recovery (ASR)</a> team for over 6 months now, I have been using <a href="https://docs.microsoft.com/en-us/connectors/kusto/">Kusto</a> (a log analytics platform developed at Microsoft) extensively for interactive analysis and monitoring of internal service components and flows.</p>
<p>Kusto is modeled in <strong>a typical RDBMS fashion</strong> and supports complex analytical queries over stored entities. The <strong>ingestion and querying</strong> performance is top-notch, which comes at the cost of <strong>sacrificing</strong> the ability to update individual rows in place. Kusto’s best-practices documentation recommends that when doing joins (synonymous with joins in SQL terminology), you always keep the computationally heavy part on the right side of the join for better performance. I used to wonder why this is recommended until I read about the most common join algorithms. Perhaps understanding how joins are implemented will help me write more efficient queries. This is a post about some very primitive join algorithms.</p>
<hr />
<h2 id="notation">Notation</h2>
<ul>
<li><script type="math/tex">N_x</script>: Number of records in relation X</li>
<li><script type="math/tex">B_x</script>: A block (partition) in relation X</li>
<li><script type="math/tex">P_x</script>: The number of blocks (partitions) of X</li>
<li><script type="math/tex">N_{B_x}</script>: Number of records in a block of relation X</li>
</ul>
<hr />
<h2 id="nested-loop-algorithm">Nested loop algorithm</h2>
<p>This is the trivial join algorithm with two nested loops. A brute force on all row-row combinations of both sides. For joining two relations <strong>X</strong> and <strong>Y</strong>, it runs in <strong><script type="math/tex">\mathcal{O}(N_xN_y)</script></strong> operations.</p>
<p>The following would be a crude pseudocode for this.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">def nested_loop_join(X, Y, join_condition):
    # Brute force: check every pair of records from both relations.
    for Rx in X:
        for Ry in Y:
            # Check if Rx and Ry satisfy the join condition.
            if join_condition(Rx, Ry):
                # Join the records and add them to the output.
                yield (Rx, Ry)</code></pre></figure>
<hr />
<h2 id="hash-join-algorithm">Hash join algorithm</h2>
<p>In this algorithm one of the tables is <strong>loaded into memory and hashed on the joining key</strong>. Then while scanning the second table, the hashes are matched to verify the join condition. To judge if this is a better algorithm we need to consider all pros and cons of the algorithm. First let us look at the pseudocode. In the example below, an inner join is performed. The primary thing to consider is that the hash function has the <strong>join attributes as keys</strong> and the <strong>entire row as the value</strong>.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">HashTable</span> <span class="n">Ht</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">record</span> <span class="n">Rx</span> <span class="ow">in</span> <span class="n">X</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Compute</span> <span class="nb">hash</span> <span class="n">key</span> <span class="n">on</span> <span class="n">join</span> <span class="n">attribute</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="ow">in</span> <span class="n">Rx</span>
<span class="n">Insert</span> <span class="n">Rx</span> <span class="n">to</span> <span class="n">the</span> <span class="n">appropriate</span> <span class="n">bucket</span> <span class="ow">in</span> <span class="n">Ht</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span><span class="n">record</span> <span class="n">Ry</span> <span class="ow">in</span> <span class="n">Y</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Compute</span> <span class="nb">hash</span> <span class="n">key</span> <span class="n">on</span> <span class="n">join</span> <span class="n">attribute</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="ow">in</span> <span class="n">Ry</span>
<span class="n">Lookup</span> <span class="n">this</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">Ht</span> <span class="ow">and</span> <span class="n">find</span> <span class="n">the</span> <span class="n">joining</span> <span class="n">bucket</span>
<span class="k">for</span> <span class="p">(</span><span class="n">record</span> <span class="n">Rx</span> <span class="ow">in</span> <span class="n">the</span> <span class="n">selected</span> <span class="n">bucket</span><span class="p">)</span>
<span class="p">{</span>
<span class="c"># Depending on the values, actual implementations</span>
<span class="c"># might add a check here to prevent errors due to</span>
<span class="c"># collisions.</span>
<span class="n">Emit</span> <span class="p">(</span><span class="n">Rx</span><span class="p">,</span> <span class="n">Ry</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>This algorithm hence consists of two phases:</p>
<ol>
<li><strong>Build phase</strong> - where we build the hash table from relation X</li>
<li><strong>Probe phase</strong> - where we scan (probe) the relation Y to match hashes</li>
</ol>
<p>The build phase runs in <script type="math/tex">\mathcal{O}(N_x)</script> and the probe phase runs in <script type="math/tex">\mathcal{O}(N_y)</script> because hash table lookup is <script type="math/tex">\mathcal{O}(1)</script>.<br />
<strong>Overall complexity: <script type="math/tex">\mathcal{O}(N_x + N_y)</script></strong>, which is linear and much better than the quadratic nested loop.</p>
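<p>The pseudocode above translates to a short runnable sketch. Rows are represented as dicts, and the column names below are made up for illustration:</p>

```python
from collections import defaultdict

def hash_join(X, Y, key):
    """Inner equi-join of two lists of dict rows on a shared key."""
    # Build phase: hash relation X on the join key, O(|X|).
    buckets = defaultdict(list)
    for rx in X:
        buckets[rx[key]].append(rx)
    # Probe phase: scan Y and look up matching buckets, O(|Y|).
    result = []
    for ry in Y:
        for rx in buckets.get(ry[key], []):
            result.append({**rx, **ry})
    return result

employees = [{"dept_id": 1, "name": "Ada"}, {"dept_id": 2, "name": "Bob"}]
depts = [{"dept_id": 1, "dept": "R&D"}]
print(hash_join(employees, depts, "dept_id"))
# → [{'dept_id': 1, 'name': 'Ada', 'dept': 'R&D'}]
```

<p>Note that the smaller relation should be the build side, since that is the one held in memory.</p>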
<p><strong>As you might have guessed, two limitations immediately pop up when considering the hash join</strong></p>
<ol>
<li>What if during the build phase, the relation (table) <strong>does not fit into available memory</strong>?</li>
<li>What about <strong>non-equality conditions</strong>? Comparing hashes would work only for equi-joins and not for any generic join conditions.</li>
</ol>
<h3 id="dealing-with-the-limitations">Dealing with the limitations</h3>
<ul>
<li>
<h4 id="memory-constraint">Memory constraint</h4>
<p>If the whole relation does not fit into memory, then one way is to <strong>partition the relation into blocks</strong> of size that fit in memory, <strong>hash each block</strong> and then <strong>probe</strong> the other relation for each block of the first relation.</p>
<p>For joining X and Y, if we partition X into <script type="math/tex">P_x</script> blocks then the time taken for each block <script type="math/tex">B_x</script> to be joined with relation Y is <script type="math/tex">\mathcal{O}(N_{B_x} + N_y)</script>, similar to the classical hash-join above. Overall for all blocks this will take <script type="math/tex">\mathcal{O}(N_x + P_xN_y)</script> which is still better than the nested loop.</p>
</li>
<li>
<h4 id="equi-join-constraint">Equi-join constraint</h4>
<p>We cannot use a hash join with a non-equality condition, because hashing preserves only equality: two values landing in the same bucket tells us nothing about orderings such as less-than or range conditions. This remains a limitation of the algorithm.</p>
</li>
</ul>
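<p>The block-partitioning idea can be sketched as follows. The description above probes all of Y once per block of X; the sketch below uses a common refinement (the Grace hash join) that partitions <em>both</em> inputs on a hash of the key, so each probe touches only its matching partition. Names are mine, and in-memory lists stand in for disk-resident blocks:</p>

```python
from collections import defaultdict

def partitioned_hash_join(X, Y, key, num_partitions=4):
    """Grace-style hash join: split both inputs into partitions by a hash
    of the join key (so each partition's build table fits in memory),
    then run a classic in-memory hash join per partition pair."""
    parts_x = defaultdict(list)
    parts_y = defaultdict(list)
    for rx in X:
        parts_x[hash(rx[key]) % num_partitions].append(rx)
    for ry in Y:
        parts_y[hash(ry[key]) % num_partitions].append(ry)
    result = []
    for p in range(num_partitions):
        # Build phase on the partition of X.
        buckets = defaultdict(list)
        for rx in parts_x[p]:
            buckets[rx[key]].append(rx)
        # Probe phase on the matching partition of Y only.
        for ry in parts_y[p]:
            for rx in buckets.get(ry[key], []):
                result.append({**rx, **ry})
    return result
```

<p>Rows with equal keys always hash to the same partition, so no join result is lost by probing only the matching partition.</p>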
<hr />
<h2 id="sort-merge-algorithm">Sort-merge algorithm</h2>
<p>The hash-join does not work for conditions other than equality; that’s where the sort-merge algorithm comes in. It is one of the most commonly used algorithms in RDBMS implementations. The key idea is to first sort both relations (tables) by the join attribute, so that a single linear scan with two cursors (one per relation) can process both relations at the same time. In practice, the costliest part of this algorithm is therefore sorting the inputs. Sorting can be done in two ways:</p>
<ul>
<li>Explicit external sort.</li>
<li>Exploit a pre-existing ordering in the join relations. For instance if the join input is produced by an index scan* then we already have that relation ordered.</li>
</ul>
<p>Therefore, for two relations X and Y, if X fits in <script type="math/tex">P_x</script> memory pages and Y fits in <script type="math/tex">P_y</script> memory pages, then the worst case running time would be <script type="math/tex">\mathcal{O}(P_x + P_y + P_x \log(P_x) + P_y \log(P_y))</script>.</p>
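<p>Here is a minimal in-memory sketch of the idea (names are mine; the calls to <code>sorted</code> stand in for an external sort or a pre-existing index order). The only subtlety is handling runs of duplicate keys, which must be cross-joined:</p>

```python
def sort_merge_join(X, Y, key):
    """Sort both relations on the join key, then merge with two cursors."""
    X = sorted(X, key=lambda r: r[key])
    Y = sorted(Y, key=lambda r: r[key])
    result, i, j = [], 0, 0
    while i < len(X) and j < len(Y):
        kx, ky = X[i][key], Y[j][key]
        if kx < ky:
            i += 1
        elif kx > ky:
            j += 1
        else:
            # Equal keys: find the run of duplicates on each side
            # and emit their cross product.
            i2 = i
            while i2 < len(X) and X[i2][key] == kx:
                i2 += 1
            j2 = j
            while j2 < len(Y) and Y[j2][key] == ky:
                j2 += 1
            for rx in X[i:i2]:
                for ry in Y[j:j2]:
                    result.append({**rx, **ry})
            i, j = i2, j2
    return result
```

<p>Once the inputs are sorted, the merge itself is a single linear pass over both relations.</p>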
<p>There are numerous other join algorithms that leverage the ideas in the above mentioned basic algorithms, for instance the hybrid hash-join partitions each relation using a hash function for saving probe time on the second relation when performing the actual join. <strong><em>Knowing how joins are implemented in the DBMS being used, one might therefore write more efficient queries</em></strong>.</p>
<p><em>* Recommended reading: B+ tree, Bx tree</em></p>
<hr />Working in the Azure Disaster Recovery (ASR) team for over 6 months now, I have been using Kusto (a log analytics platform developed at Microsoft) extensively for interactive analysis and monitoring of internal service components and flows.What is bitcoin and why do I care?2017-02-20T00:00:00+00:002017-02-20T00:00:00+00:00https://aseemrb.me/bitcoin<p>tl;dr: <a href="https://bitcoinbook.aseemraj.me">Understanding bitcoin</a></p>
<p>Although it has been a long time since bitcoin came around, I never got around to digging in. I attended TEDx Hyderabad sometime last year, where we had a talk on bitcoin and blockchains. Then recently <a href="https://bitcoinmagazine.com/articles/needham-winklevoss-bitcoin-etf-would-have-profound-impact-on-price-but-approval-unlikely-1484254628/">something</a> caught my attention and I decided to sit down and understand bitcoin. So I started reading from various sources, and I am trying to compile all of it in <a href="https://www.gitbook.com/book/aseemraj/understanding-bitcoin/details">a set of articles</a> for my own understanding and for anyone who would like to understand bitcoin in detail.</p>
<p>Most of the content of this gitbook is inspired by the following sources:</p>
<ul>
<li><a href="https://bitcoin.org/bitcoin.pdf">Bitcoin White Paper</a></li>
<li><a href="http://bitcoinbook.cs.princeton.edu/">Princeton Bitcoin Book</a></li>
<li><a href="http://bitcoin.stackexchange.com/">Bitcoin Stackexchange</a></li>
<li><a href="https://en.bitcoin.it/wiki/Main_Page">Bitcoin Wiki</a></li>
</ul>
<p>But as I am writing these while still going through the above sources, the interpretations are my own and written in the way I found easiest to understand. This is a work in progress, so please <a href="mailto:aseemraj@protonmail.com">help me</a> shape it along the way if you are interested.</p>
<hr />tl;dr: Understanding bitcoinImplementing PEGASOS, an SVM solver2016-06-17T00:00:00+00:002016-06-17T00:00:00+00:00https://aseemrb.me/pegasos-stochastic-grad-solver<p>Here’s the <a href="http://ttic.uchicago.edu/~nati/Publications/PegasosMPB.pdf">original paper</a> that proposes the algorithm that we’re going to implement.</p>
<p>SVMs are a very popular classification learning tool, and in the original form, the task of learning an SVM is actually a loss minimization problem with a penalty term for the <a href="https://en.wikipedia.org/wiki/Norm_(mathematics)">norm</a> of the classifier being learned. So for a given training set of <script type="math/tex">m</script> training examples <script type="math/tex">S = \{(x_i, y_i)\}_{i=1}^m</script>, where <script type="math/tex">x_i \in \mathbb{R}^n</script> and <script type="math/tex">y_i \in \{-1, +1\}</script>, we want to build a minimizer for the following function (note that here <script type="math/tex">w</script> and <script type="math/tex">x</script> are vectors):</p>
<script type="math/tex; mode=display">F = \min_{w}\left(\frac{\lambda}{2}||w||^2 + \frac{1}{m}\sum_{(x, y) \in S}l(w; (x, y))\right)</script>
<p>where <script type="math/tex">l</script> is the loss function, given by <script type="math/tex">l(w; (x, y)) = \max{\{0, 1 - y \langle w, x \rangle\}}</script> where <script type="math/tex">\langle w, x \rangle</script> denotes the inner product of the two vectors. To have an intuitive insight into why this loss function works, let’s consider two examples:</p>
<script type="math/tex; mode=display">\langle w, x \rangle =
\begin{cases}
+ve\text{, when } w \text{ is very similar to } x\\
-ve\text{, when } w \text{ is quite different from } x
\end{cases}</script>
<p>The variable <script type="math/tex">y</script> has the classification information, where <script type="math/tex">1</script> means belonging to the class, and <script type="math/tex">-1</script> means not belonging to the class for which the classifier is being trained. Therefore, a positive value of <script type="math/tex">\langle w, x \rangle</script> along with <script type="math/tex">y = 1</script>, or a negative value of <script type="math/tex">\langle w, x \rangle</script> along with <script type="math/tex">y = -1</script> is favorable because both these cases denote that the prediction works right. It also means that the loss function will be minimized in these two scenarios (work it out if needed), which is a part of our main function <script type="math/tex">F</script> that we need to build the minimizer for.</p>
<p><strong>PEGASOS</strong> stands for <em>Primal Estimated sub-GrAdient SOlver for SVM</em>, where <em>stochastic</em> means <em>having a random probability distribution or pattern that may be analysed statistically but may not be predicted precisely.</em> Let’s dive into the algorithm and see why it’s <em>stochastic</em>. The PEGASOS algorithm performs stochastic gradient descent on the primal objective function <script type="math/tex">F</script> with a <em>carefully chosen</em> step size.</p>
<h3 id="the-basic-procedure">The basic procedure</h3>
<ul>
<li>Initially, set <script type="math/tex">w_1</script> to be the <a href="http://mathworld.wolfram.com/ZeroVector.html">zero vector</a></li>
<li>Iterate <script type="math/tex">T</script> times while doing the following in each iteration <script type="math/tex">t</script>
<ul>
<li>Choose a random training example <script type="math/tex">(x_{i_t}, y_{i_t})</script> by picking <script type="math/tex">i_t</script> uniformly at random from <script type="math/tex">\{1, 2, ... m\}</script></li>
<li>Replace the objective function <script type="math/tex">F</script> with an approximation based on this training example, yielding</li>
</ul>
<script type="math/tex; mode=display">f(w, i_t) = \frac{\lambda}{2} ||w||^2 + l(w; (x_{i_t}, y_{i_t}))</script>
<ul>
<li>Compute the subgradient of <script type="math/tex">f(w, i_t)</script> as</li>
</ul>
<script type="math/tex; mode=display">% <![CDATA[
\nabla_t = \lambda w_t - \mathbb{I}[y_{i_t} \langle w_t, x_{i_t} \rangle < 1] y_{i_t} x_{i_t} %]]></script>
<p>Here <script type="math/tex">\mathbb{I}</script> is the indicator function, which takes the value <script type="math/tex">1</script> if its argument is true, and <script type="math/tex">0</script> otherwise, so we know that the value will be <script type="math/tex">1</script> only when <script type="math/tex">w</script> yields some non-zero loss on the example <script type="math/tex">(x, y)</script>.</p>
<ul>
<li>Update <script type="math/tex">w_{t+1} = w_t - \eta_t\nabla_t</script> with the step size <script type="math/tex">\eta_t = 1/(\lambda t)</script></li>
</ul>
</li>
<li>Output <script type="math/tex">w_{T+1}</script></li>
</ul>
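<p>Since the post’s implementation isn’t included yet, here is a minimal NumPy sketch of the basic (<script type="math/tex">k = 1</script>) procedure above; the function name, defaults, and data layout are my own:</p>

```python
import numpy as np

def pegasos(X, y, lam=0.1, T=10000, seed=0):
    """Basic (k = 1) PEGASOS: stochastic subgradient descent on the
    regularized hinge loss with step size eta_t = 1 / (lam * t)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)                  # w_1 = zero vector
    for t in range(1, T + 1):
        i = rng.integers(m)          # pick i_t uniformly at random
        eta = 1.0 / (lam * t)
        # Subgradient: lam * w - I[y_i <w, x_i> < 1] * y_i * x_i,
        # so w_{t+1} = (1 - eta*lam) * w_t (+ eta * y_i * x_i if loss > 0).
        if y[i] * (X[i] @ w) < 1:
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:
            w = (1 - eta * lam) * w
    return w
```

<p>On linearly separable data (labels in <script type="math/tex">\{-1, +1\}</script>), a few thousand iterations are enough for the learned <script type="math/tex">w</script> to classify most points correctly.</p>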
<p>Let’s try to see why this approximation is okay for a suitable <script type="math/tex">T</script>. For this, we’ll look at the complete gradient and the subgradient computed from the approximated function, and find a relation between them. The complete gradient of <script type="math/tex">F</script>, call it <script type="math/tex">\dot{F}</script>, is</p>
<script type="math/tex; mode=display">\dot{F} = \frac{\lambda}{2} \nabla ||w||^2 + \frac{1}{m} \sum_{(x, y) \in S} \nabla l(w; (x, y))</script>
<p>Now what will be the expected value of the subgradient <script type="math/tex">\nabla</script> that we computed above? To find it, observe that the example used to approximate the objective is chosen <em>uniformly at random</em>, so the probability of any particular example being selected is <script type="math/tex">P(e) = 1/m</script>. Thus the expected value of the subgradient <script type="math/tex">\nabla</script> equals the complete gradient of <script type="math/tex">F</script>, our primal objective function, and that is why, intuitively, this approximation is expected to work well enough. The next section deals with <strong>mini-batch iterations</strong>, which approximate the objective with lower variance.</p>
<h3 id="mini-batch-iterations">Mini-batch iterations</h3>
<p>As an extension of the basic procedure, now we would select a subset of examples, rather than selecting a single example for approximating the objective. So for a given <script type="math/tex">k \in \{1, 2, ... m\}</script>, we choose a subset of size <script type="math/tex">k</script> and approximate the objective as before. Note that <script type="math/tex">k = 1</script> is the case we already saw above. So now the objective can be written as
<script type="math/tex">f(w, A_t) = \frac{\lambda}{2} ||w||^2 + \frac{1}{k}\sum_{i \in A_t}l(w; (x_i, y_i))</script>
where <script type="math/tex">A_t</script> is the subset chosen in <script type="math/tex">t^{th}</script> iteration.</p>
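<p>The mini-batch objective above can be sketched as a subgradient computation (a NumPy sketch with my own names, assuming dense arrays and labels in <script type="math/tex">\{-1, +1\}</script>):</p>

```python
import numpy as np

def minibatch_subgradient(w, X_batch, y_batch, lam):
    """Subgradient of f(w, A_t) = (lam/2)||w||^2 + (1/k) * sum of hinge
    losses over the batch: lam * w minus the average of y_i * x_i over
    the examples whose margin is below 1 (non-zero loss)."""
    margins = y_batch * (X_batch @ w)
    active = margins < 1                # per-example indicator I[...]
    k = len(y_batch)
    return lam * w - (y_batch[active, None] * X_batch[active]).sum(axis=0) / k
```

<p>With <script type="math/tex">k = 1</script> this reduces to the single-example subgradient of the basic procedure.</p>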
<h3 id="projection-step">Projection step</h3>
<p>A potential variation in the above algorithm is that we limit the set of admissible solutions to a ball of radius <script type="math/tex">1/\sqrt{\lambda}</script>. To enforce this, project <script type="math/tex">w_t</script> after each iteration onto a sphere as <script type="math/tex">w_{t+1} = \min\{1, \frac{1/\sqrt{\lambda}}{||w_{t+1}||}\}w_{t+1}</script>. The revised analysis as presented in the paper does not compulsorily require this projection step. It mentions this as an optional step because no major difference was found during the experiments between the projected and the unprojected variants.</p>
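<p>The optional projection step is a one-liner (a NumPy sketch; the guard against a zero vector is my addition):</p>

```python
import numpy as np

def project(w, lam):
    """Project w onto the ball of radius 1/sqrt(lam): scale it down
    only if it lies outside the ball, otherwise leave it unchanged."""
    norm = np.linalg.norm(w)
    if norm == 0:
        return w
    return min(1.0, (1.0 / np.sqrt(lam)) / norm) * w
```
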
<p>It is proved in the paper that the number of iterations required to obtain a solution of accuracy <script type="math/tex">\epsilon</script> is <script type="math/tex">O(1/\epsilon)</script>, where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs required <script type="math/tex">\Omega(1/\epsilon^2)</script> iterations, and in previously devised SVM solvers the number of iterations also scales linearly with <script type="math/tex">1/\lambda</script>, where <script type="math/tex">\lambda</script> is the regularization parameter of the SVM. PEGASOS works on an approximation, so the runtime of the algorithm does not grow with the number of training examples; it depends on <script type="math/tex">k</script>, the size of the subset we take, and <script type="math/tex">T</script>, the number of iterations we make. The implemented code (in C++) will be put up later when I’m not feeling lazy.</p>
<hr />Here’s the original paper that proposes the algorithm that we’re going to implement.Weird but awesome JavaScript2016-05-26T00:00:00+00:002016-05-26T00:00:00+00:00https://aseemrb.me/weird-awesome-javascript<p>I’ve been programming something or the other in JavaScript for more than a year now, but I still did not have a clear understanding of what it really is, and how it works. What makes it so different than other languages, why browsers use it, etc.</p>
<p>But now, after some digging into how stuff works, I think I understand what it is. I had heard of Chrome’s runtime V8, but I did not know what it really is. I had used callbacks, but did not know how they work. So this post tries to clarify the small misconceptions and incomplete information about these things.</p>
<p>As Wikipedia puts it, <a href="https://en.wikipedia.org/wiki/JavaScript">JavaScript</a> is a <strong><em>high-level, dynamic, untyped, and interpreted programming language</em></strong>. But this statement doesn’t do much. We need to explore more. JavaScript is popularly known to be a <strong><em>single-threaded non-blocking asynchronous concurrent language</em></strong> with a <strong><em>call stack</em></strong>, <strong><em>event loop</em></strong>, a <strong><em>callback queue</em></strong> and some <strong><em>APIs</em></strong>. But V8 only has a call stack and a <strong><em>heap</em></strong>. Weird! What about the other stuff? The event loop, the callback queue and the APIs? And how can it be single-threaded as well as concurrent simultaneously? There’s something we are missing here.</p>
<p>As it turns out, the JavaScript runtimes (like V8) only have a heap for memory allocation, and a stack for contextual execution. The other things are the Web APIs in the browser, for instance the <code class="highlighter-rouge">setTimeout</code>, <code class="highlighter-rouge">AJAX</code>, <code class="highlighter-rouge">DOM</code> etc. So in a browser, JavaScript has the following structure:</p>
<p><img src="/images/weird-awesome-javascript/chrome.png" alt="jsRunTime" /></p>
<ul>
<li>A runtime like Chrome’s V8, which has a heap and the call stack</li>
<li>Web APIs provided by the browser, like <code class="highlighter-rouge">AJAX</code>, <code class="highlighter-rouge">setTimeout</code>, <code class="highlighter-rouge">DOM</code></li>
<li>A callback queue for events with callbacks, like <code class="highlighter-rouge">onLoad</code>, <code class="highlighter-rouge">onClick</code>, etc.</li>
<li>and an <em>event loop</em> that does something we’ll look at later</li>
</ul>
<p>The example image above is the representation of Chrome’s JavaScript environment. Notice that <strong><em>V8 Runtime (the big rectangular box)</em></strong> only has a call stack and a heap for memory allocation. The <strong><em>Web APIs</em></strong>, <strong><em>event loop</em></strong> and the <strong><em>callback queue</em></strong> are provided as external tools by the browser, and are not inherent to the V8 runtime. We’ll try to look at each of the parts and understand how this works.</p>
<h3 id="what-is-the-call-stack">What is the Call Stack?</h3>
<p>So let’s start with the call stack. What is it? As we already know by now, JavaScript is single-threaded, which means it has a <strong>single call stack</strong>, which in turn means that it can do <strong><em>one thing at a time</em></strong>. This is the same as in an operating system: each process has a call stack, and each time a function is called it gets a new stack frame. Why a stack, you ask? Because the call stack is fundamentally a data structure that keeps a record of where in the program execution currently is. When execution steps into a function, that function is pushed onto the stack, and when a function returns after completion, it is popped off the stack, so that we get back to the place the call was made from. A stack data structure is the natural fit. Getting back to JavaScript, it’s the same thing, and whenever the program throws an error, we can see the call stack in the browser console.</p>
<h3 id="so-what-is-blocking">So what is ‘blocking’?</h3>
<p>Ever heard statements like <strong><em>nodejs uses an event-driven I/O bound non-blocking</em></strong> model that makes it perfect for data-intensive, real-time applications? Those terms are not very helpful yet. Let’s try to understand each term there.</p>
<ul>
<li><strong>Event-driven:</strong> This is a programming paradigm in which the flow of the program is determined by events such as user actions (mouse clicks, key-press), or messages from other programs. For example: “<em>When the user makes a GET request, render the page <code class="highlighter-rouge">index.html</code></em>”. This is an event based trigger as we might say, where the event is the user sending a GET request and the trigger is the rendering of the page <code class="highlighter-rouge">index.html</code>.</li>
<li><strong>I/O bound:</strong> This refers to a condition where the time taken to complete a computation is determined primarily by the time period spent waiting for input/output operations to be completed. This is the opposite of a task being CPU bound, where the completion time is primarily determined by the time taken for the actual computation. So the rate at which the process progresses is limited by the speed of the I/O subsystem and not the CPU, hence it is good for data-intensive, real-time applications.</li>
<li><strong>Blocking:</strong> It is the condition when the call stack is occupied for long and the event loop is stuck, because some function does not return until it has completed what it was doing, and it is taking a long time doing it. Since JS is single-threaded, a time-consuming operation, like making a network request, blocks the subsequent code. The execution has to wait until the request is complete. This problem is avoidable (let’s look at that later).</li>
</ul>
<p>So JS in the browser is a problem if it is blocking, isn’t it? Because say we make a network request, then we cannot click on things, submit forms, etc. because the browser is blocked now. <strong><em>But this does not happen! Why?</em></strong> Because we have <strong><em>asynchronous callbacks</em></strong> which solve this problem.</p>
<h4 id="is-concurrency-a-sham-then">Is Concurrency a sham then?</h4>
<p>So it’s false that JavaScript can only do one thing at a time. It’s true however that the JavaScript runtime can only do one thing at a time. But we can do things concurrently, because the browser is more than the runtime (refer to the image above).</p>
<p>Notice the arrows in the above image. The call stack can put things in the Web APIs, which push the callbacks into the callback queue once complete, and then comes the <strong><em>event loop magic</em></strong>. The event loop does the following:</p>
<figure class="highlight"><pre><code class="language-ruby" data-lang="ruby"><span class="k">if</span> <span class="n">the</span> <span class="no">Call</span> <span class="no">Stack</span> <span class="n">is</span> <span class="ss">empty:
</span><span class="n">take</span> <span class="n">the</span> <span class="n">first</span> <span class="n">thing</span> <span class="n">off</span> <span class="n">the</span> <span class="no">Callback</span> <span class="no">Queue</span> <span class="n">and</span>
<span class="n">push</span> <span class="n">it</span> <span class="n">onto</span> <span class="n">the</span> <span class="no">Call</span> <span class="no">Stack</span> <span class="n">of</span> <span class="n">the</span> <span class="n">runtime</span> <span class="p">(</span><span class="no">V8</span><span class="p">)</span></code></pre></figure>
<p>The event loop keeps looking at the call stack and the callback queue, and does this simple job when it meets the condition above. There exists a tool where we can visualize this clearly. <a href="http://latentflip.com/loupe">Loupe</a> helps visualize the whole process beautifully. Go put some code there and see what’s happening. For example, we take the code below (<a href="http://latentflip.com/loupe/?code=Y29uc29sZS5sb2coJ0hpJyk7CgpzZXRUaW1lb3V0KGZ1bmN0aW9uIGNiYWNrKCkgewogICAgY29uc29sZS5sb2coJ1RoaXMgd2lsbCBydW4gYWZ0ZXIgc29tZSB0aW1lJyk7Cn0sIDMwMDApOwogCmNvbnNvbGUubG9nKCdCdXQgdGhpcyBydW5zIGJlZm9yZSB0aGUgY2FsbGJhY2suJyk7!!!">on Loupe</a>).</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Hi'</span><span class="p">);</span>
<span class="nx">setTimeout</span><span class="p">(</span><span class="kd">function</span> <span class="nx">cback</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'This will run after some time'</span><span class="p">);</span>
<span class="p">},</span> <span class="mi">3000</span><span class="p">);</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'But this runs before the callback.'</span><span class="p">);</span></code></pre></figure>
<p>Though the visualization makes it clear, let’s go through what’s happening:</p>
<ol>
<li>Step into the <strong><code class="highlighter-rouge">console.log('Hi');</code></strong> function, so it’s pushed onto the call stack</li>
<li><strong><code class="highlighter-rouge">console.log('Hi');</code></strong> returns, so it is popped off the stack</li>
<li>Step into the <strong><code class="highlighter-rouge">setTimeout</code></strong> function, so it’s pushed onto the call stack</li>
<li><strong><code class="highlighter-rouge">setTimeout</code></strong> is a part of the Web API, so the Web API handles that and sets a timer for <strong><em>3 seconds</em></strong></li>
<li>The script continues, stepping into the <strong><code class="highlighter-rouge">console.log()</code> function in <code class="highlighter-rouge">line 7</code></strong>, pushing it onto the stack</li>
<li><code class="highlighter-rouge">console.log()</code> of <code class="highlighter-rouge">line 7</code> returns, so it’s popped off</li>
<li>The 3 second timer completes, so the callback <strong><code class="highlighter-rouge">cback()</code></strong> moves to the callback queue</li>
<li>The event loop checks if the call stack is empty. If it were not empty, it would wait. But because it is empty, the <strong><code class="highlighter-rouge">cback()</code></strong> is pushed from the callback queue onto the call stack.</li>
<li><strong><code class="highlighter-rouge">console.log()</code> of <code class="highlighter-rouge">line 4</code></strong> is defined in <strong><code class="highlighter-rouge">cback()</code></strong>, so it is pushed onto the stack and when it returns, it’s popped off the call stack.</li>
</ol>
<p>The interesting thing to observe here is that <strong><code class="highlighter-rouge">setTimeout</code></strong> with the second argument as <code class="highlighter-rouge">3000</code> doesn’t mean that the callback function will be called after 3 seconds. It means that it will be called <strong><em>whenever the call stack is empty after 3 seconds</em></strong>, which can also be never.</p>
<p>Try <a href="http://latentflip.com/loupe/?code=Y29uc29sZS5sb2coJ0hpJyk7CgpmdW5jdGlvbiBibG9ja2FnZSgpIHsKICAgIHdoaWxlKHRydWUpOwp9CgpzZXRUaW1lb3V0KGZ1bmN0aW9uIGNiYWNrKCkgewogICAgY29uc29sZS5sb2coJ1RoaXMgd2lsbCBydW4gYWZ0ZXIgc29tZSB0aW1lJyk7Cn0sIDMwMDApOwoKYmxvY2thZ2UoKTs%3D!!!">this</a> to understand the above statement:</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">"Hi"</span><span class="p">);</span>
<span class="kd">function</span> <span class="nx">blockage</span><span class="p">()</span> <span class="p">{</span>
<span class="k">while</span><span class="p">(</span><span class="kc">true</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">setTimeout</span><span class="p">(</span><span class="kd">function</span> <span class="nx">cback</span><span class="p">()</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">"This will run after some time"</span><span class="p">);</span>
<span class="p">},</span> <span class="mi">3000</span><span class="p">);</span>
<span class="nx">blockage</span><span class="p">();</span></code></pre></figure>
<p>As expected, <strong><code class="highlighter-rouge">cback()</code></strong> never makes it to the call stack because of <strong><code class="highlighter-rouge">blockage()</code></strong> and thus <strong><code class="highlighter-rouge">setTimeout</code></strong> fails to give us the desired thing after 3 seconds. The browser has to render the UI every <code class="highlighter-rouge">16.67 milliseconds (60 frames per second)</code>, and if there is blockage in the stack, it will not be able to render. So <strong><em>blocking the event loop</em></strong> actually means having some function on the stack that does not return well in time.</p>
<p><a href="http://latentflip.com/loupe/?code=JC5vbignYnV0dG9uJywgJ2NsaWNrJywgZnVuY3Rpb24gb25DbGljaygpIHsKICAgIHNldFRpbWVvdXQoZnVuY3Rpb24gdGltZXIoKSB7CiAgICAgICAgY29uc29sZS5sb2coJ1lvdSBjbGlja2VkIHRoZSBidXR0b24hJyk7ICAgIAogICAgfSwgMjAwMCk7Cn0pOwoKY29uc29sZS5sb2coIkhpISIpOwoKc2V0VGltZW91dChmdW5jdGlvbiB0aW1lb3V0KCkgewogICAgY29uc29sZS5sb2coIkNsaWNrIHRoZSBidXR0b24hIik7Cn0sIDUwMDApOwoKY29uc29sZS5sb2coIldlbGNvbWUgdG8gbG91cGUuIik7!!!PGJ1dHRvbj5DbGljayBtZSE8L2J1dHRvbj4%3D">This</a> code on loupe gives the example of a more complex program, with an event handler defined for a button. The <strong><code class="highlighter-rouge">$.on('button', 'click', ...)</code></strong> Web API keeps waiting for events (clicks on the button) (an example showing the event-driven nature), and pushes the said function in the callback queue when we click the button below. The event loop takes care of things thereafter.</p>
<p>This helped me clarify and satisfy some fundamental questions about JavaScript and know how it actually works. Do watch <strong>Philip Roberts</strong>’ talk on <em>event loops</em> <a href="https://www.youtube.com/watch?v=8aGhZQkoFbQ">here</a>.</p>
<hr />I’ve been programming something or the other in JavaScript for more than a year now, but I still did not have a clear understanding of what it really is, and how it works. What makes it so different than other languages, why browsers use it, etc.Some more #P complete problems2016-03-25T00:00:00+00:002016-03-25T00:00:00+00:00https://aseemrb.me/sharp-p-complete-problems<p>I will be discussing two #P-Complete problems in this post: #SAT (Finding the number of satisfying assignments for a boolean formula) and finding the number of 3-colorings in a graph.</p>
<p>Both SAT and 3-coloring are NP-Complete problems, so understanding why their counting versions are #P-Complete is much easier than for the counting version of an easier problem like finding a perfect matching in a bipartite graph, which is equivalent to the 01-permanent problem discussed in my previous post. It is strongly recommended that you come here via <a href="/sharp-p-problems">the previous post</a> for a smooth experience.</p>
<p>Proving any problem to be #P-Complete requires two things:</p>
<ul>
<li>Proving that the problem is in #P</li>
<li>Proving that all #P problems are reducible to this problem
(that is, the problem is #P-Hard)</li>
</ul>
<h4 id="sat">#SAT</h4>
<p>#SAT is in #P by the definition of the problem: there exists a language in NP (namely SAT) with a non-deterministic Turing Machine whose number of accepting paths is the number of satisfying assignments (this is one of the definitions of the class #P). To prove completeness, we consider the Cook-Levin reduction from any language L in NP to SAT (the <a href="https://en.wikipedia.org/wiki/Cook%E2%80%93Levin_theorem">Cook-Levin theorem</a>). This reduction is a polynomial time computable function <script type="math/tex">f: \{0, 1\}^* \rightarrow \{0, 1\}^*</script> such that <script type="math/tex">\forall x \in \{0, 1\}^*</script>, <script type="math/tex">x \in L \leftrightarrow f(x) \in SAT</script>.</p>
<p>This proof gives us some more information that we can use. If the boolean formula in consideration is <script type="math/tex">\phi(x)</script>, each satisfying truth assignment for <script type="math/tex">\phi</script> corresponds to an accepting computation path of the Turing Machine running on the input. So we have an efficient way to transform a certificate for input <script type="math/tex">x</script> into a satisfying assignment for <script type="math/tex">\phi(x)</script> and vice-versa.</p>
<p>What this means in simple terms is that the mapping from the certificates of <script type="math/tex">x</script> to the satisfying assignments of <script type="math/tex">\phi(x)</script> is invertible and hence one-to-one.
Conclusion: the number of satisfying assignments for <script type="math/tex">\phi(x)</script> equals the number of certificates for <script type="math/tex">x</script>, so the Cook-Levin reduction preserves the number of solutions (it is <em>parsimonious</em>). Thus #SAT is a #P-Complete problem.</p>
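<p>To make the counting version concrete, here is a tiny brute-force #SAT counter (a sketch of mine, not from any paper; the clause encoding with signed 1-based variable indices is my own convention). It is exponential, of course, which is the whole point of the class.</p>

```python
from itertools import product

def count_sat(num_vars, clauses):
    """Count satisfying assignments of a CNF formula.

    Each clause is a list of signed, 1-based variable indices:
    3 means x3 and -3 means NOT x3.
    """
    count = 0
    for bits in product([False, True], repeat=num_vars):
        # the formula is satisfied iff every clause has a true literal
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count

# phi = (x1 OR x2) AND (NOT x1 OR x2): satisfied exactly when x2 is true
print(count_sat(2, [[1, 2], [-1, 2]]))  # 2
```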
<h4 id="number-of-3-colorings-in-a-graph">Number of 3-colorings in a graph</h4>
<p>The corresponding decision problem for this is:
<strong><em>Given a graph G, find whether there exists a 3-coloring for G. A 3-coloring is a mapping of each node of the graph to any of three available colors, such that no two nodes which are adjacent have the same color.</em></strong></p>
<p>The 3-coloring problem is NP-Complete, which we will prove shortly, so its counting version can be proved #P-Complete exactly as in the case of #SAT. To prove 3-coloring NP-Complete, we first show that it <script type="math/tex">\in</script> NP, and then reduce 3-SAT (a well-known NP-Complete problem) to 3-coloring, completing the proof.</p>
<p>Given a coloring scheme of 3 colors and the graph as an input instance to the problem, we can easily verify in polynomial time whether the coloring scheme is valid, by iterating over all the edges and checking that all the pairs of adjacent nodes have different colors. Thus the 3-coloring problem is in NP.</p>
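<p>The polynomial-time check just described, together with the exponential counting version, can be sketched in a few lines of Python (my own illustration, not part of the proof):</p>

```python
from itertools import product

def is_valid_coloring(edges, coloring):
    # polynomial-time check: every edge must join differently colored nodes
    return all(coloring[u] != coloring[v] for u, v in edges)

def count_3_colorings(num_nodes, edges):
    # the counting version: brute force over all 3^n colorings
    return sum(is_valid_coloring(edges, c)
               for c in product(range(3), repeat=num_nodes))

# a triangle: any bijection of the 3 colors works, so 3! = 6 colorings
print(count_3_colorings(3, [(0, 1), (1, 2), (0, 2)]))  # 6
```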
<p>Now we reduce 3-SAT to 3-coloring and thus prove it to be NP-Hard. To do this, let’s consider a 3-SAT instance having the boolean formula <script type="math/tex">\phi(x)</script> with <strong><em>n variables</em></strong> and <strong><em>m clauses</em></strong>. The variables are <script type="math/tex">x_1, x_2, ... x_n</script>.</p>
<p>From <script type="math/tex">\phi</script>, we construct the graph G having:</p>
<ul>
<li>Vertex <script type="math/tex">v_i</script> for each <script type="math/tex">x_i</script></li>
<li>Vertex <script type="math/tex">\overline{v_i}</script> for each <script type="math/tex">\overline{x_i}</script></li>
<li>Vertices <script type="math/tex">u_{j1}, u_{j2}, ... u_{j5}</script> for each clause</li>
<li>3 special vertices T, F, B</li>
</ul>
<p>We will force T, F and B to take three different colors by forming a triangle among them. In the correspondence, node T’s color denotes a true assignment and node F’s color denotes a false assignment. Also note that <script type="math/tex">\forall i</script>, we want exactly one of <script type="math/tex">v_i</script> and <script type="math/tex">\overline{v_i}</script> to receive T’s color and the other to receive F’s color. To enforce this, we form the edges as depicted in the image below.</p>
<p><img src="/images/sharp-p-complete-problems/assignments.png" alt="coloring" /></p>
<p>This ensures that in every pair, <script type="math/tex">v_i</script> and <script type="math/tex">\overline{v_i}</script> are assigned the colors of T and F in some order. One more thing must be taken care of: for the boolean formula to be true, at least one literal in every clause must be true. So for each clause, we build a structure similar to the one given below. The example clause for the image is <script type="math/tex">C_i = (a \vee b \vee c)</script>.</p>
<p><img src="/images/sharp-p-complete-problems/clause.png" alt="coloring" /></p>
<p>It is clear that a, b, and c cannot all be false together: at least one of them has to be true for a 3-coloring of this structure to exist.</p>
<p>This construction takes polynomial time. So 3-coloring is NP-Complete which completes our proof.</p>
<p>3-coloring is just the special case of k-coloring with k = 3, so k-coloring is also NP-Complete, and finding the number of k-colorings in a graph is #P-Complete.</p>
<hr />I will be discussing two #P-Complete problems in this post: #SAT (Finding the number of satisfying assignments for a boolean formula) and finding the number of 3-colorings in a graph.The complexity class of #P problems2016-03-23T00:00:00+00:002016-03-23T00:00:00+00:00https://aseemrb.me/sharp-p-problems<p>Recently, I had an assignment in one of my courses - Beyond NP Completeness, to present the complexity class of #P (Sharp-P) problems based on <a href="https://en.wikipedia.org/wiki/Leslie_Valiant">L.G. Valiant</a>’s paper on complexity of computing the permanent of a matrix.</p>
<p>This post is an attempt to explain <a href="https://www.math.washington.edu/~billey/colombia/references/valiant.permanent.1979pdf.pdf">the paper</a>, as I found very few resources on the internet where this topic is discussed in detail; one of the sightings is a lecture by Professor <a href="http://erikdemaine.org/">Erik Demaine</a> under MIT OpenCourseWare, which can be found on <a href="https://www.youtube.com/watch?v=XROTP1RiNaA">youtube</a>.</p>
<p>Before we begin, it is strongly recommended to familiarize yourself with the following:</p>
<ul>
<li>The classes <a href="https://en.wikipedia.org/wiki/P_(complexity)">P</a>, <a href="https://en.wikipedia.org/wiki/NP_(complexity)">NP</a>, <a href="https://en.wikipedia.org/wiki/NP-hardness">NP-Hard</a>, <a href="https://en.wikipedia.org/wiki/NP-completeness">NP-Complete</a></li>
<li><a href="https://en.wikipedia.org/wiki/Polynomial-time_reduction">Poly-time reductions</a></li>
<li>The class <a href="https://en.wikipedia.org/wiki/FP_(complexity)">FP</a></li>
<li>The notion of an <a href="https://en.wikipedia.org/wiki/Oracle_machine">Oracle</a></li>
</ul>
<p>So let’s begin with the permanent, which is defined for an <em>n x n matrix A</em> as:</p>
<script type="math/tex; mode=display">Perm\ A = \sum_{\sigma}{\prod_{i=1}^{n}{A_{i,\sigma(i)}}}</script>
<p>Before proceeding, let’s understand what this means, and what we want to compute.
<script type="math/tex">\sigma</script> is a permutation of the numbers {1, 2, 3, … n}.
For example, for n = 5, if <script type="math/tex">\sigma</script> = {2, 5, 3, 4, 1}, then <script type="math/tex">\sigma(1) = 2</script>, <script type="math/tex">\sigma(2) = 5</script>, <script type="math/tex">\sigma(3) = 3</script> and so on. The summation is over all <script type="math/tex">\sigma</script>, that is, over all <script type="math/tex">n!</script> (factorial of n) possible permutations. So each product inside the summation picks exactly one element from each row and each column of the matrix.</p>
<p>A closer look at the permanent tells us that if we change all the negative signs in the expression of the determinant of a matrix to positive signs, it will indeed become the permanent. To show this with an example:
Consider the following matrix M:
<script type="math/tex">% <![CDATA[
\begin{bmatrix}
1 & 2 & 4 \\
2 & 3 & 1 \\
1 & 3 & 1 \\
\end{bmatrix} %]]></script></p>
<p>The determinant of M = 1 x ((3x1) - (1x3)) - 2 x ((2x1) - (1x1)) + 4 x ((2x3) - (3x1)) = 10, while the permanent = 1x3x1 + 1x1x3 + 2x2x1 + 2x1x1 + 4x2x3 + 4x3x1 = 48, which is clearly obtained by changing all negative signs in the determinant’s expression to positive. Surprisingly, although there exist <a href="https://en.wikipedia.org/wiki/Determinant#Calculation">efficient algorithms</a> to compute the determinant of a matrix, no algorithm better than exponential time is known for computing the permanent.</p>
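<p>The sign-flipping relationship is easy to check numerically. Below is a brute-force sketch of mine (fine for tiny matrices only, since it enumerates all permutations):</p>

```python
import math
from itertools import permutations

def determinant(A):
    # sum over permutations, with the sign given by the permutation's parity
    n = len(A)
    total = 0
    for s in permutations(range(n)):
        inversions = sum(s[i] > s[j] for i in range(n) for j in range(i + 1, n))
        total += (-1) ** inversions * math.prod(A[i][s[i]] for i in range(n))
    return total

def permanent(A):
    # the same sum with every sign made positive
    n = len(A)
    return sum(math.prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

M = [[1, 2, 4], [2, 3, 1], [1, 3, 1]]
print(determinant(M), permanent(M))  # 10 48
```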
<p>Valiant addresses the complexity of computing the permanent of a <a href="https://en.wikipedia.org/wiki/Logical_matrix">(0-1) matrix</a>, for which he defines the class #P. Put simply, #P problems are the counting problems associated with the decision problems in class NP. It is a class of function problems, not decision problems (where the answer is a simple yes/no). An NP problem of the form “Does there exist a solution that satisfies X?” usually corresponds to the #P problem “How many solutions satisfy X?”. Here is an example of a #P problem:
#SAT - Given a boolean formula <script type="math/tex">\phi(x_1, x_2, ... x_n)</script>, find the number of assignments that satisfy <script type="math/tex">\phi</script>. This is the counting version of the famous <a href="https://en.wikipedia.org/wiki/Boolean_satisfiability_problem">SAT</a> problem, which is known to be NP-Complete.</p>
<h4 id="defining-p-completeness">Defining #P completeness</h4>
<p>To define completeness in this class of problems, we need to bring in Oracle Turing Machines and the class FP. Oracle machines are those which have access to an <em>oracle</em> that can <em>magically</em> solve the decision problem for some language <script type="math/tex">L \subseteq \{0, 1\}^*</script>. These machines with oracle access can then make queries of the form “Is <script type="math/tex">q \in L</script>?” in one computational step. We can generalize this to non-boolean functions by saying that a T.M. M has oracle access to a function <script type="math/tex">f: \{0, 1\}^* \rightarrow \{0, 1\}^*</script> if it is given access to the language <script type="math/tex">L = \{(x, i) : f(x)_i = 1\}</script>.
For every <script type="math/tex">O \subseteq \{0, 1\}^*</script>, we denote <script type="math/tex">P^O</script> as the set of languages that can be decided by a polynomial time DTM (Deterministic Turing Machine) with oracle access to O. As an example, consider the <script type="math/tex">\overline{SAT}</script> problem, which denotes the language of unsatisfiable boolean formulae. <script type="math/tex">\overline{SAT}</script> <script type="math/tex">\in P^{SAT}</script> because if we are given an oracle access to the SAT language, then we can solve an instance <script type="math/tex">\phi</script> of the <script type="math/tex">\overline{SAT}</script> problem in polynomial time, by asking “Is <script type="math/tex">\phi \in SAT</script>?” and negating the answer to that.
Now we are ready to define #P completeness. A function <script type="math/tex">f</script> is said to be #P-complete if <script type="math/tex">f \in \#P</script> and <script type="math/tex">\forall g \in \#P, g</script> is in <script type="math/tex">FP^f</script> (Cook reduction).</p>
<p>Now let’s get back to the permanent. We take a special case of the permanent problem where we put a constraint that the input matrix is a (0, 1) matrix, that is, all the entries of the given matrix are either 0 or 1. Let us look at this problem of finding the permanent of a binary matrix with a different perspective. Imagine that the given matrix A is an <strong><em>adjacency matrix of a bipartite graph</em></strong>
<script type="math/tex">G = (X, Y, E)</script> where,
<script type="math/tex">X = \{x_1, x_2, ... x_n\}</script>
<script type="math/tex">Y = \{y_1, y_2, ... y_n\}</script>
<script type="math/tex">E = \{(x_i, y_j) : A_{ij} = 1\}</script></p>
<p>Now look at the term inside the summation, <script type="math/tex">\prod{A_{i, \sigma(i)}}</script> for <script type="math/tex">i = \{1, 2, ... n\}</script>: each such term corresponds to a candidate perfect matching in the bipartite graph represented by A. Why? The term picks one element from each row and each column, so every vertex is covered, and the term is 0 if any of the picked elements is 0. In our adjacency matrix, a 0 in row i and column j means there is no edge between <script type="math/tex">x_i</script> and <script type="math/tex">y_j</script>, so the term evaluates to 1 if and only if the selected entries form a perfect matching of the given bipartite graph. Hence <script type="math/tex">Perm\ A = \sum\prod A_{i, \sigma(i)}</script>, where the summation is over all <script type="math/tex">\sigma</script>, equals the total number of perfect matchings in the bipartite graph represented by A. As finding the number of perfect matchings in a bipartite graph <script type="math/tex">\in</script> #P, clearly, finding the permanent of a binary matrix <script type="math/tex">\in</script> #P.</p>
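<p>This correspondence is easy to test on small graphs. A minimal sketch of mine: interpret each permutation <script type="math/tex">\sigma</script> as a candidate matching that pairs <script type="math/tex">x_i</script> with <script type="math/tex">y_{\sigma(i)}</script>, and count those whose edges all exist.</p>

```python
from itertools import permutations

def count_perfect_matchings(A):
    # A is the 0-1 bipartite adjacency matrix; each permutation s proposes
    # the matching {(x_i, y_s(i))}, valid iff every proposed edge exists
    n = len(A)
    return sum(all(A[i][s[i]] == 1 for i in range(n))
               for s in permutations(range(n)))

# the complete bipartite graph K_{3,3} has 3! = 6 perfect matchings,
# which equals the permanent of the all-ones 3x3 matrix
print(count_perfect_matchings([[1, 1, 1], [1, 1, 1], [1, 1, 1]]))  # 6
```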
<p>Now we shall prove <strong><em>Valiant’s theorem</em></strong>, which says that finding the permanent of a binary matrix is #P-Complete. We have already proved it is in #P, so all that remains is to prove that it is #P-Hard, that is, all #P problems can be reduced to it in the way explained earlier in this post (where completeness is defined for #P problems). For this, we look at the problem of finding the permanent as a different graph problem.</p>
<p>Consider A = adjacency matrix of a weighted and directed graph with n nodes (We are talking about a general matrix now, not a binary matrix). We define two things now:</p>
<ul>
<li><strong>Cycle Cover of a graph G</strong>: A set of cycles (subgraphs of G in which all vertices have indegree = 1 and outdegree = 1) which contains all vertices of G</li>
<li><strong>Weight of a cycle cover</strong>: Product of weights of edges involved in the cover</li>
</ul>
<p>Now, with a little observation, we can conclude that <script type="math/tex">Perm(A) = \sum W_{i}</script> where <script type="math/tex">W_i</script> is the weight of the <script type="math/tex">i^{th}</script> cycle cover: every permutation <script type="math/tex">\sigma</script> decomposes into vertex-disjoint cycles, so each term of the permanent is the weight of one cycle cover, and summing the weights of all possible cycle covers of the graph yields exactly the permanent of the adjacency matrix. If this is hard to visualize, feel free to work out an example having 3 or 4 nodes to let that sink in.</p>
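<p>A quick numeric illustration of this observation (my own sketch): every permutation of the vertices <em>is</em> a cycle cover, obtained by following the edges <script type="math/tex">i \rightarrow \sigma(i)</script>, so the cycle cover weights are exactly the terms of the permanent.</p>

```python
import math
from itertools import permutations

def cycle_cover_weights(A):
    # A[i][j] = weight of the directed edge i -> j; each permutation s is a
    # cycle cover (the orbits of i -> s(i) are vertex-disjoint cycles)
    n = len(A)
    return [math.prod(A[i][s[i]] for i in range(n))
            for s in permutations(range(n))]

# 2-node example: either both self-loops (2*7) or the 2-cycle (3*5)
A = [[2, 3],
     [5, 7]]
print(sum(cycle_cover_weights(A)))  # 29, which is Perm(A) = 2*7 + 3*5
```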
<p>Proceeding with the proof, we will attempt to reduce an instance of the 3-SAT problem to an instance of the cycle cover problem. <strong><em>The methodology and examples are directly taken from the original paper</em></strong>. We begin with a boolean formula given to us in 3-<a href="https://en.wikipedia.org/wiki/Conjunctive_normal_form">CNF</a> form.
<script type="math/tex">F = C_1 \wedge C_2 \wedge C_3 \wedge ... C_m</script>, a conjunction of m clauses where
<script type="math/tex">C_i = (y_{i1} \vee y_{i2} \vee y_{i3})</script>, a disjunction of 3 literals, where
<script type="math/tex">y_{ij} \in \{x_1, \overline{x_1}, x_2, \overline{x_2}, ... x_n, \overline{x_n}\}</script>, the set of variable and their negations.</p>
<p>We construct graph G by superposing the following structures:</p>
<ol>
<li>A Track <script type="math/tex">T_k</script> for each variable <script type="math/tex">x_k</script></li>
<li>An Interchange <script type="math/tex">R_i</script> for each clause <script type="math/tex">C_i</script></li>
<li>For each literal <script type="math/tex">y_{ij}</script> such that <script type="math/tex">y_{ij} = x_k</script> or <script type="math/tex">\overline{x_k}</script>, a Junction <script type="math/tex">J_{ik}</script> at which interchange <script type="math/tex">R_i</script> and track <script type="math/tex">T_k</script> meet.</li>
<li>The interchanges also have internal junctions, which are exactly the same as above</li>
</ol>
<p>Example: Let’s take some formula F where:
<script type="math/tex">C_3 = (x_2 \vee \overline{x_5} \vee x_7)</script>
<script type="math/tex">x_5</script> occurs in <script type="math/tex">C_2</script> and <script type="math/tex">C_5</script>
<script type="math/tex">\overline{x_5}</script> occurs in <script type="math/tex">C_3</script></p>
<p>For this example, the following are the structures:</p>
<table>
<thead>
<tr>
<th style="text-align: center"><strong>Track <script type="math/tex">T_5</script></strong></th>
<th style="text-align: center"><strong>Interchange <script type="math/tex">R_3</script></strong></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><img src="/images/sharp-p-problems/t5.png" alt="Track T5" /></td>
<td style="text-align: center"><img src="/images/sharp-p-problems/r3.png" alt="Interchange R3" /></td>
</tr>
</tbody>
</table>
<p>The small shaded regions are the junctions, which are themselves a network of nodes and edges represented by the adjacency matrix <script type="math/tex">% <![CDATA[
X =
\begin{bmatrix}
0 & 1 & -1 & -1 \\
1 & -1 & 1 & 1 \\
0 & 1 & 1 & 2 \\
0 & 1 & 3 & 0 \\
\end{bmatrix} %]]></script></p>
<p>Why the matrix is given exactly these values will become clear as we proceed. This part is crucial. Let’s note some important things:</p>
<ul>
<li>Each junction (represented by <script type="math/tex">X</script>) has external connections only via nodes 1 and 4</li>
<li>Taking <script type="math/tex">X(a; b)</script> as the matrix leftover after deleting rows a and columns b, we note the following properties of the matrix <script type="math/tex">X</script>:
<ul>
<li>Perm(<script type="math/tex">X</script>) = 0</li>
<li>Perm(<script type="math/tex">X(1; 1)</script>) = 0</li>
<li>Perm(<script type="math/tex">X(4; 4)</script>) = 0</li>
<li>Perm(<script type="math/tex">X(1,4; 1,4)</script>) = 0</li>
<li>Perm(<script type="math/tex">X(1; 4)</script>) = Perm(<script type="math/tex">X(4; 1)</script>) = 4</li>
</ul>
</li>
</ul>
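<p>These five properties can be verified mechanically. A brute-force check (my own sketch, using 1-based row/column indices to match the notation above):</p>

```python
import math
from itertools import permutations

def permanent(A):
    n = len(A)
    return sum(math.prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

def minor(A, rows, cols):
    # X(rows; cols): the matrix left over after deleting the given
    # 1-based rows and columns
    n = len(A)
    return [[A[i][j] for j in range(n) if j + 1 not in cols]
            for i in range(n) if i + 1 not in rows]

X = [[0, 1, -1, -1],
     [1, -1, 1, 1],
     [0, 1, 1, 2],
     [0, 1, 3, 0]]

print(permanent(X),                         # 0
      permanent(minor(X, {1}, {1})),        # 0
      permanent(minor(X, {4}, {4})),        # 0
      permanent(minor(X, {1, 4}, {1, 4})),  # 0
      permanent(minor(X, {1}, {4})),        # 4
      permanent(minor(X, {4}, {1})))        # 4
```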
<p>Using these properties, we can draw some strong conclusions about how a route can pass through a junction. Let’s look at routes in the graph. A route is a cycle cover. Considering all the routes which have the same set of edges outside of the junctions, we call a route bad if:</p>
<ul>
<li>
<h4 id="it-ignores-a-junction">It ignores a junction</h4>
<p>In this case, the ignored junction must still be covered, so it appears as a separate factor in the term. But Perm(X) = 0, so it makes the whole term 0, and the route contributes nothing to the permanent.</p>
</li>
<li>
<h4 id="it-enters-and-leaves-a-junction-at-the-same-end">It enters and leaves a junction at the same end</h4>
<p>This case is bad because <script type="math/tex">Perm(X(1; 1)) = Perm(X(4; 4)) = 0</script>: if nodes 1, 2, 3 or 2, 3, 4 remain (only one of the ends is used), these nodes must be covered separately, again making the whole term 0. Thus there is no contribution to the total number of cycle covers.</p>
</li>
<li>
<h4 id="it-enters-at-node-1-of-a-junction-jumps-to-node-4-and-then-leaves-out">It enters at node 1 of a junction, jumps to node 4 and then leaves out</h4>
<p>This case leaves out nodes 2 and 3 of the junction, so they have to be covered in a separate cycle, but <script type="math/tex">Perm(X(1,4; 1,4)) = 0</script>, so this again makes the term 0 and contributes nothing to the total number of cycle covers.</p>
</li>
</ul>
<p>So the only choice we have is to enter at either node 1 or node 4, and leave at the opposite end after covering nodes 2 and 3, if we want that route to count towards the total number of cycle covers (the value of the permanent). Going by this only choice, the junction contributes a factor of 4 to the term, as Perm(<script type="math/tex">X(1; 4)</script>) = Perm(<script type="math/tex">X(4; 1)</script>) = 4.</p>
<p>In any track <script type="math/tex">T_k</script> of any good route, there are two cases as seen in the structure of the track:</p>
<ul>
<li>All junctions on the left side are picked by the track and those on the right side are picked by interchanges</li>
<li>The vice-versa of the above</li>
</ul>
<p>These two cases correspond to whether <script type="math/tex">x_k = 1</script> or <script type="math/tex">\overline{x_k} = 1</script>.</p>
<p>Now observe the interchanges. Each interchange has 5 junctions: 3 external junctions connected to the tracks of the variables occurring in that particular clause, and 2 internal junctions with the same structure as a normal junction. A careful observation tells us that the whole of an interchange (all five junctions) cannot be picked up by a route; in fact, all 3 of the external junctions can never be picked up by the interchange itself, so at least one of the 3 junctions connected to tracks must be picked up by a track. This constraint is exactly the requirement that at least one literal in the clause be true for the whole clause to be true.</p>
<p>Now the good routes (cycle covers) correspond to the satisfying variable assignments for the boolean formula <script type="math/tex">F</script> of the SAT instance we had taken, with each good route contributing a known power of 4 to the permanent (a factor of 4 per junction), so the number of satisfying assignments can be recovered from the permanent. Since #3-SAT is known to be #P-complete, the permanent problem is now proved to be #P-Hard.</p>
<p>Let’s get on to finding the permanent of a 0-1 matrix now. It is remarkable that although the problem of finding a perfect matching in a bipartite graph <script type="math/tex">\in</script> P, counting the total number of perfect matchings is #P-complete. This is one of those examples which make it clear that easy decision problems can have very hard counting versions. The rest of the post is about proving that the 0-1 permanent is #P-complete.</p>
<p>To prove this, we will reduce the general permanent problem to the 0-1 permanent problem. First we need to deal with the negative edge weights, which is easy with modular arithmetic. We can compute <script type="math/tex">permanent\ mod\ r</script> for primes <script type="math/tex">r = \{2, 3, 5, 7, ...\}</script> until the product of the primes exceeds <script type="math/tex">M^n n!</script>. Here <script type="math/tex">M</script> is the maximum absolute value in the matrix: the permanent is a sum of <script type="math/tex">n!</script> terms, each a product of <script type="math/tex">n</script> entries bounded by <script type="math/tex">M</script>, so <script type="math/tex">M^n n!</script> is the largest value it can ever take.</p>
<p>Since every prime is at least 2, the number of primes needed is <script type="math/tex">\leq log_2(M^n n!) \approx nlog_2M + nlog_2n</script>, so this reduction is polynomial in terms of the input size. We can then reconstruct the permanent from the residues using the <a href="https://en.wikipedia.org/wiki/Chinese_remainder_theorem">CRT</a>.</p>
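<p>The CRT reconstruction step can be sketched as follows (a standard construction, not specific to the paper; it assumes the value being recovered is non-negative and smaller than the product of the primes):</p>

```python
import math

def crt(residues, primes):
    # rebuild x modulo the product N of the (pairwise coprime) primes
    N = math.prod(primes)
    x = 0
    for r, p in zip(residues, primes):
        Ni = N // p
        x += r * Ni * pow(Ni, -1, p)  # pow(a, -1, p) is the inverse of a mod p
    return x % N

# pretend the permanent is 48 but we only computed it mod 5, 7 and 11
print(crt([48 % 5, 48 % 7, 48 % 11], [5, 7, 11]))  # 48
```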
<p>So now we have a matrix where all the entries are non-negative. We have to reduce this to a form where all entries are either 0 or 1, to prove that the permanent problem is reducible to the 0-1 permanent problem. We can do so in two ways, by transforming the original graph to an equivalent graph.</p>
<ul>
<li>As mentioned in the original paper (Fig. 2), by forming self-loops proportional to the weight of the edge.</li>
<li>First convert all the edges to widgets such that all the edges in the resultant graph have weights which are powers of 2. Then all power-of-2 edges can be easily transformed to widgets with only 0-1 edges, proving at each step that the initial and final graphs are equivalent.</li>
</ul>
<p>The first method is described in the paper, so I will be explaining the second perspective here. Images are taken from Wikipedia. First we transform the graph <script type="math/tex">G</script> into one where all edge weights are powers of 2, the graph <script type="math/tex">G^{'}</script>. To do this, we take an edge with weight <script type="math/tex">w</script> and split it into a widget as shown in the image below, using the fact that any number can be expressed as a sum of powers of 2.</p>
<p><img src="/images/sharp-p-problems/p2.png" alt="powers of 2" /></p>
<p>To prove the correspondence, take two cases for a cycle cover C of graph <script type="math/tex">G</script>:</p>
<ul>
<li>If edge u-v was not in C, then to cover all the vertices in the transformed graph, we must use all the self-loops, so the total contribution to the product is 1, hence no change in the output. This is as good as not using the edge u-v in the original graph</li>
<li>If edge u-v was present in C, then in all the corresponding cycle covers in <script type="math/tex">G^{'}</script>, there must be a path from u to v. We can see that there are total <script type="math/tex">r</script> such paths and they sum up to <script type="math/tex">w</script></li>
</ul>
<p>Hence graphs <script type="math/tex">G</script> and <script type="math/tex">G^{'}</script> are equivalent in terms of the permanent problem. Now we are left with a transformation such that all edges become binary, with weights 0 or 1. This is easy in the following way (refer to the image below).</p>
<p><img src="/images/sharp-p-problems/2p.png" alt="to binary" /></p>
<p>This is the transformation from <script type="math/tex">G^{'}</script> to <script type="math/tex">G^{''}</script>. Again we have two cases to show the similarity. Let C be a cycle cover in <script type="math/tex">G^{'}</script>, then:</p>
<ul>
<li>If edge u-v was not a part of C, the only way to form a cycle cover (taking all vertices) is to take all the self-loops</li>
<li>If edge u-v was present in C, then in any cycle cover of <script type="math/tex">G^{''}</script> there must be a path from u to v. At each step from u to v, we have 2 choices, and such a choice has to be taken <script type="math/tex">r</script> times, so we have a total of <script type="math/tex">2^r</script> different possible paths from u to v, so it will contribute <script type="math/tex">2^r</script> overall, same as the weight of the u-v edge in <script type="math/tex">G^{'}</script></li>
</ul>
<p>Thus the problem of finding the permanent of a 0-1 matrix, and equivalently the problem of finding the number of perfect matchings in a bipartite graph <script type="math/tex">\in</script> #P-complete.</p>
<p>I have talked about more #P-complete problems and their proofs in <a href="/sharp-p-complete-problems">this post</a>.</p>
<hr />Recently, I had an assignment in one of my courses - Beyond NP Completeness, to present the complexity class of #P (Sharp-P) problems based on L.G. Valiant’s paper on complexity of computing the permanent of a matrix.Machine level obfuscation2015-01-25T00:00:00+00:002015-01-25T00:00:00+00:00https://aseemrb.me/machine-level-obfuscation<p>Let’s start with this beautiful piece of code in C. What do you think it does?</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include <stdio.h>
</span><span class="kt">double</span> <span class="n">d</span><span class="p">[]</span><span class="o">=</span> <span class="p">{</span><span class="mi">1156563417652693958656</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">272</span><span class="p">};</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">d</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">--?</span><span class="n">d</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*=</span><span class="mi">2</span><span class="p">,</span> <span class="n">main</span><span class="p">()</span> <span class="o">:</span> <span class="n">printf</span><span class="p">(</span><span class="s">"%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">d</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Go ahead and run it on your machine! For lazy bums, here is the place you can see the output: <a href="http://ideone.com/UaGZDp">http://ideone.com/UaGZDp</a></p>
<p>Well, actually the output of this code depends on the machine, more specifically the <a href="http://en.wikipedia.org/wiki/Endianness">endianness</a> of the machine. Let us walk through the code step by step to understand all this.</p>
<h4 id="line-2">Line 2</h4>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">double</span> <span class="n">d</span><span class="p">[]</span><span class="o">=</span> <span class="p">{</span><span class="mi">1156563417652693958656</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">272</span><span class="p">};</span></code></pre></figure>
<p>Here we have simply declared a single dimensional double array and initialized it with two elements. The values are carefully chosen, as we shall see later in this post.</p>
<h4 id="line-5">Line 5</h4>
<p>This line is a fancy way of saying</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">if</span><span class="p">(</span><span class="n">d</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">d</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">d</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">d</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">d</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">main</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">d</span><span class="p">);</span></code></pre></figure>
<p>Ternary operators are used here instead of an if-else block to condense the code. We can clearly see that the main function is called repeatedly until d[1] becomes 0. Then we typecast the double array to a char pointer and print its value as a string using the <code class="highlighter-rouge">"%s"</code> placeholder in the <code class="highlighter-rouge">printf</code> function.</p>
<p>So when does <code class="highlighter-rouge">d[1]</code> become 0? You got it right, it’s after doubling <code class="highlighter-rouge">d[0]</code> 272 times. Now when <code class="highlighter-rouge">d[1]</code> becomes 0, <strong><code class="highlighter-rouge">d[0] = 8.77663973968813359063877158122E102</code></strong> in the <em><strong>mantissa exponent</strong></em> notation. This 64 bit double number in binary is represented as:<br />
<strong>01010101 01001111 01011001 01000101 01010110 01001111 01001100 01001001</strong></p>
<p>These are the 8 bytes (64 bits) representing the number. Here the first bit 0 denotes the sign (+ve) of the number. The next 11 bits <strong>10101010100</strong> represent the <strong>exponent</strong>, and the last 52 bits <strong>1111010110010100010101010110010011110100110001001001</strong> represent the <strong>mantissa</strong>.</p>
<p>This is an 8-byte representation (64 bits) in the IEEE 754 standard, so a char pointer will read the whole thing one byte at a time, as char is 1 byte in size in C. Now comes the role of endianness. Little-endian machines store the least significant byte first, so reading memory in order gives the representation above backwards, i.e. the last byte is read first. The bytes read in order are (binary, hex, decimal, char ascii):</p>
<ul>
<li><code class="highlighter-rouge">01001001 = 0x49 = 73 = I</code></li>
<li><code class="highlighter-rouge">01001100 = 0x4C = 76 = L</code></li>
<li><code class="highlighter-rouge">01001111 = 0x4F = 79 = O</code></li>
<li><code class="highlighter-rouge">01010110 = 0x56 = 86 = V</code></li>
<li><code class="highlighter-rouge">01000101 = 0x45 = 69 = E</code></li>
<li><code class="highlighter-rouge">01011001 = 0x59 = 89 = Y</code></li>
<li><code class="highlighter-rouge">01001111 = 0x4F = 79 = O</code></li>
<li><code class="highlighter-rouge">01010101 = 0x55 = 85 = U</code></li>
</ul>
<p>As <code class="highlighter-rouge">d[]</code> is an array, the memory right after <code class="highlighter-rouge">d[0]</code> holds <code class="highlighter-rouge">d[1]</code>, whose low-order bytes are now 0, so the first of them acts as a <code class="highlighter-rouge">NULL</code> character, a string terminator for the <code class="highlighter-rouge">%s</code> placeholder in <code class="highlighter-rouge">printf</code>. Hence, the output of the code above is <code class="highlighter-rouge">ILOVEYOU</code>.</p>
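<p>You can reproduce the byte-level reading from any language. A small Python check (the <code class="highlighter-rouge">'&lt;d'</code> format packs the IEEE 754 double least-significant byte first, i.e. exactly the little-endian memory layout, regardless of the host machine):</p>

```python
import struct

# the initial value from the C snippet, doubled 272 times
x = 1156563417652693958656.0 * 2 ** 272

# pack as a little-endian IEEE 754 double and read the raw bytes
print(struct.pack('<d', x))  # b'ILOVEYOU'
```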
<p>Interesting, isn’t it? You just found a geeky way to say this to the love of your life! Anyway, this interesting aspect can be used to obfuscate any string into numbers, like I did with my name. <code class="highlighter-rouge">ASEEMRAJ</code> can be obfuscated by using <code class="highlighter-rouge">d[0] = 4875566432211777.0</code> and <code class="highlighter-rouge">d[1] = 113</code>. Now go and find the magical numbers for your own strings.</p>
<p><em>Note: If float is used instead of double we have 4 bytes, hence 4 characters instead of 8.</em></p>
<hr />Let’s start with this beautiful piece of code in C. What do you think it does?