Sayantan Khan's website
<h1>Writing a Pandoc filter to convert Org to Things</h1>
<p>2022-08-16 · Sayantan Khan</p>
<p>I have of late begun using <a href="https://culturedcode.com/things/">Things</a> as a planner app on my mobile devices, to supplement the usage of <a href="https://orgmode.org">Orgmode</a> on my computers.
Ideally I would have liked to use an Orgmode based app on my mobile devices, and <a href="https://beorgapp.com">beorg</a> seemed like a fairly good choice.
However, I wanted the iOS/iPadOS app to have a few extra qualities.</p>
<ol>
<li>The organizational hierarchy should be a <em>strict</em> subset of Orgmode's hierarchy. Orgmode is almost too flexible, and the excessive flexibility means that my <code>main.org</code> file is very disorganized. The constraints in the app should hopefully lead to a more maintainable system.</li>
<li>The app should be well maintained (i.e. not abandonware), with an iPad version that supports keyboard shortcuts.</li>
<li>The app also should have an API for importing data, as well as exporting it.</li>
</ol>
<p>Beorg does fairly well on the third front, since it can work directly with <code>.org</code> files, and supports various cloud backends, but the fact that it supports most of Orgmode's hierarchy meant that it wouldn't really help with my goal of making my Orgmode agenda less chaotic.
The iPad version of beorg is also not great, which pretty much ruled it out.</p>
<p>The first requirement really narrowed down the options, and I decided to go with <a href="https://culturedcode.com/things/">Things</a>.
It fares quite well on the first and second requirements: it only supports hierarchical trees of depth at most 3, grouping Todos into Projects, which are in turn grouped by Area; this is simple enough to maintain, but flexible enough to work for most situations.
The actual apps are also very well made, with a clean and clutter free interface, and no in-app purchases or subscriptions.</p>
<!-- The apps are quite expensive, coming in at 10 and 20 USD for the iPhone and iPad, but that's a one-time purchase, which makes it easier to stomach. -->
<p>The only requirement where Things falls short is the third one.
One way to import Projects and Todos into Things is via <a href="https://culturedcode.com/things/support/articles/2803573/">specially crafted URLs</a>, but those need to be generated and then manually clicked, and exporting data involves dumping an SQLite database file and parsing it, which is not very ergonomic.
Since I mostly plan to import Org agenda items into Things, and not the other way around, I decided to just implement an Org-to-Things pipeline, and worry about a Things-to-Org pipeline later if it becomes necessary.</p>
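<p>As an illustration, a link for a single todo can be assembled like this (a minimal Python sketch; the parameter names <code>title</code>, <code>notes</code>, and <code>when</code> follow Cultured Code's URL scheme documentation, but double-check them there before relying on this):</p>

```python
from urllib.parse import urlencode

def things_add_url(title, notes=None, when=None):
    # Build a Things "add" URL for one todo. The parameter names
    # (title, notes, when) are taken from the Things URL scheme
    # docs; verify them there before relying on this sketch.
    params = {"title": title}
    if notes:
        params["notes"] = notes
    if when:
        params["when"] = when
    # urlencode percent-escapes reserved characters for us
    return "things:///add?" + urlencode(params)

url = things_add_url("Review draft", notes="from main.org", when="today")
```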
<h2>My Org mode export setup</h2>
<p>Prior to when I started using Things, I would visit a webpage I autogenerated from my Org file using <a href="https://pandoc.org/">Pandoc</a>. Pandoc does an excellent job of parsing Org and turning it into HTML that retains all the information: tags look visually different, TODOs are prominent, the hierarchical structure is preserved, and a table of contents is generated as well.
The plan was to write some code that hooked into the Org-to-HTML pipeline, and insert the custom Things links at the end of each Todo and Project.
Pandoc makes doing that very easy through <a href="https://pandoc.org/filters.html">Pandoc filters</a>.
Pandoc filters are programs that transform the abstract syntax tree (AST) that Pandoc internally uses to represent documents, and these filters are easily applied to any Pandoc conversion.</p>
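<p>For readers unfamiliar with the mechanics, a Pandoc JSON filter is just a program that reads the document AST on stdin and writes a modified AST to stdout. Here is a minimal Python sketch of that plumbing (the function and file names are illustrative, not part of my actual filter, which is written in Haskell):</p>

```python
def run_filter(doc, transform):
    # Apply `transform` to the block list of a Pandoc JSON AST.
    # A real filter would wrap this with json.load(sys.stdin) and
    # json.dump(..., sys.stdout), and be invoked as e.g.
    #   pandoc notes.org --filter ./filter.py -o notes.html
    # (the script name here is illustrative).
    doc["blocks"] = transform(doc["blocks"])
    return doc

doc = {"pandoc-api-version": [1, 22], "meta": {}, "blocks": []}
out = run_filter(doc, lambda blocks: blocks + [{"t": "HorizontalRule"}])
```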
<p>However, Pandoc's native AST does not really encode the hierarchy of an Org file in a tree-like form, but rather as <code>[Block]</code>, i.e. a list of blocks, which coarsely correspond to the lines of the Org file, but with punctuation and tags parsed.
It does make sense why the AST does not really capture the Org syntax tree, since the Pandoc AST is quite general, and needs to be able to read and write to several different formats.
But that did mean that I would have to write a parser to parse <code>[Block]</code>, turn it into a "real" AST, then insert the Things links, and then output the new <code>[Block]</code>.</p>
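<p>To make the Block-level rewriting concrete, here is a hedged Python sketch that appends a placeholder link paragraph after every level-3 header in a Pandoc JSON block list; the real filter keys off TODO headings rather than bare header levels:</p>

```python
def append_things_links(blocks):
    # Walk a Pandoc JSON block list and append a paragraph with a
    # placeholder link after every level-3 header, which stands in
    # for a 'Todo' here. (The real filter parses the blocks into an
    # Area/Project/Todo tree first; this only shows the idea of
    # rewriting the flat [Block] list.)
    out = []
    for block in blocks:
        out.append(block)
        if block.get("t") == "Header" and block["c"][0] == 3:
            out.append({"t": "Para", "c": [{"t": "Str", "c": "things-link-here"}]})
    return out

sample = [
    {"t": "Header", "c": [3, ["todo-1", [], []], [{"t": "Str", "c": "Buy milk"}]]},
    {"t": "Para", "c": [{"t": "Str", "c": "some body text"}]},
]
result = append_things_links(sample)
```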
<h2>Writing the parser</h2>
<p>Rather than using an existing parser combinator library like <a href="https://hackage.haskell.org/package/megaparsec">Megaparsec</a>, I decided it would be more fun to implement monadic parsing from scratch, following <em><a href="https://doi.org/10.1017/S0956796898003050">Monadic parsing in Haskell</a></em>.
The parser itself turned out to be fairly simple to write: it parsed the <code>Block</code>s, keeping track of whether it had descended into an 'Area' (which is a collection of 'Project's) or a 'Project' (which is a collection of 'Todo's).
As soon as the parser parses a 'Todo' (which is the leaf node), it outputs the <code>Block</code>s it read, with an additional <code>Block</code> appended, which contains the specially crafted link.</p>
<p>The parsing is really a single-pass compilation process: it's a wrapper around a <code>StateT</code> where the state is <code>[Block]</code>.</p>
<div class="highlight"><pre><span></span><code><span class="n">type</span><span class="w"> </span><span class="n">BlockParser</span><span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">StateT</span><span class="w"> </span><span class="o">[</span><span class="n">Block</span><span class="o">]</span><span class="w"> </span><span class="p">(</span><span class="n">Either</span><span class="w"> </span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="o">[</span><span class="n">Block</span><span class="o">]</span><span class="w"></span>
</code></pre></div>
<p>One way to possibly improve the single-pass compiler/parser would have been to incorporate the second <code>[Block]</code> argument in the state, like the following definition.</p>
<div class="highlight"><pre><span></span><code><span class="n">type</span><span class="w"> </span><span class="n">BlockParser</span><span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">StateT</span><span class="w"> </span><span class="p">(</span><span class="o">[</span><span class="n">Block</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="o">[</span><span class="n">Block</span><span class="o">]</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">Either</span><span class="w"> </span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="n">a</span><span class="w"></span>
</code></pre></div>
<p>Here, the first element of the state tuple represents the <code>Block</code>s left to read, and the second argument represents the parsed and modified <code>Block</code>s.
While this is conceptually a cleaner way of writing the parser, in practice it led to slightly longer code; still, if I do something like this in the future, I will probably go with this version.</p>
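<p>For readers who prefer an imperative rendering, here is a rough Python analogue of this two-list state (plain strings stand in for <code>Block</code>s; this is a sketch of the idea, not the actual Haskell implementation):</p>

```python
def parse_and_annotate(blocks):
    # Mimics the StateT ([Block], [Block]) shape in Python: the
    # state is (blocks left to read, blocks already emitted).
    # Blocks are plain strings here; lines starting with "TODO"
    # are the leaves that get a link block appended after them.
    remaining, emitted = list(blocks), []
    while remaining:
        block = remaining.pop(0)   # consume one block from the input state
        emitted.append(block)      # pass it through to the output state
        if block.startswith("TODO"):
            emitted.append("[things link for: " + block + "]")
    return emitted

result = parse_and_annotate(["Area: Chores", "Project: Kitchen", "TODO buy milk"])
```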
<p>Once I had the single-pass compiler/parser, turning it into a Pandoc filter was fairly straightforward, following the <code>pandoc-types</code> documentation.
The real upshot of this effort is that I now have a parser for the subset of Org I use, which I can reuse for other projects, such as a web frontend for my agenda system.</p>
<p>The final parser and filter are now hosted <a href="https://github.com/sayantangkhan/org-to-things">on GitHub</a>.</p>
<h1>On why I use Emacs to write TeX</h1>
<p>2020-10-17 · Sayantan Khan</p>
<p>An observation I made a while ago while peeking over people's shoulders (with their
consent, of course) is that even when they're writing TeX in a powerful text
editor, like Emacs or Vim, most people don't really harness the extensive
programmability of their text editor (especially MM<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>). For instance, here's a
non-exhaustive list of things you can program Emacs to do for you, rather than
doing it yourself.</p>
<ul>
<li>Begin an environment, and place your cursor in the environment, rather than
painfully typing out <code>\begin{&lt;environment&gt;}</code> and <code>\end{&lt;environment&gt;}</code>. For
more complex environments like <code>figure</code>, Emacs can also prompt you
for the required parameters, and suggest sensible defaults.</li>
<li>Automatically insert appropriate labels for sections, equations, and
the like.</li>
<li>Pop up an easily navigable list of labels when you want to reference something. The
list of labels also features some of the text following it so that you can tell at
a glance what each label corresponds to.</li>
<li>When citing something from a BibTeX file, rather than typing out the label of the
entry, Emacs lets you fuzzy search by author, words in the abstract, by the
journal, etc.</li>
<li>Ensure that you never have unmatched parentheses, quotes, and the TeX variants
like <code>\langle</code> and <code>\rangle</code>, <code>\left(</code>, and <code>\right)</code>, etc.</li>
</ul>
<p>Some of these features, or approximations thereof, are baked into some of the more
TeX specific editors, like TeXStudio, or the Overleaf editor, but they aren't quite
as powerful as the Emacs versions, nor are they as customizable. The features
listed above are provided by the packages <code>AuCTeX</code>, <code>RefTeX</code>, and <code>smartparens</code>.
As with most Emacs packages, these are written in Emacs Lisp, and as a result,
their source code is quite easy to understand, and modify.</p>
<h2>Demonstrations</h2>
<table class="image">
<caption align="bottom">Entering a figure environment, and filling in the details. The keyboard shortcut for creating
a new environment is `\e`.</caption>
<tr><td>
<video width="600" height="600" controls>
<source src="../../images/emacs-and-latex/part1.webm" type="video/webm">
Your browser does not support the video tag.
</video>
</td></tr>
</table>
<table class="image">
<caption align="bottom">Creating a new equation: note that it automatically gets labelled as `eq:1`.</caption>
<tr><td>
<video width="600" height="600" controls>
<source src="/images/emacs-and-latex/part2.webm" type="video/webm">
Your browser does not support the video tag.
</video>
</td></tr>
</table>
<table class="image">
<caption align="bottom">Creating a reference to an equation. The popup shows what equation each label corresponds to.</caption>
<tr><td>
<video width="600" height="600" controls>
<source src="/images/emacs-and-latex/part3.webm" type="video/webm">
Your browser does not support the video tag.
</video>
</td></tr>
</table>
<table class="image">
<caption align="bottom">Citing a book/paper. Note that all I needed to specify was that the author was Thurston.</caption>
<tr><td>
<video width="600" height="600" controls>
<source src="/images/emacs-and-latex/part4.webm" type="video/webm">
Your browser does not support the video tag.
</video>
</td></tr>
</table>
<table class="image">
<caption align="bottom">As soon as I write `\langle`, Emacs automatically creates the matching `\rangle`.</caption>
<tr><td>
<video width="600" height="600" controls>
<source src="/images/emacs-and-latex/part5.webm" type="video/webm">
Your browser does not support the video tag.
</video>
</td></tr>
</table>
<p>I hope that this video post encourages people to start using Emacs, and writing elisp to customize it. <a href="https://stallman.org/articles/happy-hacking.html">Happy hacking</a>!</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>If you're reading this, you know what I'm talking about. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>
<h1>Summer 2020 update</h1>
<p>2020-09-26 · Sayantan Khan</p>
<p>Another 9 months have gone by without a blog post, and this time I don't even have lack of time as an excuse for
my lack of writing. This summer turned out to be quite different from what I had planned, which probably goes
to show one shouldn't plan too far ahead, in this day and age. I figured it might be a good time to make a list of
what I did this summer, and contrast that with what I was planning to do, before I forget the details.</p>
<h2>Math</h2>
<p>I was supposed to begin my summer with a conference at Haifa, but that of course didn't pan out, given the
pandemic. The expectation was to follow up the conference by learning more Lie group theory and associated
dynamics and representation theory, along the lines of what Hee Oh does, but I never got around to doing that.</p>
<p>However, I did manage to get some math done this summer.</p>
<ul>
<li>I worked with my undergraduate collaborator CP<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>, and my postdoc mentor BW, as a part of AW's REU, and
managed to prove some asymptotic bounds on the minimal stretch factors of certain pseudo-Anosov maps. BW is
still reviewing the paper right now, and fixing up some sections, but hopefully the preprint should be up on
ArXiV soon.</li>
<li>I also have enough theorems to write up a paper on the project I have been working on since February. Writing
it up might be a bit of a nightmare, given how long it's been since the earlier results. But hopefully I should
have something up on ArXiV in a month.</li>
<li>I also learnt enough Teichmüller theory and Teichmüller dynamics to pass my prelims, which is definitely
good. I am no longer as afraid of complex analytic Teichmüller theory as I used to be.</li>
</ul>
<h2>Programming</h2>
<p>This summer was fairly productive in terms of programming: I went through most of
<a href="https://www.nand2tetris.org/">nand2tetris</a>, i.e. designed a 16-bit computer from scratch, starting with the
NAND gates, going up to assembly, followed with an intermediate VM language. I skipped writing a compiler for
compiling Scheme to the intermediate VM language, because the Scheme specification is more complicated than I
thought. I followed that up with trying to write an x86-64 microkernel in Rust, but x86-64 has a lot of details
to debug, so I shelved that project as well. Right now, I am writing some userspace filesystem code that
should be easier to debug, given that it doesn't work at the kernel level.</p>
<h2>Climbing</h2>
<p>My climbing goals have suffered the most since the pandemic began. Originally, I was planning on getting my
lead climbing certification this summer, and progress to climbing 5.10+ or 5.11- outdoors, but none of that
happened after the gyms closed down. I've only climbed twice outside since then, and my footwork, among other
things, has obviously degraded. I'm just hoping that once the gyms open up again, I can recover to the point
where I was in a month or two.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>I figure it's a good idea to not name people I know personally in my blog. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>
<h1>Counting orbit points (part 3): Asymptotics for convex-cocompact groups</h1>
<p>2020-01-12 · Sayantan Khan</p>
<p>In the <a href="/orbit-points-2.0.html">previous post</a>, we proved Sullivan's shadow lemma, which gave
us concrete estimates for special subsets of the boundary, namely <em>shadows</em>. Recall that the shadow of a ball of
radius $r$ based at $y$, with the source at $x$, denoted by $\mathcal{O}_r(x, y)$, is the set of all points $z$
in the boundary $\partial \mathbb{H}^2$ such that a geodesic from $x$ to $z$ gets within distance $r$ of $y$. In the
special case when $y$ and $x$ lie in the same orbit, we can estimate the Patterson-Sullivan measure of the shadow.</p>
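<p>In symbols (a standard way to write it; conventions vary slightly between references), the shadow is
\begin{align*}
\mathcal{O}_r(x, y) = \left\{ \xi \in \partial \mathbb{H}^2 : [x, \xi) \cap B(y, r) \neq \emptyset \right\},
\end{align*}
where $[x, \xi)$ denotes the geodesic ray from $x$ towards $\xi$, and $B(y, r)$ is the ball of radius $r$ about $y$.</p>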
<p><strong>Lemma</strong> (Sullivan's shadow lemma): Let $\Gamma$ be a non-elementary discrete group, and let $\{\mu_x\}_{x \in \mathbb{H}^2}$ be a $\Gamma$-invariant conformal density of
dimension $\delta$. Then for any $x$ there exists a large enough $r_0$, such that for all
$r \geq r_0$, there exists a $C > 0$, depending on $r$, for which the following inequality holds
for all $\gamma \in \Gamma$.
\begin{equation}
\frac{1}{C} e^{-\delta \cdot d(x, \gamma x)} \leq \mu_x\left( \mathcal{O}_r(x, \gamma x) \right) \leq C e^{-\delta \cdot d(x, \gamma x)}
\end{equation}</p>
<p>We went through all the trouble of developing the machinery of Patterson-Sullivan theory to answer the following
question.</p>
<p><strong>Question</strong>: Suppose $\Gamma$ is a <em>convex-cocompact group</em><sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>. Let $x_0$ be a point in $\Omega$, and let
$C(r)$ be the number of points in the orbit $\Gamma x_0$ that lie within distance $r$ of $x_0$. What can we
say about the asymptotics of $C(r)$? Do we have asymptotic upper and lower bounds? Can we determine what
$C(r)$ is asymptotically equal to?</p>
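<p>In symbols, the counting function under discussion is
\begin{align*}
C(r) = \# \left\{ \gamma \in \Gamma : d(x_0, \gamma x_0) \leq r \right\}.
\end{align*}</p>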
<p>One can prove a somewhat weak asymptotic upper bound without using any Patterson-Sullivan theory.</p>
<p><strong>Proposition</strong>: $C(r)$ is at most $e^r$.</p>
<p><strong>Proof</strong>: Since the group $\Gamma$ is discrete, there exists a small enough constant $r_0 > 0$ such that
any pair of points in the orbit $\Gamma x_0$ are distance at least $2r_0$ apart. Let $S_r$ be the set of
orbit points within distance $r$ of $x_0$, and let $T_r$ be the union of balls of radius $r_0$ around each
point of $S_r$. These balls don't intersect. Furthermore, all these balls lie in the ball of radius $r+
r_0$ around $x_0$. The volume of a ball of radius $r$ is $ke^{r}$ for a constant $k$. We also note
that the volume of $T_r$ must be less than the volume of the ball of radius $r+r_0$. This gives us that
following inequality.
\begin{align*}
k\, C(r)\, e^{r_0} \leq k e^{(r+r_0)}
\end{align*}
Rearranging the above inequality gives us the result. $\square$</p>
<p>Since we didn't have to work very hard to prove this inequality, it shouldn't come as a surprise
that this bound is far from sharp. We can prove a better inequality by using Sullivan's shadow lemma.
For simplicity, we'll first estimate the quantity $A(r)$ instead, which is
the number of orbit points contained in an annulus of inner radius $r-1$ and outer radius $r$ about
the point $x_0$. The value of $C(r)$ can be obtained by just adding up the values of $A(i)$,
i.e. $C(r) = \sum_{i=1}^{r} A(i)$.</p>
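<p>To spell out the summation step: once we know $A(i) \leq k e^{\delta i}$ for all $i$ (as in the proposition below), the upper bound on $C(r)$ follows from a geometric series:
\begin{align*}
C(r) = \sum_{i=1}^{r} A(i) \leq \sum_{i=1}^{r} k e^{\delta i} \leq \frac{k e^{\delta}}{e^{\delta} - 1} \, e^{\delta r}.
\end{align*}</p>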
<p><strong>Proposition</strong>: Suppose $\delta$ is the critical exponent<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup> of $\Gamma$. Then there exists a positive constant
$k$ which makes the following inequalities involving $A(r)$ hold for all $r$.
\begin{align*}
\frac{1}{k} e^{\delta r} \leq A(r) \leq k e^{\delta r}
\end{align*}</p>
<p><strong>Proof</strong>: We begin by defining the set $\alpha_r$: this is the set of elements $\gamma$ in $\Gamma$ such
that $\gamma x_0$ lands in the annulus of inner radius $r-1$ and outer radius $r$ about $x_0$,
i.e. $\gamma x_0$ contributes to $A(r)$. Consider the following function defined on
$\partial \mathbb{H}^2$.
\begin{align*}
S(\xi) := \sum_{\gamma \in \alpha_r} \mathbb{1}_{\mathcal{O}_{r_0}(x, \gamma x)}(\xi)
\end{align*}
Here, $r_0$ is a large enough constant for Sullivan's shadow lemma to apply. Note that
this function depends on $r$ as well, although the notation hides that dependence; we allow this
because the function can be uniformly bounded above and below, independently of $r$. The lower
bound, of course, is just $0$, which is trivial, but the upper bound requires proof. For any point
$\xi$ on the boundary, the value of $S(\xi)$ depends on how many balls of radius $r_0$ about
the orbit points $\gamma x$, for $\gamma \in \alpha_r$, intersect the geodesic going from $x$ to $\xi$. By an application of the
triangle inequality, it suffices to count how many of these balls intersect a segment of length
$2r_0 + 1$ depicted in Figure 1.
<br>
<table class="image">
<caption align="bottom">Figure 1: Line segment indicating where $S(\xi)$ may get some contribution from.</caption>
<tr><td><img src="../images/orbit-points-3/fig1.png" width="70%" height="auto" class="center"/></td></tr>
</table>
</p>
<p>This means that $S(\xi)$ is at most the number of orbit points contained in a neighbourhood of radius $r_0$
about the segment of length $2r_0 + 1$. Since $r_0$ is fixed, this quantity is bounded above no matter where
the segment is on $\mathbb{H}^2$, which proves the uniform upper bound on the function $S$. Let us denote this
uniform bound by $J$. This uniform upper bound will help us get an upper bound on $A(r)$. Observe that
showing the upper bound didn't require convex-cocompactness, and in fact works for any discrete group
$\Gamma$.</p>
<p>We can use convex-cocompactness of $\Gamma$ to get a positive lower bound on $S(\xi)$ when $\xi$ is in the limit
set $\Lambda(\Gamma)$. Since $\Gamma$ is convex cocompact, the restriction of a fundamental domain to the convex
hull of the limit set will be compact. Without loss of generality, we can assume the diameter of this compact
set is less than $1$ (otherwise we just make the annuli thicker). If $\xi$ is in the limit set, the geodesic
from $x_0$ to $\xi$ is always within distance $1$ of some orbit point, since the closure of the orbits
of the compact set contain the limit set. This means $S(\xi) \geq 1$, and proves the lower bound.</p>
<p>We have, with some effort, managed to prove our main inequality, which holds for all $\xi$ in the
radial limit set, which has full measure.
\begin{align}
\label{eq:4}
1 \leq S(\xi) \leq J
\end{align}
We now integrate this with respect to the Patterson-Sullivan measure $\mu_x$, and interchange the
sum and the integral to get the following inequality.
\begin{align}
\label{eq:8}
\mu_x(\partial \Omega) \leq \sum_{\gamma \in \alpha_r} \mu_x\left( \mathcal{O}_{r_0}(x, \gamma x) \right) \leq J \mu_x(\partial \Omega)
\end{align}
Sullivan's shadow lemma also gives us another useful inequality.
\begin{align}
\label{eq:9}
\frac{1}{C} \sum_{\gamma \in \alpha_r} e^{-\delta r} \leq \sum_{\gamma \in \alpha_r} \mu_x\left( \mathcal{O}_{r_0}(x, \gamma x) \right) \leq C \sum_{\gamma \in \alpha_r} e^{-\delta r}
\end{align}
Combining inequalities \eqref{eq:8} and \eqref{eq:9}, we get the inequalities we were after.
\begin{align}
\label{eq:10}
\frac{1}{k} e^{\delta r} \leq A(r) \leq k e^{\delta r}
\end{align}
Summing up the $A(r)$ gives us a similar inequality for $C(r)$, which is what we wanted. $\square$</p>
<p>We have improved upon our earlier naïve inequalities in two ways. First, we now have a lower bound
too. Secondly, if we know that $\delta < 1$, we will have a strictly better inequality. Of course,
this is only an improvement if we can actually find groups $\Gamma$ where the critical exponent
$\delta$ is less than $1$. Schottky groups provide a family of such examples.</p>
<p><strong>Example</strong>: (Schottky groups) Let $x_0 \in \mathbb{H}^2$ be a basepoint. Let $a_1$ be a hyperbolic isometry whose axis passes through
$x_0$, and translates $x_0$ by distance $d$, where $d$ is a constant we'll fix later, and $a_2$ be
another hyperbolic isometry with translation distance $d$ whose axis also passes through $x_0$,
and is perpendicular to the axis of $a_1$. For $d$ large enough, $a_1$ and $a_2$ generate a free
subgroup of $\mathrm{PSL}(2, \mathbb{R})$: this will be our group $\Gamma$. To get an upper bound on the
critical exponent of $\Gamma$, we need to relate the word length of elements of $\Gamma$ to the
distance they translate $x_0$ by. Suppose we had the following inequality for all elements
$\gamma$ in $\Gamma$.
\begin{align}
\label{eq:11}
|\gamma| d' - c \leq \mathrm{dist}(x_0, \gamma x_0)
\end{align}
We could use this to determine for what values of $s$ does the following infinite sum converge
(recall that this infinite sum showed up in the construction of the Patterson-Sullivan measure).
\begin{align*}
\sum_{\gamma \in \Gamma} \exp(-s \cdot d(x_0, \gamma x_0))
&= \sum_{n=0}^{\infty} \sum_{|\gamma| = n} \exp(-s \cdot d(x_0, \gamma x_0)) \\
&\leq \sum_{n=0}^{\infty} \sum_{|\gamma| = n} \exp(-s \cdot (nd' - c)) \\
&\leq \sum_{n=0}^{\infty} k\, 3^n \exp(-snd') \\
&= \sum_{n=0}^{\infty} k \exp(n(\ln 3 - sd'))
\end{align*}
Clearly, for large values of $s$, this converges, to no one's surprise. But since we want the
critical exponent to be less than $1$, we would like this to converge for some $s < 1$. We could make that
happen if $d'$ could be made arbitrarily large: we claim that as $d$ gets larger, so does $d'$.</p>
<p>To prove the above claim, we need to do some elementary hyperbolic geometry. Firstly, observe that
if the word length of $\gamma$ is $n$, then the geodesic joining $x_0$ to $\gamma x_0$ goes
through $(n-2)$ translates of the fundamental domain of $\Gamma$. The boundary of the fundamental
domain is composed of $4$ biinfinite geodesics, such that the distance between each pair is
bounded below, away from $0$. That lower bound on the distance is where $d'$ comes from. One can
explicitly relate $d$ and $d'$ in this manner by constructing a right-angled hyperbolic pentagon. The details
of this construction are elementary and left to the reader; after that, one uses the formula for
right-angled hyperbolic pentagons (see Thurston's <a href="http://library.msri.org/books/gt3m/PDF/Thurston-gt3m.pdf">The Geometry and Topology of Three-Manifolds</a>) to get a formula for $d'$. Roughly,
as $d$ gets large, $d'$ is approximately $2d$, which certainly approaches infinity, proving our claim.</p>
<p>This example shows us that Patterson-Sullivan theory did give us a strong estimate of the count of the orbit
points. This concludes the three part series on counting orbit points. All of this material can be found in
Dennis Sullivan's excellent paper, titled <a href="http://www.numdam.org/article/PMIHES_1979__50__171_0.pdf">The density at infinity of a discrete group of hyperbolic motions</a>.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>A convex cocompact group is a discrete subgroup of $\mathrm{PSL}(2, \mathbb{R})$ that acts cocompactly on
the convex hull of its limit set. The surface one gets by quotienting the hyperbolic plane by a
convex-cocompact group is a hyperbolic surface with no cusps, and finitely many <em>flares</em>, which have infinite
area, and look like one of the ends of $\mathbb{H}^2$ modulo a hyperbolic element. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>See the <a href="/orbit-points-3.0.html">previous post</a> for the definition of critical exponent. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>
<h1>Counting orbit points (part 2): Patterson-Sullivan theory</h1>
<p>2020-01-05 · Sayantan Khan</p>
<p>In the <a href="/orbit-points-1.0.html">previous post</a>, we saw how to get an asymptotic
count of orbit points under a lattice action, i.e. a finite covolume Fuchsian group. To do so, we needed the
fact that the geodesic flow on the associated quotient was mixing with respect to the Liouville
measure. That suggests that if we want to get asymptotic counting for the orbit of an infinite
covolume Fuchsian group, we might want to show mixing of the geodesic flow for the associated
quotient space. However, proving mixing with respect to the Liouville measure is pointless, since
the Liouville measure is infinite. What we need therefore, is a some other measure on the unit tangent
bundle that is invariant under and mixing with respect to the geodesic flow.</p>
<p>For a class of Fuchsian groups known as <a href="https://en.wikipedia.org/wiki/Geometric_finiteness">geometrically finite
groups</a>, such a measure does exist: it's called
the <a href="http://www.scholarpedia.org/article/Bowen-Margulis_measure">Bowen-Margulis measure</a>. The construction
of this measure is fairly involved, and will take up several blog posts, but the intermediate constructions
are interesting in their own right. We will start off by looking at a family of measures associated to a
geometrically finite group $\Gamma$ on the limit set $\Lambda(\Gamma)$, which is a subset of the boundary
$\partial \mathbb{H}^2$. These measures are called the Patterson-Sullivan measures.</p>
<p>Before we describe how the Patterson-Sullivan measures on the boundary are constructed, it will be
useful to make explicit what properties we would like them to have. We want the measure to
respect the dimension of the limit set, at least infinitesimally. What that means is that if there is a
set of radius $r$ which has measure $\mu$, and its size is doubled to $2r$, then its measure
should become $2^\delta$ times the original measure, where $\delta$ is, in some sense, the dimension
of the limit set. In other words, whatever measure we construct on the limit set has to be
compatible with the metric on the limit set.</p>
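<p>Stated as a heuristic (accurate only infinitesimally), the scaling property we are after reads
\begin{align*}
\mu\big(B(\xi, 2r)\big) \approx 2^{\delta} \, \mu\big(B(\xi, r)\big)
\end{align*}
for small balls $B(\xi, r)$ about points $\xi$ in the limit set.</p>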
<p>Note however that the boundary $\partial \mathbb{H}^2$ (and the limit set $\Lambda(\Gamma)$ by extension),
doesn't have a canonical metric. Instead, it has a family of metrics, all conformal to each other,
and every one of those metrics comes from a point $x \in \mathbb{H}^2$. The metric associated to each point
$x$ comes from the identification of the unit tangent sphere at that point to the boundary. Suppose
we did have a family of measures $\mu_x$, one for each point, that satisfied the scaling
property. That would then mean that for any two points $x$ and $y$, $\mu_x$ was absolutely
continuous with respect to $\mu_y$ (and the other way round), and the Radon-Nikodym derivative
$\frac{d\mu_y}{d\mu_x}$ would be
$\left( \frac{g_y}{g_x} \right)^{\delta}$, where $g_y$ and $g_x$ are the associated
metrics. For negatively curved spaces, the ratio $\left( \frac{g_y}{g_x} \right)$
at a point $\xi$ in the boundary is actually the Busemann function $e^{\beta_{\xi}(y,x)}$. This lets us
define what we want our family of measures to be in a more precise manner.</p>
<p><strong>Definition</strong>: A conformal density of dimension $\delta$ on
$\partial \mathbb{H}^2$ is a family of Radon measures $\{\mu_x\}_{x \in \mathbb{H}^2}$ such that all of them lie in
the same measure class, and their Radon-Nikodym derivatives satisfy the following property.
\begin{equation}
\label{eq:1}
\frac{d\mu_y}{d\mu_x}(\xi) = \left(e^{\beta_{\xi}(y,x)} \right)^\delta
\end{equation}</p>
<p>Examples of conformal densities aren't too hard to construct. For instance, the $\delta$-dimensional Hausdorff
measure on the limit set $\Lambda(\Gamma)$ is a conformal density of dimension $\delta$, where
$\delta$ is the Hausdorff dimension of the limit set. However, this is not the only property we want
our family of measures to satisfy. We want this family of measures to also capture which parts
of the boundary the orbit points accumulate on. That means our family of measures should be equivariant
with respect to the $\Gamma$ action, i.e. the pushforward of $\mu_x$ under $\gamma \in \Gamma$ should be
the same as the measure $\mu_{\gamma x}$. If a conformal density satisfies this property, we say that the
conformal density is invariant under $\Gamma$.</p>
<p><strong>Definition</strong>: A conformal density $\{\mu_x\}_{x \in \mathbb{H}^2}$ is said to be $\Gamma$-invariant if
the following identity holds for all $x \in \mathbb{H}^2$, and all $\gamma \in \Gamma$.
\begin{equation}
\label{eq:2}
\gamma_{\ast} \mu_x = \mu_{\gamma x}
\end{equation}
Furthermore, we will also assume
that all these measures are supported on $\Lambda(\Gamma)$.</p>
<p>It turns out that the definition of a $\Gamma$-invariant conformal density is restrictive enough to let us
deduce many properties of the measure without ever working with an explicit example. For
instance, one can deduce from the definition that such a family of measures is quasi-ergodic with
respect to the $\Gamma$ action on $\partial \mathbb{H}^2$. However, before we start proving such results,
we should verify that $\Gamma$-invariant conformal densities actually exist; otherwise, any theorem
we prove about them will have no content.</p>
<p>The construction we will show first appeared in a paper by S.J. Patterson<sup id="fnref:paper1"><a class="footnote-ref" href="#fn:paper1">1</a></sup>, and was
later generalized to hyperbolic spaces of all dimensions by Dennis Sullivan<sup id="fnref:paper2"><a class="footnote-ref" href="#fn:paper2">2</a></sup>.
The idea behind the construction of the Patterson-Sullivan measures is quite clever (and hard to
come up with), but easy to understand. One starts off with a family of measures $\mu_{x}^s$, for
every $x \in \mathbb{H}^2$, where $s$ is a real number large enough to make the infinite sums below
converge (assuming such an $s$ exists), and $\delta_{y}$ denotes the Dirac mass at $y$.
\begin{equation}
\mu_x^s = \frac{\sum_{\gamma \in \Gamma} e^{-s \cdot d(x, \gamma x)} \delta_{\gamma x} }{\sum_{\gamma \in \Gamma} e^{-s \cdot d(x, \gamma x)} }
\end{equation}
For large enough $s$, the infinite series in the denominator converges, giving us a probability
measure on $\mathbb{H}^2$ (and not $\partial \mathbb{H}^2$). Also, it turns out that this choice of
normalization isn't quite right: the family $\{\mu_{x}^s\}_{x \in \mathbb{H}^2}$ doesn't quite transform
like \eqref{eq:1} and \eqref{eq:2}, but it's close. The correct normalization, as it turns out, comes
from picking a fixed point $x_0 \in \mathbb{H}^2$, and modifying $\mu_x^s$ in the following manner.
\begin{equation}
\mu_x^s = \frac{\sum_{\gamma \in \Gamma} e^{-s \cdot d(x, \gamma x_0)} \delta_{\gamma x_0} }{\sum_{\gamma \in \Gamma} e^{-s \cdot d(x_0, \gamma x_0)} }
\end{equation}</p>
<p>The measure $\mu_x^s$ is no longer a probability measure, unless $x = x_0$, but for a fixed $x$, it's still a finite
measure. In fact, the total mass of $\mu_x^s$ satisfies the following inequality.
\begin{equation}
e^{-s \cdot d(x, x_0)} \leq \lVert{\mu_x^s}\rVert \leq e^{s \cdot d(x, x_0)}
\end{equation}</p>
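<p>This bound is just the triangle inequality applied term by term: since $d(x_0, \gamma x_0) - d(x, x_0) \leq d(x, \gamma x_0) \leq d(x_0, \gamma x_0) + d(x, x_0)$, each term in the numerator satisfies
\begin{equation*}
e^{-s \cdot d(x, x_0)} \cdot e^{-s \cdot d(x_0, \gamma x_0)} \leq e^{-s \cdot d(x, \gamma x_0)} \leq e^{s \cdot d(x, x_0)} \cdot e^{-s \cdot d(x_0, \gamma x_0)},
\end{equation*}
and summing over $\gamma \in \Gamma$ gives the bound on the total mass.</p>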
<p>The next thing we do is observe that for large enough $s$, the infinite sum
$g(s) := \sum_{\gamma \in \Gamma} e^{-s \cdot d(x_0, \gamma x_0)}$ does actually converge. The
easy way to see this is to get an asymptotic upper bound on how many points of $\Gamma x_0$ lie in a
ball of radius $r$ around $x_0$. An upper bound that works for us is $c e^{r}$,
which means that any $s > 1$ will make the infinite series converge.</p>
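<p>One can get a feel for this threshold numerically: if we model the orbit as having roughly $c e^{k}$ points at distance about $k$ from $x_0$ (consistent with the upper bound above), the Poincaré series behaves like the geometric series $\sum_k c e^{k} e^{-sk}$, which converges exactly when $s > 1$. The following sketch is purely illustrative; the shell model and the constant $c$ are assumptions, not the orbit of an actual group.</p>

```python
import math

def poincare_series_model(s, c=1.0, shells=500):
    """Model sum over gamma of e^{-s d(x0, gamma x0)} by grouping orbit
    points into shells: roughly c * e^k points at distance about k."""
    return sum(c * math.exp(k) * math.exp(-s * k) for k in range(shells))

# For s > 1 the partial sums stabilize (geometric series, ratio e^{1-s});
# for s < 1 they keep growing as more shells are included.
print(poincare_series_model(1.5))                # stabilizes near 1/(1 - e^{-0.5})
print(poincare_series_model(0.9, shells=50),
      poincare_series_model(0.9, shells=100))    # still growing
```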
<p>On the other hand, for small enough $s$, the series $g(s)$ will diverge. This means there is a
critical exponent $\delta$, depending on the group $\Gamma$, such that for all $s > \delta$, $g(s)$
converges, and for all $s < \delta$, $g(s)$ diverges. Suppose $g(\delta)$ actually diverges. Then we
could pick a sequence $\{s_i\}$, going down to $\delta$, and look at the sequence of measures
$\mu_x^{s_i}$. Since these can be thought of as measures on $\overline{\mathbb{H}^2}$, which is compact,
and the mass of all these measures is bounded by inequality <em>(5)</em>, one can pick out a
convergent subsequence. Since we assumed $g(\delta)$ actually diverged, the limit measure $\mu_x$
will actually be supported on the limit set $\Lambda(\Gamma)$, and satisfy the properties we want
it to satisfy (we will verify that).</p>
<p>This is the rough idea behind the Patterson-Sullivan construction. However, there are many questions
that one needs to answer.</p>
<ol>
<li>What does one do if $g(\delta)$ does not diverge?</li>
<li>How does this measure depend on the choice of $\{s_i\}$?</li>
<li>Does it depend on the choice of basepoint $x_0$?</li>
</ol>
<p>The first problem was dealt with by Patterson in Lemma 3.1 of his paper. The idea is
to attach additional weights to $e^{-s \cdot d(x,\gamma x_0)}$ by multiplying it with
$h(d(x, \gamma x_0))$, where $h$ is a slowly growing function from $\mathbb{R}_+$ to $\mathbb{R}_+$: it
makes the infinite series diverge for exponent $\delta$, but still converge for larger exponents. We
also require that for any $d$, $\lim_{r \to \infty} \frac{h(r + d)}{h(r)} = 1$. The last
condition ensures that this modified measure transforms like the original measure in the limit.<sup id="fnref2:paper1"><a class="footnote-ref" href="#fn:paper1">1</a></sup></p>
<p>The second and the third questions can be answered by proving a rather general theorem about the uniqueness
of the constructed measures.</p>
<p><strong>Theorem</strong> (Uniqueness): Let $\{\nu_x\}_{x \in \mathbb{H}^2}$ and $\{\mu_x\}_{x \in \mathbb{H}^2}$ be two $\Gamma$-invariant conformal densities of the same dimension
$\delta$. Then $\{\mu_x\}_{x \in \mathbb{H}^2}$ and $\{\nu_x\}_{x \in \mathbb{H}^2}$ are the same up to
scaling.</p>
<p>The above theorem is equivalent to the following statement.</p>
<p><strong>Theorem</strong> (Quasi-ergodicity): Let $\nu$ be a $\Gamma$-invariant conformal density. Then the $\Gamma$ action on the boundary $\partial \mathbb{H}^2$ is
quasi-ergodic with respect to $\nu$, i.e. every $\Gamma$-invariant set has either full measure
or zero measure.</p>
<p>The equivalence of these two statements is straightforward to see. Suppose one had a non-trivial
invariant set $U$. One could then condition the measure on $U$ to get a $\Gamma$-invariant conformal
density that wasn't a scalar multiple of the original $\Gamma$-invariant conformal density. That
would contradict uniqueness. On the other hand, if one had two $\Gamma$-invariant conformal
densities, their Radon-Nikodym derivative would be a $\Gamma$-invariant function, and hence constant
almost everywhere.</p>
<p>Although in principle, we can prove quasi-ergodicity using just the results we have now, the
appropriate place for its proof is the post on the ergodicity of the geodesic flow. One may worry
that in the process of doing so, its proof may become circular, i.e. it may use uniqueness. The fact
that this does not happen can be verified skipping ahead to the relevant section.<sup id="fnref:footnote3"><a class="footnote-ref" href="#fn:footnote3">3</a></sup></p>
<p>Once the foundational questions have been dealt with, one is faced with a (relatively) more mundane
problem: how does one actually compute the Patterson-Sullivan measure of a given set, or at least
get a good estimate of it? To this end, we have Sullivan's shadow lemma. Before we state the lemma, we
need to define what a shadow is.</p>
<p><strong>Definition</strong> (Shadow): For any points $x$ in $\overline{\mathbb{H}^2}$ and $y$ in $\mathbb{H}^2$, and any $r
\geq 0$, the shadow $\mathcal{O}_r(x, y)$ is the set of all points $\xi$ in $\partial \mathbb{H}^2$ such
that the geodesic from $x$ to $\xi$ intersects a ball of radius $r$ around $y$.</p>
<table class="image">
<caption align="bottom">Figure 1: The point ξ lies in the shadow cast by a ball around y and a light
source at x. Image taken from
<a href="https://www.math.u-bordeaux.fr/~jquint/publications/courszurich.pdf">An overview of Patterson-Sullivan theory</a> by J.F. Quint. </caption>
<tr><td><img src="../images/orbit-points-2/shadow1.png" width="100%" height="auto" class="center"/></td></tr>
</table>
<p><strong>Lemma</strong> (Sullivan's shadow lemma): Let $\Gamma$ be a non-elementary discrete group, and let $\{\mu_x\}_{x \in \mathbb{H}^2}$ be a $\Gamma$-invariant conformal density of
dimension $\delta$. Then for any $x$ there exists a large enough $r_0$, such that for all
$r \geq r_0$, there exists a $C > 0$, depending on $r$, for which the following inequality holds
for all $\gamma \in \Gamma$.
\begin{equation}
\frac{1}{C} e^{-\delta \cdot d(x, \gamma x)} \leq \mu_x\left( \mathcal{O}_r(x, \gamma x) \right) \leq C e^{-\delta \cdot d(x, \gamma x)}
\end{equation}</p>
<p><strong>Proof</strong>: The first thing we need to show is that there exists an $r$ large enough, and an $\varepsilon$ small
enough such that the following inequality holds for all $y \in \overline{\mathbb{H}^2}$.
\begin{equation}
\label{eq:5}
1 \geq \mu_x\left( \mathcal{O}_r(y, x) \right) \geq \varepsilon
\end{equation}
As $r$ gets larger, the set $ \mathcal{O}_r(y, x)$ gets larger as well, and contains everything
but the endpoint of the geodesic from $x$ to $y$. That means the only way this inequality can fail
to hold is if all the mass of $\mu_x$ was concentrated at that point. But since $\Gamma$ is
non-elementary, the measure $\mu_x$ cannot be supported on a single point.</p>
<table class="image">
<caption align="bottom">Figure 2: A large ball around x casts a very large shadow. Image taken from
<a href="https://www.math.u-bordeaux.fr/~jquint/publications/courszurich.pdf">An overview of Patterson-Sullivan theory</a> by J.F. Quint. </caption>
<tr><td><img src="../images/orbit-points-2/shadow2.png" width="100%" height="auto" class="center"/></td></tr>
</table>
<p>We now rewrite $\mu_x\left( \mathcal{O}_r(x, \gamma x) \right)$ using identities \eqref{eq:1} and \eqref{eq:2}.
\begin{equation}
\label{eq:6}
\mu_x\left( \mathcal{O}_r(x, \gamma x) \right) = \int_{\mathcal{O}_r(\gamma^{-1}x, x)} e^{-\delta \cdot \beta_{\xi}(\gamma^{-1}x, x)} d\mu_x(\xi)
\end{equation}
Since we already know $\mu_x\left( \mathcal{O}_r(\gamma^{-1}x, x) \right)$ is between $\varepsilon$ and $1$,
all we need to do is estimate the Busemann function on the shadow. We can do that using the triangle
inequality, which gives us the following inequality.
\begin{equation}
\label{eq:7}
d(\gamma^{-1}x, x) - 2r \leq \beta_{\xi}(\gamma^{-1}x, x) \leq d(\gamma^{-1}x, x)
\end{equation}
Combining \eqref{eq:5}, \eqref{eq:6}, and \eqref{eq:7}, we get the inequality we need. $\square$</p>
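<p>For completeness, here is where \eqref{eq:7} comes from. Writing the Busemann function as $\beta_{\xi}(y, x) = \lim_{t \to \infty} \left( d(y, \xi_t) - d(x, \xi_t) \right)$ for a geodesic ray $\xi_t$ converging to $\xi$, the upper bound is immediate from the triangle inequality. For the lower bound, if $\xi$ lies in the shadow $\mathcal{O}_r(y, x)$, the geodesic ray from $y$ to $\xi$ passes through a point $p$ with $d(p, x) \leq r$, so for $\xi_t$ on that ray beyond $p$,
\begin{equation*}
d(y, \xi_t) = d(y, p) + d(p, \xi_t) \geq \left( d(y, x) - r \right) + \left( d(x, \xi_t) - r \right),
\end{equation*}
which gives $\beta_{\xi}(y, x) \geq d(y, x) - 2r$ (here $y = \gamma^{-1}x$).</p>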
<p>All this may seem like it has nothing to do with counting, but we have in fact already developed enough tools
to get a rough asymptotic count of the orbit points for convex cocompact groups. That will have to wait for
the next post, though, because I think this post is long enough already.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:paper1">
<p><a href="https://projecteuclid.org/euclid.acta/1485889894">The limit set of a Fuchsian group</a>, S.J. Patterson <a class="footnote-backref" href="#fnref:paper1" title="Jump back to footnote 1 in the text">↩</a><a class="footnote-backref" href="#fnref2:paper1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:paper2">
<p><a href="https://doi.org/10.1007/BF02684773">The density at infinity of a discrete group of hyperbolic motions</a>, Dennis Sullivan <a class="footnote-backref" href="#fnref:paper2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:footnote3">
<p>The proof goes via the proof of the fact that the Bowen-Margulis measure is an ergodic
measure for the geodesic flow. At this point, it's natural to ask why the easiest proof of
ergodicity of the $\Gamma$-action goes via the ergodicity of the geodesic flow. A somewhat
unsatisfactory answer to that is that the geodesic flow and the Bowen-Margulis measure have
structures that are compatible with each other: the geodesic flow has stable and unstable
manifolds, and the Bowen-Margulis measure happens to decompose nicely along these foliations. <a class="footnote-backref" href="#fnref:footnote3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
</ol>
</div>Counting orbit points under group actions - Part 12019-10-19T00:00:00-04:002019-10-19T00:00:00-04:00Sayantan Khantag:None,2019-10-19:/orbit-points-1.0.html<p>After 10 months of being unable to come up with anything interesting to post on the blog, I realized
it might be a good idea to use this blog to keep track of the math I've been working on. That way my blog
can act as a public version of my …</p><p>After 10 months of being unable to come up with anything interesting to post on the blog, I realized
it might be a good idea to use this blog to keep track of the math I've been working on. That way my blog
can act as a public version of my research notebook, and be a good place to relearn this material if I
happen to forget the details in the future.</p>
<p>I have been learning some Patterson-Sullivan theory for the past few weeks with the goal of using the
theory to solve some counting problems. To be more precise, I'm working with the following setup.</p>
<ul>
<li>$X$ is an unbounded metric space (in practice, $X$ is usually $\mathbb{R}^n$, $\mathbb{H}^n$, or
$\mathcal{T}_{g}$ (the Teichmüller space of genus $g$ surfaces)).</li>
<li>$\Gamma$ is a discrete group acting on $X$ via isometries.</li>
</ul>
<p>Given a point $x \in X$, and a radius $r$, I'm interested in knowing how many points of the orbit of
$x$ under $\Gamma$, which we'll denote $\Gamma x$, lie in a ball of radius $r$ about $x$ (which
we'll denote by $B(x, r)$). Since we're dealing with a discrete group, the number of orbit points
clearly won't be a continuous function of $r$, which means we can't expect a nice exact answer, but
we can ask questions about the asymptotics of the orbit counting function.</p>
<p>It's instructive to start off with the simplest example: our metric space is $\mathbb{R}^2$, and our
discrete group $\Gamma$ is just the group generated by translations by $(0,1)$ and $(1,0)$. If we
pick $x$ to be the point $(0.5,0.5)$ (we're picking this point so our pictures look nicer), our
orbit counting question translates to the following question: How many half-integer points does a ball of
radius $r$ contain? A way to answer this question is to think of the fundamental domain of the group
$\Gamma$. If the ball contains $n$ orbit points, we might expect it to contain $n$ copies of the fundamental domain,
which means we'd expect it to have $n$ times the area of a fundamental domain. That's not quite correct, as the following
picture demonstrates.</p>
<table class="image">
<caption align="bottom">Figure 1: The black orbit points have their fundamental domains completely in the ball, whereas the red ones don't. The ball also
has parts of fundamental domains for which it doesn't contain the corresponding orbit point.</caption>
<tr><td><img src="../images/orbit-points-1/fig1.png" width="70%" height="auto" class="center"/></td></tr>
</table>
<p>However, since every point of a fundamental domain lies within distance $1$ of its orbit point, we can expand and contract our ball slightly to get
upper and lower bounds in terms of area. If we denote the number of orbit points in a ball of radius $r$ as $C(r)$,
we get the following inequality for $C(r)$.
$$
\frac{\text{Area}(B(x, r-1))}{\text{Area}(\text{Fundamental domain})} \leq C(r) \leq \frac{\text{Area}(B(x, r+1))}{\text{Area}(\text{Fundamental domain})}
$$</p>
<p>Note now that for large values of $r$, $\text{Area}(B(x, r-1))$ is approximately equal to $\text{Area}(B(x,
r+1))$, i.e. their ratio approaches $1$. That means for large values of $r$, $C(r)$ behaves like
$\frac{\text{Area}(B(x, r+1))}{\text{Area}(\text{Fundamental domain})}$. This only works out because
$\frac{(r+1)^2}{(r-1)^2}$ approaches $1$ as $r$ gets larger.</p>
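<p>For the half-integer lattice this sandwich is easy to check by brute force (a quick sketch; the fundamental domain is the unit square, so its area is $1$):</p>

```python
import math

def orbit_count(r):
    """Count points of the orbit of (0.5, 0.5) under integer translations
    lying in the open ball of radius r centred at (0.5, 0.5)."""
    R = int(math.ceil(r)) + 1
    return sum(1
               for m in range(-R, R + 1)
               for n in range(-R, R + 1)
               if m * m + n * n < r * r)

for r in [10, 50, 200]:
    count = orbit_count(r)
    lower = math.pi * (r - 1) ** 2   # Area(B(x, r-1)) / Area(fund. domain)
    upper = math.pi * (r + 1) ** 2   # Area(B(x, r+1)) / Area(fund. domain)
    assert lower <= count <= upper
    print(r, count / (math.pi * r ** 2))   # ratio approaches 1 as r grows
```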
<p>The result in the case of $\mathbb{R}^n$ suggests what the answer should be even in the more general
cases, at least when the discrete group $\Gamma$ has a fundamental domain with finite volume,
i.e. when $\Gamma$ is a <a href="https://en.wikipedia.org/wiki/Lattice_(discrete_subgroup)">lattice</a>. In the
case when one is working in $\mathbb{H}^n$, the asymptotics are the same. $$ C(r) \sim
\frac{\text{Area}(B(x, r))}{\text{Area}(\text{Fundamental domain})} $$ One might try the same proof
as before, and it almost works, except that it fails at a rather critical step. Recall that the
volume of a ball of radius $r$ in $\mathbb{H}^n$ is not polynomial in $r$, but exponential:
it grows like $e^{(n-1)r}$ for large $r$, which means the following ratio
does not converge to $1$.
$$
\frac{\text{Area}(B(x, r+1))}{\text{Area}(B(x, r-1))}
$$
To be more precise, the thing that can go wrong is the following: by estimating the number of lattice points
as $c \cdot \text{Area}(B(x,r))$ (where $c = \frac{1}{\text{Area}(\text{Fundamental domain})}$), we may
drastically overcount or undercount, as the following pictures illustrate.</p>
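<p>The failure is easy to see concretely: the area of a hyperbolic ball of radius $r$ in $\mathbb{H}^2$ is $2\pi(\cosh r - 1)$, so the ratio of the areas of balls of radius $r+1$ and $r-1$ tends to $e^2 \approx 7.39$ rather than $1$. A quick numerical check:</p>

```python
import math

def eucl_area(r):
    """Area of a Euclidean ball of radius r in R^2."""
    return math.pi * r * r

def hyp_area(r):
    """Area of a hyperbolic ball of radius r in H^2."""
    return 2 * math.pi * (math.cosh(r) - 1)

for r in [5, 10, 20]:
    print(r,
          eucl_area(r + 1) / eucl_area(r - 1),   # tends to 1
          hyp_area(r + 1) / hyp_area(r - 1))     # tends to e^2, not 1
```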
<table class="image">
<caption align="bottom">Figure 2: This is an underestimate because even though the blue orbit points are in the ball,
most of their fundamental domains aren't, which leads to an underestimate by the area calculation.</caption>
<tr><td><img src="../images/orbit-points-1/under_estimate.png" width="70%" height="auto" class="center"/></td></tr>
</table>
<table class="image">
<caption align="bottom">Figure 3: This is an overestimate because even though the blue orbit points are outside the ball,
most of their fundamental domains are in the ball, which leads to an overestimate by the area calculation.</caption>
<tr><td><img src="../images/orbit-points-1/over_estimate.png" width="70%" height="auto" class="center"/></td></tr>
</table>
<p>However, these are the worst case scenarios. In a more realistic scenario, one might expect the
overestimates and underestimates to cancel out as the ball gets larger and larger. One might expect
that to happen if the direction in which the boundary of the ball approaches a given translate of the
fundamental domain is approximately random, and independent of the directions in which it approaches other
translates. That turns out to be true, and is a consequence of the fact that the geodesic flow on
hyperbolic surfaces of finite volume is
<a href="https://en.wikipedia.org/wiki/Mixing_(mathematics)#Mixing_in_dynamical_systems">mixing</a>.
This intuition is formalized in the following lemma from a paper by Eskin and McMullen<sup id="fnref:paper1"><a class="footnote-ref" href="#fn:paper1">1</a></sup>.</p>
<p><strong>Lemma:</strong> Let $X = \mathbb{H}^2/\Gamma$ be a finite volume hyperbolic surface, and $S(p, r)$ be the image
of a sphere of radius $r$ in $\mathbb{H}^2$ centred at a lift of a point $p$ in $\mathbb{H}^2/\Gamma$.
Then for any compactly supported continuous function $\alpha$, the average of $\alpha$ over $S(p, r)$ (with
respect to the Lebesgue measure on $S^1$) approaches the average of $\alpha$ over $X$ as
$r$ approaches $\infty$.</p>
<p><strong>Proof:</strong> To prove this result, we need to lift the function $\alpha$ to the unit tangent bundle
$T^1X$, and we do that in the obvious way, i.e. ${\alpha}(x, v) = \alpha(x)$, for any unit tangent
vector $v$ over $x$. Also, let $K$ be the set of all vectors over $p$. Then the image of $K$ under
the geodesic flow $g_t$ is the collection of vectors over $S(p, t)$ pointing normally outwards. In
particular, the average of $\alpha$ over $S(p, t)$ is the same as the average of $\alpha$ over
$g_t(K)$<sup id="fnref:observation"><a class="footnote-ref" href="#fn:observation">2</a></sup>. Furthermore, $g_t(K)$ can be approximated arbitrarily well by $g_t(U)$ where
$U$ is the collection of vectors over a small neighbourhood of $p$. Because the geodesic flow is
mixing, the average of $\alpha$ over $g_t(U)$ approaches the average of $\alpha$ over $T^1X$ (and
equivalently, $X$), as $t$ approaches $\infty$. This proves the result. $\square$</p>
<p>Once we have the lemma, the rest is smooth sailing. We let $\alpha$ be a bump function supported in
a small neighbourhood of $p$, and lift $\alpha$ to a function $\widetilde{\alpha}$ on the universal
cover. To count the number of orbit points in a ball of radius $r$, we just integrate
$\widetilde{\alpha}$ on that ball. We can approximate this integral for large $r$ using the lemma we
just proved, and that gives us the estimate we want, namely that the number of orbit points in a
ball of radius $r$ is asymptotically $\frac{\text{Area}(B(p, r))}{\text{Area}(\text{Fundamental
Domain})}$.</p>
<p>One of the key elements of this proof was the fact that the geodesic flow is mixing with respect to
a suitable measure. That suggests how one might extend this idea to other contexts. One important
generalization of cocompact lattices is the class of convex cocompact groups<sup id="fnref:convex"><a class="footnote-ref" href="#fn:convex">3</a></sup>. And for these groups, the
estimates we have here don't work verbatim because the fundamental domain for a convex cocompact
group may not have finite area. Secondly, the lemma as stated won't work either because the
Liouville measure is not the "correct" measure on the unit tangent bundle. Not only is it not
mixing, it's not even preserved by geodesic flow. In fact, the geodesic flow makes all the
mass of Liouville measure escape the convex core, which means that one cannot do dynamics on a convex cocompact
group using the Liouville measure. The right measure for this context turns out to be the Bowen-Margulis measure,
which is constructed using the Patterson-Sullivan measure on the boundary, and that is what I will write about in
my next blog post.</p>
<p>UPDATE: <a href="/orbit-points-2.0.html">Part 2</a> of this series is up.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:paper1">
<p><a href="https://projecteuclid.org/euclid.dmj/1077289841">Mixing, counting, and equidistribution in Lie groups</a>, Alex Eskin and Curt McMullen. <a class="footnote-backref" href="#fnref:paper1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:observation">
<p>The fact is fairly obvious in this case because the measure on $T^1X$ is the
Liouville measure, which is locally a product of the Riemannian measure on the base, and
Lebesgue measure on the fibre. However, this fact is no longer obvious if one is dealing with
other measures like the Bowen-Margulis measure, which may not be uniform in every direction. <a class="footnote-backref" href="#fnref:observation" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:convex">
<p>Convex cocompact groups are discrete groups whose action on the convex hull of the limit set is
cocompact (but whose action on the universal cover may not be cocompact). <a class="footnote-backref" href="#fnref:convex" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
</ol>
</div>What is an "a priori estimate"?2018-12-18T00:00:00-05:002018-12-18T00:00:00-05:00Sayantan Khantag:None,2018-12-18:/a-priori-estimates.html<p>One of the things a math major learns in their first proof based course is that one must
prove existence of objects before going on to prove any properties about them. After a few years,
this becomes almost second nature, and most pure mathematicians are wary of making claims about …</p><p>One of the things a math major learns in their first proof based course is that one must
prove existence of objects before going on to prove any properties about them. After a few years,
this becomes almost second nature, and most pure mathematicians are wary of making claims about any
object without first proving its existence. In practice, physicists and applied mathematicians
aren't held back by such restrictions: they often assume the existence of, say, a solution to a PDE,
and prove results about the solutions, e.g. prove bounds on the size, smoothness, etc. and it often
turns out that the solutions do exist, and satisfy those properties. A priori estimates are the
bounds on the solution one gets before one actually knows the solution exists. Under certain conditions,
one can then use these bounds on the solution to show that a solution exists. Informally, the chain of
logic looks something like the following.
\begin{align*}
(\text{Solution exists} \implies \text{Solution bounded by } C ) \implies \text{Solution exists}
\end{align*}
The first implication is the a priori estimate, and the second implication follows from a general
fixed point theorem. We'll see how it works out in practice, but before we see a real example, we'll
see a pseudo-example from algebra which looks similar to an a priori estimate.</p>
<h3>Normality is local</h3>
<p>Let $R$ be a normal ring. This means that $R$ is an integral domain such that for any $\theta \in \text{Frac}(R)$,
if $\theta$ is a root of a monic polynomial in $R[x]$, then $\theta$ actually belongs to $R$. We want to prove
the following result.</p>
<p><strong>Theorem:</strong> If $R$ is a normal ring, and $f$ is a non-zero element of $R$, then the localization $f^{-1}R$
is also normal.</p>
<p>Suppose $\theta$ is some element in $\text{Frac}(f^{-1}R)$ which satisfies a monic polynomial
in $f^{-1}R[x]$. Notice that $\text{Frac}(f^{-1}R) = \text{Frac}(R)$, so $\theta$ can be thought
of as belonging to $\text{Frac}(R)$. Elements of $f^{-1}R$ are of the form $\frac{r}{f^k}$, where $r \in R$,
and we can also assume $k$ is the minimal exponent, i.e. $r$ is not a multiple of $f$ in $R$. One can think
of the exponent $k$ as a kind of norm on elements of $f^{-1}R$. Let $\theta$ be a root of the following
monic polynomial.
\begin{align*}
x^n + \frac{r_{n-1}}{f^{k_{n-1}}}x^{n-1} + \cdots + \frac{r_{1}}{f^{k_{1}}}x + \frac{r_{0}}{f^{k_{0}}}
\end{align*}
We need to show that any such root $\theta$ actually lies in $f^{-1}R$.
In the spirit of a priori estimates, let's assume that $\theta$ is a solution that actually lies
in $f^{-1}R$. We'll try to get a bound on the norm of $\theta$, i.e. an a priori estimate. Let $\theta$
be of the form $\frac{\alpha}{f^k}$. We want to plug $\frac{\alpha}{f^k}$ into the given polynomial and get
a monic polynomial in $R[\alpha]$, since $R$ is normal. If we plug in $\frac{\alpha}{f^k}$, we get the following.
\begin{align*}
\frac{1}{f^{nk}}\alpha^{n} + \frac{r_{n-1}}{f^{k_{n-1} +nk - k}} \alpha^{n-1} +
\frac{r_{n-2}}{f^{k_{n-2} +nk - 2k}} \alpha^{n-2} + \cdots + \frac{r_{1}}{f^{k_{1} +nk - (n-1)k}} \alpha^{1} +
\frac{r_0}{f^{k_0}} = 0
\end{align*}
If we want to clear denominators, and still retain a monic polynomial, our $k$ must satisfy the following inequalities
for all $1 \leq j \leq n$.
\begin{align*}
k \geq \frac{k_{n-j}}{j}
\end{align*}
These inequalities are our a priori estimates. We assumed that the solution existed, and we then bounded its norm below
by a fixed quantity. Now we need to show that this implies a solution exists. Notice that after clearing the denominators,
we get a monic polynomial in $R[x]$ which is satisfied by $\alpha = f^k \theta$, which belongs to $\text{Frac}(R)$. Since
$R$ is normal, $\alpha \in R$, and hence $\theta = \frac{\alpha}{f^k}$ is in $f^{-1}R$.</p>
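<p>The bookkeeping in this argument can be checked mechanically: the smallest admissible exponent is $k = \max_j \lceil k_{n-j} / j \rceil$, and with that choice, substituting $\theta = \alpha / f^k$ and multiplying through by $f^{nk}$ produces a monic polynomial with coefficients in $R$. A sketch for $R = \mathbb{Z}$ and $f = 2$ (the specific polynomial is a made-up example):</p>

```python
from fractions import Fraction
from math import ceil

f, n = 2, 3
# Coefficients r_j / f^{k_j} of x^3 + (3/2^2) x^2 + (5/2^3) x + 7/2^4,
# stored as degree -> (r_j, k_j).
coeffs = {2: (3, 2), 1: (5, 3), 0: (7, 4)}

# A priori estimate: k >= k_{n-j} / j for all 1 <= j <= n,
# i.e. k >= k_d / (n - d) for the coefficient of degree d.
k = max(ceil(Fraction(kd, n - d)) for d, (_, kd) in coeffs.items())

# Substitute theta = alpha / f^k and multiply by f^{nk}: the coefficient
# of alpha^d becomes r_d * f^{nk - k_d - d*k}, which must land in Z.
cleared = {d: Fraction(rd, f ** kd) * Fraction(f) ** (n * k - d * k)
           for d, (rd, kd) in coeffs.items()}
assert all(c.denominator == 1 for c in cleared.values())
print(k, {d: int(c) for d, c in cleared.items()})
```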
<p>This pseudo-example served two purposes: the first was to elucidate the key idea behind using a priori estimates without
getting bogged down in analytic technicalities. The second was to illustrate that the idea of boundedness implying
existence might be a very broad principle that applies in a lot of contexts. Now we'll see an actual example of
an a priori estimate in use. This is Example 2 from Chapter 9 of Evans' <em>Partial Differential Equations</em>.</p>
<h3>A quasilinear elliptic PDE</h3>
<p>Consider a region $U$ in $\mathbb{R}^n$ with a smooth boundary $\partial U$. We want to solve
the following PDE.
\begin{align*}
- \Delta u + \mu u = -b(Du)
\end{align*}
Additionally, we require that $u$ be $0$ on the
boundary. In the above equation, $\Delta$ is the Laplacian, $\mu$ is some number greater than $0$, $b$
is a Lipschitz function, and $Du$ is the gradient of $u$. We want to show
that for a large enough $\mu$ there exists a solution $u \in H^2(U) \cap H^1_0(U)$, where $H^2$ and
$H^1_0$ are the appropriate Sobolev spaces (if you don't know what Sobolev spaces are, think of
these spaces as functions whose derivatives are square-integrable).</p>
<p>For any $w \in H^1_0(U)$, define $f_w$ to be the function $-b(Dw)$. Since $b$ is Lipschitz, $f_w$ is in $L^2$.
If we plug in $f_w$ instead of $-b(Du)$ in the PDE, we have the following linear PDE (after fixing $w$).
\begin{align*}
- \Delta u + \mu u = f_w
\end{align*}
By general PDE theory, linear elliptic PDEs have unique
solutions. Let the solution we get be denoted by $u_w$. By more general nonsense (i.e. elliptic
regularity and similar statements), we have that $u_w$ is in $H^2(U)$. We thus have a map that
takes $w \in H^1_0(U)$ to $u_w \in H^2$. This can also be thought of as a non-linear continuous
map from $H^1_0(U)$ to $H^1_0(U)$. We call this map $A$, and we can bound the $H^2$ norm of $A(w)$. The
Rellich lemma tells us that $A$ takes bounded sets to pre-compact sets in $H^1_0$. After some more
analysis, which we'll skip, we can show that for a large enough $\mu$, the following set is bounded.
\begin{align*}
SFP = \{ w \in H^1_0(U)\ |\ w = \lambda A(w) \text{ for some } \lambda \in [0,1] \}
\end{align*}
Observe that any $w$ such that $w = A(w)$ will be a solution of our original PDE. That means
we would like to show that $A$ has fixed points. What we have from our analysis so far is
that the set of scaled fixed points $SFP$ is bounded. This is our a priori estimate. The second
step is applying Schaefer's fixed point theorem to the map $A$, which requires that the set $SFP$ be bounded,
and gives us that the map $A$ has a fixed point, and hence that our original PDE has a solution.</p>
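<p>A toy version of this scheme can be run in one dimension. Discretize $-u'' + \mu u = -b(u')$ on $[0,1]$ with $u(0) = u(1) = 0$, and iterate the map $A: w \mapsto u_w$, where $u_w$ solves the linear problem with right-hand side $-b(w')$. Everything concrete below (the choice $b(p) = \sin p + 1$, the grid size, the one-sided difference for $u'$) is an illustrative assumption; the linear solve uses the standard tridiagonal (Thomas) algorithm.</p>

```python
import math

def solve_linear(mu, rhs, h):
    """Solve -u'' + mu*u = rhs with u = 0 at both endpoints, using central
    finite differences and the Thomas algorithm for the tridiagonal system."""
    m = len(rhs)                        # number of interior grid points
    off = -1.0 / h ** 2                 # sub- and super-diagonal entry
    diag = [2.0 / h ** 2 + mu] * m
    d = list(rhs)
    for i in range(1, m):               # forward elimination
        w = off / diag[i - 1]
        diag[i] -= w * off
        d[i] -= w * d[i - 1]
    u = [0.0] * m
    u[-1] = d[-1] / diag[-1]
    for i in range(m - 2, -1, -1):      # back substitution
        u[i] = (d[i] - off * u[i + 1]) / diag[i]
    return u

def picard(mu=50.0, n=100, iters=60, b=lambda p: math.sin(p) + 1.0):
    """Iterate w -> u_w, a discrete stand-in for the fixed-point map A."""
    h = 1.0 / n
    u = [0.0] * (n - 1)                 # interior values; boundary is 0
    for _ in range(iters):
        grid = [0.0] + u + [0.0]
        # right-hand side -b(w') with a one-sided difference for w'
        rhs = [-b((grid[i + 1] - grid[i]) / h) for i in range(1, n)]
        u = solve_linear(mu, rhs, h)
    return u

u = picard()
print(max(abs(v) for v in u))   # small: the solution is bounded by sup|b| / mu
```

For large $\mu$ the iteration settles down quickly, consistent with the role $\mu$ plays in the boundedness estimate above.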
<p>In both the examples, the idea of the proof was guided by the following principle: pretend a solution exists to get
bounds, and after obtaining those bounds, show that a solution actually exists.</p>The most overloaded word in math2018-10-14T00:00:00-04:002018-10-14T00:00:00-04:00Sayantan Khantag:None,2018-10-14:/overloaded-word.html<p>Last Wednesday, the conversation in my office veered towards the words we hated the most in
math. Not surprisingly, the list included the usual suspects like <em>normal</em>, <em>simple</em>, and <em>regular</em>.
It's probably the same reason that these words also make it to the top five of <a href="https://mathoverflow.net/questions/7389/what-are-the-most-overloaded-words-in-mathematics">this MathOverflow
post</a>. These …</p><p>Last Wednesday, the conversation in my office veered towards the words we hated the most in
math. Not surprisingly, the list included the usual suspects like <em>normal</em>, <em>simple</em>, and <em>regular</em>.
It's probably the same reason that these words also make it to the top five of <a href="https://mathoverflow.net/questions/7389/what-are-the-most-overloaded-words-in-mathematics">this MathOverflow
post</a>. These
words are overloaded to the point of meaning something different to mathematicians working in
different areas of math. On the other hand, we all agreed that some overloading of words was
actually fairly useful: for instance, it makes sense to call a normal covering space normal since it
actually corresponds to a normal subgroup of the fundamental group. That means calling a cover
normal and calling a subgroup normal isn't really a bad thing, since it shows that those two notions
are related.</p>
<p>We thought we'd do something similar for all the possible meanings of the word normal: we'd define
an equivalence relation between two different meanings if there is some result, deep or otherwise,
that links the two notions. Then the number of equivalence classes we get would be a much better
metric of the overloaded-ness of the word <em>normal</em>.</p>
<p>Here's a list of the meanings of <em>normal</em> (taken from Wikipedia), along with some additions.</p>
<ul>
<li><strong>Normal subgroup</strong>: A subgroup which is invariant under conjugation action.</li>
<li><strong>Normal cover</strong>: A covering space whose deck transformation group acts transitively.</li>
<li><strong>Normal field extension</strong>: A field extension such that every irreducible polynomial in the base
field either splits into linear factors in the extension, or has no root in it.</li>
</ul>
<p>These three uses of normal are really the same, since they all talk about an associated subgroup
being a normal subgroup. In the case of the normal cover, the fundamental group of the cover is a
normal subgroup of the fundamental group of the base space. When it comes to field extensions,
consider the following field extension: $E \subset F \subset G$, where $G$ is Galois over $E$. In
this case, $F$ is a normal extension iff it corresponds to a normal subgroup of $\mathrm{Gal}(G/E)$.</p>
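<p>To make the field extension case concrete, here's the standard example I have in mind (my illustration, not part of the list above):</p>

```latex
% The classic non-normal extension, via the Galois correspondence:
\begin{align*}
  \mathbb{Q} \subset \mathbb{Q}(\sqrt[3]{2}) \subset \mathbb{Q}(\sqrt[3]{2}, \omega),
  \qquad
  \mathrm{Gal}\left(\mathbb{Q}(\sqrt[3]{2}, \omega)/\mathbb{Q}\right) \cong S_3
\end{align*}
% The middle field corresponds to a subgroup of order 2 in S_3, which is not
% normal; correspondingly, x^3 - 2 has a root in the middle field but does not
% split there, so the extension is not normal.
```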
<p>A few different meanings of the word normal(ize) show up often in algebraic geometry.</p>
<ul>
<li><strong>Normal domain</strong>: An integral domain which is integrally closed in its field of fractions.</li>
<li><strong>Normal varieties</strong>: A variety $X$ such that any finite birational map from any variety $Y$ to
$X$ is an isomorphism.</li>
<li><strong>Noether normalization</strong>: The Noether normalization lemma states that for any finitely generated
$k$-algebra $A$, there exist $y_1, \ldots, y_d \in A$ such that $A$ is a finitely generated
module over $k[y_1, \ldots, y_d]$.</li>
</ul>
<p>These seemingly different notions actually are somewhat equivalent. As it turns out, a variety is
normal if the local ring at every point is integrally closed. And while normal varieties are
varieties which have maps from "nice" varieties, a geometric interpretation of Noether normalization
is that every $d$-dimensional affine variety is a ramified cover of $\mathbb{A}^d$, which is a "nice"
variety.</p>
<p>Another meaning of the word normal comes from the geometric notion of being perpendicular. This
gives us a lot of different meanings of the word normal which we can collapse to one equivalence
class.</p>
<ul>
<li><strong>Normal bundle</strong>: The normal bundle of an embedded submanifold is the vector bundle such that the
fibre over each point consists of vectors perpendicular to the tangent space.</li>
<li><strong>Normal coordinates</strong>: Given a vector bundle with an affine connection, the normal coordinates
around a point are coordinates such that the Christoffel symbols of the connection vanish at the
point.</li>
<li><strong>(Ortho)normal basis</strong>: A basis of an inner product space such that each vector is of norm $1$ and
every pair is perpendicular.</li>
<li><strong>Normal operator</strong>: An operator which commutes with its Hermitian conjugate.</li>
<li><strong>Normal modes</strong>: (Taken from wikipedia) A normal mode of an oscillating system is a pattern of
motion in which all parts of the system move sinusoidally with the same frequency and with a fixed
phase relation.</li>
</ul>
<p>Normal bundle literally comes from the original meaning of the word normal in the sense of being
perpendicular. Normal coordinates also come from the same source: in the case of the Levi-Civita
connection, one gets a set of normal coordinates by applying the exponential map to an orthonormal
basis of the tangent space.</p>
<p>The reason why a normal operator is called a normal operator is that we know from the spectral
theorem that its eigenvectors form an orthonormal basis. That is also the source of normal modes. The
"normal" in normal modes comes from the fact that the vibrations are the eigenvectors of a certain
differential operator, which happens to be self-adjoint, hence normal. That means all the normal
modes literally form an orthonormal basis of solutions to the associated PDE.</p>
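<p>The spectral theorem statement above can be sanity-checked numerically. Here's a small sketch using numpy (the matrix is my own choice; note that numpy's <code>eig</code> returns orthonormal eigenvectors here only because the eigenvalues are distinct):</p>

```python
import numpy as np

# Rotation by 90 degrees: real and not symmetric, but normal (it is orthogonal).
A = np.array([[0.0, -1.0],
              [1.0, 0.0]])

# Normality: A commutes with its Hermitian conjugate.
assert np.allclose(A @ A.conj().T, A.conj().T @ A)

# The eigenvalues are +i and -i; since they are distinct and A is normal,
# the (unit-norm) eigenvectors returned by eig are automatically orthogonal.
eigvals, V = np.linalg.eig(A)
assert np.allclose(V.conj().T @ V, np.eye(2))
```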
<p>The above words were some of the different cases of the usage of the word <em>normal</em> that we were able
to collapse. And below are the ones we couldn't collapse to anything else, so they sit all by
themselves (for now) in their own equivalence class.</p>
<ul>
<li><strong>Normal family</strong>: A pre-compact family of holomorphic functions.</li>
<li><strong>Normal space</strong>: A topological space which satisfies the $T_4$ axiom. <a href="http://brownsharpie.courtneygibbons.org/comic/i-used-to-confuse-regular-and-normal/">Here's a bad pun</a> involving this.</li>
<li><strong>Normal forms</strong>: These are a whole class of ways to write matrices of linear operators in a nice form,
e.g. Jordan normal form, Smith normal form, etc.</li>
<li><strong>Normal distribution</strong>: The distribution that most sums of random variables converge to, thanks to the
Central limit theorem.</li>
<li><strong>Normal forms part deux</strong>: All the normal forms that crop up in formal language theory and
computability, e.g. conjunctive normal form, disjunctive normal form, Chomsky normal form, etc.</li>
</ul>
<p>We started off with 18 different meanings of the word <em>normal</em>, and now, after constructing the
equivalence relation, we are left with only 8 different equivalence classes (maybe fewer, if someone
discovers some deep result linking normal operators to normal subgroups). That makes one think: maybe
it's not so ab<em>normal</em> for mathematicians to overuse normal after all.</p>An algebraic definition of the cotangent space2018-06-30T00:00:00-04:002018-06-30T00:00:00-04:00Sayantan Khantag:None,2018-06-30:/algebraic-cotangent-space.html<p>I'm almost a week into the algebraic geometry workshop now, and I've learnt a
lot. I've learnt a few things about varieties, and also a bit of commutative
algebra, but the most important takeaway for me from the first week was the
sheaf theoretic way of looking at smooth manifolds …</p><p>I'm almost a week into the algebraic geometry workshop now, and I've learnt a
lot. I've learnt a few things about varieties, and also a bit of commutative
algebra, but the most important takeaway for me from the first week was the
sheaf theoretic way of looking at smooth manifolds (and also algebraic
varieties). I'll talk a bit about this approach of looking at things, and then
outline a definition of cotangent space using this formalism, and see how that
lets us define a cotangent space on algebraic varieties. Finally, we'll have some sanity
checks on the defined cotangent space to see if some properties carry over to
the category of algebraic varieties.</p>
<h2>The sheaf of regular functions on a topological space</h2>
<p>Let's consider a specific example: consider some smooth manifold $M$. For any
open subset $U$, consider the set $\mathcal{O}(U)$ of all smooth functions from
$U$ to $\mathbb{R}$. We have such a collection for each open subset of $M$. But
that's not all - we have relations between the various $\mathcal{O}(U)$. If $U
\subset V$, then there's a natural map from $\mathcal{O}(V)$ to
$\mathcal{O}(U)$, which is the restriction map. Now recall that we are working
with smooth functions: one property of smoothness is that it's a local
property. To put it more concretely, if we have <em>any</em> map from an open set $U$
to $\mathbb{R}$ such that its restriction to each open subset $U_i$ is smooth,
where $\{U_i\}$ is some collection of open subsets that covers $U$, then we know
that the map is actually smooth on all of $U$. Phrasing this in terms of the $\mathcal{O}(U)$
formalism, it just means that if the restriction of a function $f$ to each $U_i$
is contained in $\mathcal{O}(U_i)$, then $f$ is actually in $\mathcal{O}(U)$.</p>
<p>These two properties make the collection $\{\mathcal{O}(U)\}$ into
a sheaf. Furthermore, the set $\mathcal{O}(U)$ can be given the structure
of an algebra, because the maps are all into a ring, and hence the functions can
be added and multiplied pointwise, and multiplied by elements of the ring
$R$. And if we vary the space $X$, the ring $R$, and the sets $\mathcal{O}(U)$,
we end up getting various familiar spaces and maps. For instance, if we keep $X$
as a manifold, and $R$ as $\mathbb{R}$, but require $\mathcal{O}(U)$ to be the
collection of $C^1$ functions, we end up getting a $C^1$ manifold. If we set $X$
as some affine variety over a field $k$, set $R$ as $k$, and set
$\mathcal{O}(U)$ to be the ratios of polynomials whose denominators do not
vanish on the set $U$, we get the regular functions of affine varieties.</p>
<p>Sheaves are a nice way of capturing functions which satisfy some property
<em>locally</em>, whether it is being smooth, being holomorphic, or being ratios of
polynomials.</p>
<h2>Sheaves capture local properties</h2>
<p>Consider a topological space $X$ and the sheaf of regular functions
$\mathcal{O}$ to a field $k$, and some point $p \in X$. With the sheaves, we
have a way of looking at the <em>germ</em> of any function $f$ at the point $p$. It's
clear that the germ of $f$ will capture all the local information of $f$ at that
point. For instance, when the regular functions are smooth functions, the germ
of $f$ will contain the information about all the derivatives of $f$. If the
regular functions are holomorphic functions, the germ contains even <em>global</em>
information about the function because of analytic continuation. The collection
of the germs of all the functions defined around $p$, called the stalk at $p$, is
again a $k$-algebra. It is much more, in fact. It's a local ring (which we'll
call $\mathcal{O}_p$), whose only maximal ideal is the set of germs of all the functions
which vanish at $p$.</p>
<p>Another way of getting local information at the point $p$ is to look at tangent vectors
at the point $p$. What do tangent vectors mean in this context though? A tangent vector
$v$ at $p$ is a $k$-linear map from the set of all functions defined in some open set
around $p$ to $k$ such that for any two functions $f$ and $g$, the
product rule holds.
\begin{align*}
v(fg) = f(p) \cdot v(g) + v(f) \cdot g(p)
\end{align*}
It's not quite clear from the definition whether the action of the tangent vector
on a function $f$ depends only the germ of $f$. In fact, even the relation between
the tangent space, and the local ring at $p$ is not very clear from this definition.
We need some way to link the two notions. And that's where the cotangent space comes
in. We shall define the cotangent space in purely algebraic terms, i.e. in terms of the
local ring, and then show that the space of tangent vectors $T_p$ is actually the dual of
the cotangent space, thus exhibiting the link between the local ring at $p$, to the
tangent space at $p$.</p>
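<p>The product rule in the definition above can be checked mechanically. Here's a minimal sketch using dual numbers (my own illustration over $k = \mathbb{R}$ with polynomial functions; the class <code>Dual</code> and the helper names are made up for this example):</p>

```python
class Dual:
    """Numbers a + b*eps with eps**2 = 0; b tracks a directional derivative."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.a + o.a, self.b + o.b)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # The eps-coefficient of a product obeys exactly the Leibniz rule.
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)
    __rmul__ = __mul__

# Base point p = (1, 2) and a tangent vector v = (3, -1) at p.
p, v = (1.0, 2.0), (3.0, -1.0)

def tangent(f):
    # The action of v on f: the eps-coefficient of f(p + eps*v).
    return f(Dual(p[0], v[0]), Dual(p[1], v[1])).b

f = lambda x, y: x * x * y     # f(x, y) = x^2 y
g = lambda x, y: x + y * y     # g(x, y) = x + y^2
fg = lambda x, y: f(x, y) * g(x, y)

# Leibniz rule: v(fg) = f(p) * v(g) + v(f) * g(p)
assert tangent(fg) == f(*p) * tangent(g) + tangent(f) * g(*p)
```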
<h3>Construction of the algebraic cotangent space</h3>
<p>Let $\mathfrak{m}$ be the maximal ideal of $\mathcal{O}_p$. It consists of the
germs of all the functions which vanish at $p$. We define the cotangent space at
$p$ to be the set $\frac{\mathfrak{m}}{\mathfrak{m}^2}$ considered as a $k$
vector space. I won't outline the motivation behind picking this as the
cotangent space, because I myself am not completely sure why, so let's take for
granted that this is a reasonable candidate for the definition of cotangent
space. We need to show that $T_p$ is the dual of the space
$\frac{\mathfrak{m}}{\mathfrak{m}^2}$.</p>
<p>First we'll show every element of $T_p$ is indeed a linear functional acting on
$\frac{\mathfrak{m}}{\mathfrak{m}^2}$. Take any element $v \in T_p$. It acts
on an element $m + \mathfrak{m}^2$, and returns $v(m)$. This map is well defined,
because for any other representative $m'$, $m - m' \in \mathfrak{m}^2$, i.e. it is
of the form $m_1 m_2$ for $m_1$ and $m_2$ in $\mathfrak{m}$. Using the identity
for tangent vectors, $v(m_1m_2)$ is given by $m_1(p)v(m_2) + v(m_1)m_2(p)$, and
both the terms are $0$, since elements of $\mathfrak{m}$ evaluate to $0$ at $p$.</p>
<p>Now we'll show any linear functional $w$ acting on
$\frac{\mathfrak{m}}{\mathfrak{m}^2}$ gives a tangent vector $t_w$. We define
the action of $t_w$ on any function $f$ as the value of the functional $w$ on
the function $\overline{f}(x) = f(x) - f(p)$. The function $\overline{f}$ vanishes at $p$, and hence it
belongs to $\mathfrak{m}$. All we need to do is check whether it satisfies the
product rule.
\begin{align*}
t_w(f(x)g(x)) &= w(\overline{f(x)g(x)}) \\
&= w(f(x)g(x) - f(p)g(p)) \\
&= w(f(x)g(x) - f(x)g(p) + f(x)g(p) - f(p)g(p)) \\
&= w(f(x)(g(x) - g(p))) + g(p) w(f(x) - f(p)) \\
&= w((f(x) - f(p))(g(x) - g(p)) + f(p)(g(x) - g(p))) + g(p) w(f(x) - f(p)) \\
&= 0 + f(p)t_w(g) + t_w(f)g(p)
\end{align*}</p>
<p>This proves the duality, and gives us a link between the tangent space and the local ring.
In the case of smooth manifolds, this tells us that the cotangent space defined using the local
ring is really the same as the cotangent space defined in the usual differential geometric way.
What isn't immediately clear is how the cotangent bundle is defined, and this is something I'll
come back to later. The advantage of this construction is that the same construction goes through
for algebraic varieties. Whether this is a useful notion or not in the case of algebraic varieties
is a question that needs to be answered. But before that, we should do a sanity check. In the case
of smooth manifolds, the dimension of the cotangent space at every point equals the dimension
of the manifold. It's reasonable to expect the same in the case of algebraic varieties.
And that is indeed the case, and we'll see a simple proof of the fact.</p>
<h3>Dimension of cotangent space is the same as the dimension of algebraic variety</h3>
<p>Before we prove the result, let's qualify the statement a little more. First of
all, it suffices to prove the result for affine varieties, since both the
cotangent space and the dimension of the variety are essentially local
properties. Secondly, varieties can have bad points, i.e. singularities, where
they intersect themselves, or there's a sharp bend of some sort (e.g. the
variety in $\mathbb{A}^2_{\mathbb{C}}$ defined by $X^3 - Y^2$). We want to avoid
those points. Thankfully, there's a nice algebraic description of the
non-singular points. A point $p$ is non-singular if the local ring
$\mathcal{O}_p$ at $p$ is regular, i.e. the maximal ideal is generated by $d$
elements, where $d$ is the dimension of the variety. With this algebraic description in hand,
our task now reduces to proving the following proposition.</p>
<p><strong>Proposition.</strong> Suppose $A$ is a Noetherian $k$-algebra, which is also a local ring whose
maximal ideal $\mathfrak{m}$ is generated by $d$ elements $\{f_1, \ldots, f_d\}$, where $d$
is the Krull dimension of the $k$-algebra. Then the dimension of the $k$-vector space
$\frac{\mathfrak{m}}{\mathfrak{m}^2}$ is also $d$.</p>
<p><strong>Proof.</strong> Consider any element $g$ in the ideal $\mathfrak{m}$. Since it's Noetherian and local, we can
write $g$ in the following manner,
\begin{align*}
g = \sum_j c_j (f_1)^{p_{1j}} \cdots (f_d)^{p_{dj}}
\end{align*}
where the $c_j$ don't belong to the ideal $\mathfrak{m}$, and for each $j$, some $p_{ij}$ is greater
than $0$. Quotienting by $\mathfrak{m}^2$, all the terms with the higher powers of $f_i$ become $0$,
and the representative in the quotient looks like the following.
\begin{align*}
\overline{g} = \sum_{i=1}^{d} c_i f_i
\end{align*}
With this expression, it's easy to see what the map to the space $k^d$ is. Send $\overline{g}$ to the vector
$(c_1(p), c_2(p), \ldots, c_d(p))$. But is this map well defined? What if we also have another
representative $\overline{g}' = \sum c_i'f_i$. But in that case, each of $c_i - c_i'$ must belong
to $\mathfrak{m}$, hence $c_i(p) = c_i'(p)$.</p>
<p>Let's now construct a map from $k^d$ to $\frac{\mathfrak{m}}{\mathfrak{m}^2}$. Send the vector
$(c_1, \ldots, c_d)$ to the element $\sum c_i f_i$. It's easy to see this map is well defined,
and the inverse of the previous map. This proves the result.
$\blacksquare$</p>
<p>The cotangent space manages to pass at least this rudimentary sanity check, which makes it a little easier
to believe that this is the right notion of the cotangent space on a variety.</p>Summer 2018 update2018-06-02T00:00:00-04:002018-06-02T00:00:00-04:00Sayantan Khantag:None,2018-06-02:/update-summer-2018.html<p>I'm getting lazy. I thought I would be posting more often once the summer holidays started
but May came and went with nary a post. In my defence, I was fairly busy, dealing with
the usual bureaucratic nonsense that comes with leaving your institution for good and moving
to another …</p><p>I'm getting lazy. I thought I would be posting more often once the summer holidays started
but May came and went with nary a post. In my defence, I was fairly busy, dealing with
the usual bureaucratic nonsense that comes with leaving your institution for good and moving
to another country. I also started learning algebraic geometry seriously, and will continue to
focus on it for a major part of the summer. Doing only math gets boring really quick though,
so I'll interleave the math sessions with some coding. I started learning <a href="https://www.rust-lang.org">Rust</a>,
and got sidetracked for a while by Emacs. But before I talk about Rust and Emacs, I'll quickly
outline my math plans for the summer.</p>
<h3>Math in Summer 2018</h3>
<p>Given that I'll probably be doing more differential geometry at UMichigan, I figured I'd take
a break from that in the summer. In fact, I decided to stay as far as I could from analysis as well:
from anything that involves taking derivatives, or dealing with inequalities. I considered
studying combinatorics, number theory or algebraic geometry, none of which I know anything about. I was also initially
encouraged by the fact that they seem as far from analysis as possible. That, of course, was astonishing
naïveté and ignorance on my part, because <a href="https://en.wikipedia.org/wiki/Generating_function">generating functions</a>
crop up everywhere in combinatorics, and sooner or later, one would have to show one of these
things converges in some small ball in the complex plane, and then we would be back to doing analysis.
That left number theory and algebraic geometry. Analytic number theory was out of the question, so the choice
essentially boiled down to algebraic number theory or algebraic geometry. The <a href="https://en.wikipedia.org/wiki/Algebraic_number_theory">Wikipedia page</a>
for algebraic
number theory had one picture, and that was a picture of a textbook. The <a href="https://en.wikipedia.org/wiki/Algebraic_geometry">page</a> for
algebraic geometry had three, and they were all images of algebraic varieties. That essentially made my decision.</p>
<p>I started reading Fulton's <a href="www.math.lsa.umich.edu/~wfulton/CurveBook.pdf">book</a> on algebraic curves and assumed
that would be enough. However, I learnt of an algebraic geometry summer school happening in IISER Pune, and signed
up for it immediately. But that necessitated a change of pace as they require the attendees to have studied the first chapter of Hartshorne's book
in excruciating detail. Currently, I'm reading both Fulton and Hartshorne
simultaneously and I'll probably also resume doing the exercises in Atiyah-Macdonald's commutative algebra book.</p>
<p>My personal goal is to build up some sort of Rosetta stone like glossary for translating stuff from algebraic
to differential geometry and vice versa. I'll write more about it once I've made some progress. The long term
goal is to learn complex algebraic geometry, where I'll be able to use techniques from both differential and
algebraic geometry. But this will probably take some time, perhaps a year or longer.</p>
<h3>Rust and Emacs</h3>
<p>I've been hearing about Rust a lot these days, especially after the release of Firefox Quantum, which
apparently has large parts written in Rust. I thought it would be nice to pick a low level language
like Rust. It's certainly more suitable than Python for writing always-running daemons, given that
each individual Python program spins up a VM that takes up 20 to 30 megabytes for simple tasks like
scanning log files or fetching mail.</p>
<p>So far, I've learnt the basics of Rust's <a href="https://doc.rust-lang.org/book/second-edition/ch04-00-understanding-ownership.html">ownership model</a>,
and I seem to get the basic idea, although I'll only know that for sure if I use it to build something fairly
complex. I'm racking my brains thinking about what to write: I've considered writing a compression program,
which might be fun, but I'm afraid it might be a bit too ambitious. I guess I'll figure it out sooner or later.</p>
<p>As I mentioned earlier, I got sidetracked for a while by Emacs when I was setting it up as a Rust
environment. As it often happens, instead of just using Emacs as a tool to write code in, I started
playing around with it instead. Of course, the fact that Emacs essentially is a Lisp environment
only got me more distracted. I learnt some Emacs Lisp from this really well written <a href="https://www.gnu.org/software/emacs/manual/eintr.html">book</a>
and it was rather fun. In fact, just a few days ago, I sent out a <a href="https://github.com/rakanalh/emacs-dashboard/pull/70">pull request</a>
extending an Emacs package I use. I'll be thrilled if it gets accepted. Emacs is really nice this way because
there's practically no barrier between using the editor and examining or changing the source code of the various components
that make up the editor. I cannot imagine doing the same in Vim. I'm really glad I switched.</p>
<h3>Conclusion</h3>
<p>That's it for the summer update. I'll post more (and I hope with greater frequency), hopefully stuff related to algebraic
geometry and Rust (and maybe even Emacs).</p>Construction of Chern classes2018-04-20T00:00:00-04:002018-04-20T00:00:00-04:00Sayantan Khantag:None,2018-04-20:/chern-classes.html<h2>Characteristic classes</h2>
<p>Given a manifold $M$, one way to study vector bundles over $M$ is to use the theory of
characteristic classes. A characteristic class is a way of assigning to each vector bundle over $M$
an element of the cohomology ring $H^{\ast}(M, G)$. This assignment is not …</p><h2>Characteristic classes</h2>
<p>Given a manifold $M$, one way to study vector bundles over $M$ is to use the theory of
characteristic classes. A characteristic class is a way of assigning to each vector bundle over $M$
an element of the cohomology ring $H^{\ast}(M, G)$. This assignment is not just any arbitrary
assignment; it has to satisfy a <em>naturality</em> condition. If $(F, N, \pi_2)$ is a vector
bundle, $f: M \to N$ is a smooth map, $(E, M, \pi_1)$ is the <a href="https://en.wikipedia.org/wiki/Pullback_bundle">pullback bundle</a>,
and $c(E)$ and $c(F)$ are the cohomology classes assigned to $E$ and $F$ by the characteristic class
$c$, then the pullback of $c(F)$ along the map $f$ is $c(E)$.
\begin{align*}
c(E) = f^{\ast}(c(F))
\end{align*}
The naturality condition is what makes characteristic classes so useful: it tells us that
characteristic classes are an invariant of vector bundles over a manifold. If two vector
bundles are isomorphic, they'll get assigned the same characteristic class.</p>
<p>The first step in studying characteristic classes is to construct interesting examples of them.
Its definition is certainly not of much help in actually constructing any examples. In fact,
constructing characteristic classes is not entirely trivial, and requires some work. We'll look
at one way of constructing characteristic classes, called Chern classes. This construction
is quite non-trivial, and requires the use of differential geometric machinery. The advantage of this
construction however, is that it gives an explicit way of constructing characteristic classes over
a lot of familiar vector bundles.</p>
<h2>Connections on vector bundles</h2>
<p>Given a rank $k$ vector bundle $(E, M)$, a connection on the vector bundle is a bundle map $\nabla$ from
$E$ to $E \otimes T^{\ast}M$ which satisfies the following condition for all smooth real-valued functions $f$.
\begin{align*}
\nabla(fv) = v \otimes df + f (\nabla v)
\end{align*}
If we pick local coordinates and a trivialization around some point in $M$, then the connection
$\nabla$ can be described by a $k \times k$ matrix of $1$-forms, which we'll denote
$A$, where the $i$<sup>th</sup> column
denotes what $\nabla e_i$ goes to, where $\{e_i\}$ is the local frame for the vector bundle.
The matrix $A$ is often called a <a href="https://en.wikipedia.org/wiki/Connection_form">connection form</a>.
Using the matrix $A$, we can construct another matrix, this one consisting of $2$-forms, which we'll call
the curvature form<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>.
\begin{align*}
\Omega = dA + A \wedge A
\end{align*}
The curvature form will be the key tool we'll use to construct the Chern classes.</p>
<h2>Constructing globally defined forms using curvature</h2>
<p>The nice thing about the curvature form is that it transforms in a particularly
nice manner when one changes the trivialization for the vector bundle. If $\Omega$
is the curvature matrix in the old trivialization, and the new trivialization is given
by multiplying by an invertible matrix $g$, then in the new trivialization, the curvature
matrix is given by $g\Omega g^{-1}$.</p>
<p>Consider the trace of the curvature matrix defined on some open set. The trace will be
a locally defined $2$-form. Now recall that for any matrix $M$, $\mathrm{tr}(M) = \mathrm{tr}(gMg^{-1})$.
This means that we can define a "trace" of the curvature form globally (using a partition of unity argument),
and this gives us a globally defined $2$-form. In fact, this can be done for any homogeneous
polynomial in the entries of a matrix
which is conjugation invariant, e.g. the determinant. Doing this for the determinant will give
us a globally defined $2k$-form. We now have a way of constructing globally defined forms
using the curvature form.</p>
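<p>The conjugation invariance that makes this patching work is easy to check numerically. A quick sketch with numpy (the random matrices are just for illustration):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
# Shift by a multiple of the identity so g is comfortably invertible.
g = rng.standard_normal((3, 3)) + 3 * np.eye(3)

# Changing trivialization conjugates the curvature matrix: M -> g M g^{-1}.
conj = g @ M @ np.linalg.inv(g)

# Trace and determinant are conjugation invariant, so the locally defined
# forms built from them agree on overlaps and patch into global forms.
assert np.allclose(np.trace(conj), np.trace(M))
assert np.allclose(np.linalg.det(conj), np.linalg.det(M))
```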
<p>Let's take a pause here to recall our goal. We want to associate to each vector bundle over $M$
an element of $H^{\ast}(M)$. What we have done so far is to associate to each vector bundle $E$
and a choice of connection $\nabla$ on that bundle a collection of differential forms. If we manage to show
that the differential form is actually closed, we'll have an assignment of $(E, \nabla)$ to an element
of $H^{\ast}_{DR}(M)$. Furthermore, if we show that the assignment to the cohomology class
is independent of which connection we pick, we'll have constructed a characteristic class. In the following
sections, we'll do both of the mentioned things, i.e. show that the globally defined forms are closed,
and their cohomology class is independent of the connection chosen.</p>
<h3>The globally defined forms are closed</h3>
<p>To show that the globally defined forms are closed, we'll need a systematic way of taking
their exterior derivative. The first step in that direction would be to simplify the expression
for the connection form $A$ by picking a nice trivialization. In fact, we can pick a trivialization
in which the connection matrix $A$ is $0$ at a point. The proof of this fact is outlined in these
<a href="http://www.math.iisc.ac.in/~vamsipingali/6Feb2018.pdf">notes</a> (and some formulae used are derived in these
<a href="http://www.math.iisc.ac.in/~vamsipingali/1Feb2018.pdf">notes</a>). We'll call this trivialization a <em>normal trivialization</em>.</p>
<p>This result already tells us that the $2$-form defined by taking the trace of the curvature is
closed. Pick a normal trivialization around $p$; this makes $A$ vanish at $p$. Then at that point, the curvature
matrix is just $dA$. The exterior derivative of the trace of this is the same as the trace of $d^2A$
(because the trace is a homogeneous degree $1$ polynomial in the entries of the matrix),
which is $0$.</p>
<p>To extend this idea to homogeneous polynomials of higher degrees, we need some way "linearizing" them,
so that we can take their exterior derivative easily. We do this by <em>polarizing</em> them. The polarization
of a degree $j$ homogeneous polynomial $f$ in the entries of the matrix is a function $\phi$ that takes $j$
matrix arguments, and outputs an element of the algebra over which the matrices are defined. The function $\phi$
is linear in each of its arguments, is symmetric in the order of its arguments, and is conjugation invariant, i.e.
it satisfies the following identity for all invertible matrices $g$.
\begin{align*}
\phi(A_1, \ldots , A_j) = \phi(gA_1g^{-1}, \ldots , gA_jg^{-1})
\end{align*}
Furthermore, $\phi$ must also satisfy the following polarization identity for all matrices $A$.
\begin{align*}
\phi(A, \ldots , A) = f(A)
\end{align*}
This last equality is why $\phi$ is called the polarization of $f$.
Constructing the polarization of a homogeneous degree $j$ polynomial isn't too hard.
The first step is to just construct a multilinear function that satisfies the polarization
identity. One does that by taking each monomial in the expression for $f$, and replacing
the $i$<sup>th</sup> factor by the corresponding factor in the $i$<sup>th</sup> argument.</p>
<p>This is best illustrated by a simple example. Suppose we are working with $2 \times 2$ matrices,
and the polynomial we want to polarize is the determinant polynomial. Its expression is given
in the following manner.
\begin{align*}
f(\{a_{ij}\}) = a_{11}a_{22} - a_{12}a_{21}
\end{align*}
Its polarization $\phi$ must have two arguments. We'll denote the entries of the first
argument by superscript $1$ and the second by superscript $2$. Then the first step would
be to write the following expression.
\begin{align*}
a^1_{11}a^2_{22} - a^1_{12}a^2_{21}
\end{align*}</p>
<p>Whatever we get in the previous step certainly is multilinear in the arguments and satisfies the
polarization identity. The next step is to symmetrize it. That can be easily done by taking
all permutations of the arguments and taking an average over them. This continues to satisfy
the polarization identity and is multilinear and symmetric. The only thing to do now is to make
$\phi$ conjugation invariant. But as it turns out, $\phi$ is already conjugation invariant,
because $f$ is conjugation invariant. The proof of this fact is a little tricky,
and is outlined in the next few paragraphs.</p>
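<p>The two-by-two determinant example can be worked out explicitly and tested. A sketch in numpy (the function name <code>phi</code> mirrors the notation above, but the code itself is my own illustration):</p>

```python
import numpy as np

def phi(A, B):
    # Polarization of the 2x2 determinant: put the i-th factor of each
    # monomial of det in the i-th argument, then symmetrize by averaging
    # over the two orderings of the arguments.
    naive = lambda X, Y: X[0, 0] * Y[1, 1] - X[0, 1] * Y[1, 0]
    return 0.5 * (naive(A, B) + naive(B, A))

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))
g = rng.standard_normal((2, 2)) + 3 * np.eye(2)  # comfortably invertible
gi = np.linalg.inv(g)

# Polarization identity: phi(A, A) = det(A).
assert np.isclose(phi(A, A), np.linalg.det(A))
# Conjugation invariance, inherited from det as the lemma below asserts.
assert np.isclose(phi(g @ A @ gi, g @ B @ gi), phi(A, B))
```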
<p><strong>Lemma.</strong> If $f$ is a conjugation invariant degree $j$ polynomial in the entries
of a matrix, and $\phi$ is its symmetric and multilinear polarization, then $\phi$
is also conjugation invariant (which we'll also call basis invariant).</p>
<p><strong>Proof:</strong> We'll prove this lemma by inducting on the number of distinct matrices in the arguments of
$\phi$. Observe that $\phi$ can have at most $j$ distinct matrices as arguments. What we'll show
is that the following equality holds whenever $\left\{ A_1, A_2, \ldots, A_j\right\}$ contains at
most $m$ distinct matrices, for each $1 \leq m \leq j$.
\begin{align*}
\phi\left(gA_1g^{-1}, gA_2g^{-1}, \ldots, gA_jg^{-1}\right) = \phi\left(A_1, A_2, \ldots, A_j\right)
\end{align*}
Let's start with the base case of $m=1$. This follows from the hypothesis that $f$ is basis
invariant, since $\phi$ with all identical arguments is just the function $f$. Now suppose we
have shown the result for some $m < j$. We now need to show $\phi$ is basis invariant when
supplied with at most $m+1$ different arguments. Pick any set of $m+1$ matrices
$\left\{A_1, A_2, \ldots, A_{m+1}\right\}$, and any invertible matrix $g$. We want to show the
following equality (with some of the arguments possibly repeated, if $m+1 < j$).
\begin{align*}
\phi\left(gA_1g^{-1}, gA_2g^{-1}, \ldots, gA_{m+1}g^{-1}\right) = \phi\left(A_1, A_2, \ldots, A_{m+1}\right)
\end{align*}
Consider the following expression, for $t \in \mathbb{C}$.
\begin{align}
\phi\left( A_1 + tA_{m+1}, A_1 + tA_{m+1}, \ldots, A_1 + tA_{m+1}, A_2, \ldots, A_m \right) \label{eq:1}
\end{align}
Here, $A_1 + tA_{m+1}$ is repeated $j - m + 1$ times, so that $\phi$ is supplied with $j$
arguments in total. Since expression $\ref{eq:1}$ has at most $m$
distinct arguments, we can use the induction hypothesis to conclude that expression \ref{eq:1}
would be the same if we conjugated all arguments with $g$. In fact, expression \ref{eq:1} can be
expanded out as a univariate polynomial $P$ in $t$ of degree $j - m + 1$. For each
$0 \leq k \leq j - m + 1$, the coefficient $c_k(P)$ of $t^k$ in the polynomial is the following.
\begin{align*}
c_k(P) = \binom{j-m+1}{k} \phi\left( A_1, \ldots, A_1, A_{m+1}, \ldots, A_{m+1}, A_2, \ldots, A_m\right)
\end{align*}
Here $A_1$ is repeated $j-m+1-k$ times, and $A_{m+1}$ repeated $k$ times. Similarly, consider
the polynomial $P'$ one gets by expanding out the conjugated version of expression
\ref{eq:1}. The coefficient $c_k(P')$ of $t^k$ in $P'$ is given by a similar expression.
\begin{align*}
c_k(P') = \binom{j-m+1}{k}
\phi\left( gA_1g^{-1}, \ldots, gA_1g^{-1},
gA_{m+1}g^{-1}, \ldots, gA_{m+1}g^{-1}, gA_2g^{-1}, \ldots, gA_mg^{-1}\right)
\end{align*}
Recall again that, by the induction hypothesis, the polynomials $P$ and $P'$ are the same, which
means their coefficients must also be the same. But the coefficients being equal means that even
when $\phi$ is supplied with $m+1$ distinct arguments, it is still conjugation invariant, proving the
induction step, and the lemma. $\blacksquare$</p>
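For the $2 \times 2$ determinant ($j = 2$, $m = 1$) the coefficient-matching step of this proof can be checked mechanically. The sketch below (illustrative only; assumes <code>sympy</code>) expands $\det(A + tB)$ and its conjugated version as polynomials in $t$, and confirms that they agree coefficient by coefficient.

```python
# The induction step in miniature: det(A + tB) and det(g(A + tB)g^-1),
# expanded as polynomials in t, have identical coefficients.
import sympy as sp

t = sp.symbols("t")
A = sp.Matrix(2, 2, lambda i, j: sp.Symbol(f"a{i}{j}"))
B = sp.Matrix(2, 2, lambda i, j: sp.Symbol(f"b{i}{j}"))
g = sp.Matrix([[2, 1], [1, 1]])          # some invertible matrix
conj = lambda X: g * X * g.inv()

P = sp.Poly(sp.expand((A + t * B).det()), t)
Pc = sp.Poly(sp.expand((conj(A) + t * conj(B)).det()), t)
for c, cc in zip(P.all_coeffs(), Pc.all_coeffs()):
    assert sp.simplify(c - cc) == 0      # coefficients agree one by one
```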
<p>The upshot of proving that every homogeneous polynomial can be polarized is that we now
have an easy way of taking exterior derivatives. If $f$ is a degree $j$ homogeneous polynomial, and $\phi$ its
polarization, then the exterior derivative of $f(\Omega)$ is given by the following expression.
\begin{align*}
df(\Omega) = j \cdot \phi(d\Omega, \Omega, \Omega, \ldots , \Omega)
\end{align*}
Using normal coordinates, and exploiting the multilinearity of $\phi$, we see that every global
$2j$-form obtained from the curvature form by applying a degree $j$ polynomial $f$ is actually closed,
and hence defines an element of the cohomology ring. We thus have a way of associating an element
of the cohomology ring to a vector bundle with a connection on it. The next step is to show that
this assignment is independent of the connection chosen.</p>
<h3>Independence from choice of connection</h3>
<p>Given a homogeneous polynomial $f$, and two connections $\nabla_0$ and $\nabla_1$ on the vector
bundle, we need to show the associated forms $f(\Omega_0)$ and $f(\Omega_1)$ differ by an exact
form. Let $\nabla_t = (1-t)\nabla_0 + t\nabla_1$, let $\Omega_t$ be its curvature form, and
consider the form $\eta_t = f(\Omega_t)$. This defines a path in the
space of forms between $f(\Omega_0)$ and $f(\Omega_1)$. If we show $\frac{d}{dt} \eta_t$ is exact,
that will show what we wanted to prove. In fact, the exterior derivative of the following
form is precisely $\frac{d}{dt}\eta_t$.
\begin{align*}
j \phi \left( A_1 - A_0, \Omega_t, \Omega_t, \ldots, \Omega_t \right)
\end{align*}
Here $A_i$ is the connection form of the connection $\nabla_i$, and $\Omega_t$ the curvature
form of the connection $\nabla_t$. If one takes the exterior derivative of this expression, using
the normal coordinates, one can see that it's the same as $\frac{d}{dt} \eta_t$. This proves that
the choice of connection doesn't matter<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup>.</p>
<h2>The Chern class</h2>
<p>Now that we have seen how to construct characteristic classes using the curvature form, we'll
construct a specific family of examples, the ones called the Chern classes. Let
$E$ be a rank $k$ vector bundle, and let $\Omega$ be the curvature of a chosen connection on $E$.
Consider the following polynomial in $t$, where $t \in \mathbb{R}$.
\begin{align*}
\det(t\Omega + I) = \sum_{i=0}^{k} f_i(\Omega) t^i
\end{align*}
Each of the $f_i(\Omega)$ is a homogeneous degree $i$ polynomial in $\Omega$ (with $f_0 = 1$). Because
the left hand side is conjugation invariant, so is each of the $f_i(\Omega)$. That means
each $f_i(\Omega)$ defines an element (of degree $2i$) in the cohomology ring of the base space $M$.
The $i$<sup>th</sup> Chern class of the vector bundle $E$ is defined to be $f_i(\Omega)$. The previous part
shows that this is well defined independent of the connection chosen.</p>
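As a sanity check of this definition in the lowest interesting rank, one can expand $\det(t\Omega + I)$ for a generic $2 \times 2$ matrix standing in for $\Omega$. This is an illustrative sketch (mine, assuming <code>sympy</code>); the entries of a genuine curvature form are $2$-forms, which commute with each other, so scalar entries suffice for reading off the $f_i$.

```python
# Expand det(t*Omega + I) and read off the invariant polynomials f_i:
# f_0 = 1, f_1 = trace, f_2 = det, for a generic 2x2 "curvature".
import sympy as sp

t = sp.symbols("t")
Omega = sp.Matrix(2, 2, lambda i, j: sp.Symbol(f"w{i}{j}"))
p = sp.expand((t * Omega + sp.eye(2)).det())
f = sp.Poly(p, t).all_coeffs()[::-1]  # f[i] is the coefficient of t**i

assert f[0] == 1
assert sp.simplify(f[1] - Omega.trace()) == 0
assert sp.simplify(f[2] - Omega.det()) == 0
```

Both coefficients are manifestly conjugation invariant, matching the claim that each $f_i(\Omega)$ defines a cohomology class.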
<p>The Chern classes satisfy several nice properties, including a product formula
for the Whitney sum of two vector bundles, naturality, etc. The Wikipedia <a href="https://en.wikipedia.org/wiki/Chern_class#Construction_of_Chern_classes">page</a>
provides a good description, as well as references for further reading.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>The reason why this is called the curvature form is that in the case of the Levi-Civita
connection on a Riemannian manifold, this definition reduces to the standard Riemann curvature
endomorphism. <a href="https://en.wikipedia.org/wiki/Curvature_form#Curvature_form_in_a_vector_bundle">See this</a>. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>This proof of independence from the connection was taken from the problem set <a href="http://math.iisc.ac.in/~vamsipingali/HW43392018.pdf">here</a>. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>A week at Berlin Mathematical School2018-02-24T00:00:00-05:002018-02-24T00:00:00-05:00Sayantan Khantag:None,2018-02-24:/bms-week.html<p>I spent the last week (18th to 24th February) at Berlin, courtesy Berlin Mathematical School,
who invited me over for the BMS Days (where I had an interview for a PhD position), as well
as the BMS Student Conference which immediately followed the BMS Days. I heard a lot of talks
on very interesting stuff, some of which I want to outline here, just to have an account of it,
if nothing else; this will also serve as a reminder of the areas and results I might want to follow
up on some time in the future.</p>
<h2>The math talks (in no particular order)<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup></h2>
<p>I'll only write about some of the talks; although all of them were fairly interesting,
some were more interesting than others.</p>
<h3>The computational complexity of query answering under updates (Nicole Schweikardt, HU Berlin)</h3>
<p>This talk outlined the idea of analyzing database systems from the point of view
of computational complexity theory, that is to say, proving effective lower bounds on
the time complexity of querying a database (more specifically, the time complexity
in terms of only the database size for a fixed query, as well as in terms of the
complexity of the query together with the database size). The query language itself was modelled as
a first order language with a fixed number of atomic predicates depending on the type
of the database (first order language here means that the atomic predicates may be negated,
conjuncted, and disjuncted, i.e. NOTed, ANDed, and ORed, and one is also allowed to use
existential and universal quantifiers over elements of the database). This was the general
framework: the results in the presentation, however, looked at the subclass of queries
(the <em>Conjunctive Queries</em>) which
only use ANDs of atomic predicates, and only the existential quantifier<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup>.</p>
<p>For a fixed query, Nicole described a good algorithm and a data structure for the
database such that testing for the existence of, and enumerating, the entries satisfying
the query could be done reasonably fast, as could updating the database.</p>
<h3>Orbit Closures of Homogeneous Forms (Jesko Hüttenhain, TU Berlin)</h3>
<p>This talk was motivated by the difference between the complexity class #<strong>P</strong> and the
class <strong>NP</strong>, which the speaker roughly described as the difference in difficulty between
computing the permanent and the determinant of a given matrix<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">3</a></sup>. It's however only a belief
and not really known whether the former problem is strictly harder than the latter. The speaker
proposed a possible line of attack via algebraic geometry. Consider the determinant function on
$M_n(\mathbb{C})$. It is a homogeneous polynomial of degree $n$ in $n^2$ complex variables,
and it's therefore a point in the space $\mathbb{C}[x_1, \ldots, x_{n^2}]$. We can now look
at the action of $GL(n^2, \mathbb{C})$ on this space by linear substitution of the variables.
$$\sigma_A: f(x) \mapsto f(Ax)$$
The orbit of $\det$ defines a subset of $\mathbb{C}[x_1, \ldots, x_{n^2}]$, and this subset essentially
corresponds to all the polynomials which are as easy to compute as the determinant. Looking
at the orbit of the permanent polynomial under the same action, we would get the set of polynomials
which are as hard to compute as the permanent is. A way to show the permanent is harder to compute than
the determinant would be to show that the two orbits are different. If one looks at the closure
of the orbits, and shows that they are different, then one would have shown that even approximating
the permanent is hard<sup id="fnref:4"><a class="footnote-ref" href="#fn:4">4</a></sup>.</p>
<p>Jesko then stated one of his results, which characterized the boundary of one such orbit
closure. He also mentioned that this approach to complexity theory might take a while to
bear significant results, as the foundations of this area are being built up.</p>
<h3>(Some aspects of) Convexity and curvature (Stephen Lynch, FU Berlin)</h3>
<p>This talk was a presentation of Stephen's recent work (which is also on the
<a href="https://arxiv.org/abs/1709.09697">arXiv</a>). This work generalized convex embeddings
of $S^n$ into $\mathbb{R}^{n+1}$ to higher co-dimension embeddings, where the notion
of convexity does not make sense. This was done by replacing the condition of convexity
by an inequality on the second fundamental form, which is exactly equivalent to convexity
in the case of co-dimension $1$. This sort of inequality, called the pinching condition,
ensures that the solution of the mean curvature flow exists for all time; the solution
exhibits further rigidity, in the sense that the evolution of the sphere over time is just
a homothety.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>The complete list of BMS Student Conference talks is
<a href="https://bmsstudconf.github.io/2018/talks.html">here</a> and the ones given
at BMS Days are listed <a href="https://www.math-berlin.de/academics/bms-days">here</a>. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>This kind of restriction is quite reminiscent of monotone circuits, and I wondered
whether there was any link between the presented results and lower bounds on monotone
circuits: the speaker however had only chosen this subclass as a simpler problem to tackle
before tackling the problem over full first order logic. But maybe the results obtained
for full first order logic might not be as good as the ones for conjunctive queries alone.
Perhaps I should have a look at this problem again in the future: some sort of such obstruction
might turn up, and if I were to be even more optimistic, perhaps a direct correspondence between
monotone circuits and conjunctive queries, and general circuits and general queries. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:3">
<p>The former problem is quite hard, as even computing the permanent of a 0-1 matrix is #<strong>P</strong>-complete,
and the latter problem is fairly easy, being in the class <strong>P</strong>. <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:4">
<p>This sort of approach is a part of the nascent area of <a href="https://en.wikipedia.org/wiki/Geometric_complexity_theory">geometric complexity theory</a>. <a class="footnote-backref" href="#fnref:4" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
</ol>
</div>Setting up GitLab to automatically generate PDFs from committed LaTeX files2018-01-17T00:00:00-05:002018-01-17T00:00:00-05:00Sayantan Khantag:None,2018-01-17:/latex-gitlab-ci.html<p>I had been meaning to get started with GitLab's continuous integration
to generate PDFs of my assignments and notes, rather than generating the
PDFs offline and committing them to the repository as well, but I always
kept delaying the migration because of the lack of sufficient documentation
on the matter. This morning I finally got around to doing it, and I thought
I'd document it for anyone who wishes to do the same in the future.</p>
<h2>Outline of GitLab's continuous integration service</h2>
<p>On receiving a commit to a repository it hosts, GitLab
checks whether the repository has a file named <code>.gitlab-ci.yml</code>
in the root directory. This file contains the commands to be executed
by whatever computer is running the continuous integration service.
In GitLab's parlance, these are called <a href="https://docs.gitlab.com/runner/">Runners</a>.
These runners can be any computer, from a server running in your room to a short-lived
VM in the cloud. For the free tier, GitLab provides access to runners on
<a href="https://aws.amazon.com/">AWS</a>, but with the restriction of having only 2000
minutes of compute time per month.</p>
<p>For these free runners, there's no configuration to be done from our side; all we need
to do is push a <code>.gitlab-ci.yml</code> to our repository, and GitLab takes care of
running it on a runner. There is one thing to watch out for, though. The free runners
are usually short-lived, and one can't install software on them, which means
we can't run <code>sudo apt install texlive-full</code> as a command on the runner.
Luckily, the runners do have
<a href="https://www.docker.com/">docker</a>[<a href="http://www.zdnet.com/article/what-is-docker-and-why-is-it-so-darn-popular/">2</a>]
installed on them, which means we can use an image which already has all the
necessary software (i.e. <code>texlive-full</code>) installed on it.</p>
<h2>Configuring the runner to compile LaTeX</h2>
<p>A cursory google search for a suitable configuration turned up the following
<a href="https://github.com/aufenthaltsraum/stuff/wiki/Using-GitLab-CI-for-Building-LaTeX">configuration</a>,
which is rather rudimentary, but is a good guide for creating
our own configuration.</p>
<div class="highlight"><pre><code>compile_pdf:
  image: aergus/latex
  script:
    - latexmk -pdf my_file.tex
  artifacts:
    paths:
      - my_file.pdf
</code></pre></div>
<p>Let's go over this line by line. The first line describes the name of the job that
will be run. There can be several jobs described in a configuration file, and they
will usually be run asynchronously unless some job is listed as a dependency of another.
The next line describes what docker image to fetch: <code>aergus/latex</code> is Debian Testing
with <code>texlive-full</code> already installed. The next two lines describe the script that
will be run: these scripts are run from the root directory of the repository. In
this case, that means <code>latexmk -pdf</code> is being run on <code>my_file.tex</code> which is
at the root directory of the repository. It's possible to upload a shell script
or a Makefile to the repository and run that instead (I ended up doing the latter).
However, the files generated during the build process are discarded, which is not
quite what I wanted. I would like to keep the generated PDFs; the artifacts line
does exactly that. The artifacts can later be browsed or downloaded via the GitLab
web interface.</p>
<p>In my case however, the setup is a bit more complex. I do not keep all my TeX files
in the root directory, but rather organize them by course and assignment number.
So the TeX file for the fourth assignment for a topology course will have the following
location: </p>
<div class="highlight"><pre><code>Math/MA232\ Topology/assignments/04/assignment_04.tex
</code></pre></div>
<p>What I would like is to make sure the generated PDF for this TeX file is
placed in the following location.</p>
<div class="highlight"><pre><code>Math/MA232\ Topology/assignments/assignment_04.pdf
</code></pre></div>
<p>I'd also like my thesis to be compiled on each commit; the location
of my thesis in the repository is the following.</p>
<div class="highlight"><pre><span></span><code>Math/UM400\ Undergraduate\ Project/thesis/thesis.tex
</code></pre></div>
<p>I wrote up a <code>Makefile</code> that does all the compilation work, and places the PDFs
in appropriate locations.</p>
<div class="highlight"><pre><code>assignments:
	cd Math/MA339\ Geometric\ Analysis/assignments; \
	latexmk -pdf */assignment_*.tex
thesis:
	cd Math/UM400\ Undergraduate\ Project/thesis; \
	latexmk -pdf thesis.tex
</code></pre></div>
<p>And the <code>.gitlab-ci.yml</code> file I finally ended up using was this.</p>
<div class="highlight"><pre><code>stages:
  - build

compile_pdf:
  image: aergus/latex
  script:
    - make assignments
    - make thesis
  stage: build
  artifacts:
    paths:
      - "Math/MA339 Geometric Analysis/assignments/assignment_*.pdf"
      - "Math/UM400 Undergraduate Project/thesis/thesis.pdf"
</code></pre></div>
<p>Adding these two files to the root directory of the repository does the trick.
One issue I ran into was that spaces in filenames shouldn't be escaped
with a backslash here; rather, the whole file name should be enclosed in quotes.</p>
<p>The generated artifacts can be browsed by visiting the following link.</p>
<div class="highlight"><pre><span></span><code>https://gitlab.com/<username>/<repo-name>/-/jobs/artifacts/master/browse?job=compile_pdf
</code></pre></div>
<p>It seems that compiling all the files after a commit takes four to five minutes on
the runner, the majority of the time being spent fetching the docker container.
That translates to roughly 400 compiles a month, which is a reasonable enough
limit if one or two people are committing files to the repository, but might
be a problem if a large group of people is committing a large number of files
to the repository.</p>
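The budget estimate above can be made explicit (a back-of-the-envelope check using the numbers from this post):

```python
# Free tier: 2000 CI minutes per month; each pipeline run takes ~5 minutes.
free_minutes_per_month = 2000
minutes_per_run = 5
runs_per_month = free_minutes_per_month // minutes_per_run
assert runs_per_month == 400  # roughly 400 compiles a month
```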
<p>The point of this whole exercise was to let me get rid of a LaTeX installation
on the device I carry to class to take notes: an extremely
space-constrained Nexus 7 tablet. All I now have installed on the tablet is git and
Emacs, after uninstalling texlive (besides the space savings, compiling PDFs locally on the tablet
would take upwards of a minute on its underpowered CPU).</p>Cohomology as a measure of local to global failure2017-12-25T00:00:00-05:002017-12-25T00:00:00-05:00Sayantan Khantag:None,2017-12-25:/cohomology-local-global.html<h2>Motivation for cohomology</h2>
<p>In most introductory algebraic topology courses, cohomology is rather poorly motivated. Its most
commonly seen form in an algebraic topology course is singular cohomology, which arises as the
homology of the dual of the singular chain complex, but that doesn't really tell you why it's of any
more interest than singular homology, aside from the fact that you get an additional cup product
which you did not have before. However, this approach obscures the geometric meaning of cohomology.</p>
<p>A much more intuitive introduction to cohomology turns out to be De Rham cohomology, often
encountered in introductory differential geometry courses. Loosely speaking, De Rham cohomology
measures in how many different ways a closed form can fail to be exact.</p>
<p>Another cohomology theory we'll look at is sheaf cohomology. Loosely, the cohomology of a sheaf
measures how a certain functor $\Gamma$, which we'll define later, fails to be an exact functor.</p>
<p>In the case of De Rham cohomology, we'll interpret the form being closed as a local property, and it
being exact as a global property; in the case of sheaf cohomology, the exactness of the following
sequence (the exact details of which we'll see in a later section) is a local property. $$0
\rightarrow \mathcal{S}_1 \rightarrow \mathcal{S}_2\rightarrow \mathcal{S}_3 \rightarrow 0$$ The
exactness of the sequence we get by applying $\Gamma$ to the above sequence turns out to be a global
property.</p>
<p>In both the cases, the cohomology measures how the local property fails to translate to the global
one.</p>
<h2>De Rham Cohomology</h2>
<p>The De Rham cohomology of a manifold $M$ (of dimension $m$) is the homology of the
following cochain complex. $$0 \xrightarrow{d} \Lambda^0(M) \xrightarrow{d} \Lambda^1(M)
\xrightarrow{d} \cdots \xrightarrow{d} \Lambda^m(M) \xrightarrow{d} 0$$ Here, $\Lambda^k(M)$ is the
space of $k$-forms on $M$, and $d$ is the exterior derivative operator. For a $k$-form $\omega$ to
be closed, $d\omega$ must be $0$. This is a local property, in the sense that $d\omega$ evaluated at
any point $p \in M$ depends only on the value of $\omega$ on any small neighbourhood of $p$. In
fact, one can say a little more, and claim that $d\omega(p)$ depends only on the <em>germ</em> of $\omega$
at $p$. If we pick a Euclidean neighbourhood $U$ of $p$ which is homeomorphic to the open unit
ball, the Poincaré lemma tells us that there is some $(k-1)$-form $\eta$ defined on $U$ such that
$\omega = d\eta$. In other words, the closed form $\omega$ is locally exact.</p>
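A standard example of this local-but-not-global exactness (the classic illustration, not taken from this post) is the angular form on the punctured plane:

```latex
% The angular 1-form on R^2 minus the origin: closed, hence locally exact
% on any ball avoiding the origin (Poincaré lemma), but not globally exact,
% since its integral over the unit circle is nonzero.
\begin{align*}
  \omega &= \frac{-y\,dx + x\,dy}{x^2 + y^2}, \qquad d\omega = 0, \\
  \oint_{S^1} \omega &= 2\pi \neq 0,
  \quad \text{so } \omega \neq df \text{ for any global } f.
\end{align*}
```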
<p>The De Rham cohomology class of $\omega$ measures how badly the property of local exactness
fails to translate to global exactness. We can write $\omega$ as $\gamma + d\zeta$, where $\gamma$ is
the canonical representative of the cohomology class of $\omega$ (more on this in later posts), and
$\zeta$ is a $(k-1)$-form. If we stretch the analogy a bit, we can say $\omega$ misses being
globally exact by $\gamma$ amount. This is the first example of how cohomology measures how badly a
local property fails to be global.</p>
<h2>Sheaf Cohomology</h2>
<p>Before we see what sort of local to global failure sheaf cohomology measures, we'll quickly define
sheaves and sheaf cohomology, and look at one example.</p>
<h3>Quick introduction to sheaves</h3>
<p>Given a manifold $M$ (whatever we discuss will hold for Hausdorff spaces, and with a little more
work, can be made to work even for a larger class of spaces like spectra of rings), a sheaf
$\mathcal{S}$ of $K$-modules ($K$ is always assumed to be a commutative ring with identity) over $M$
is a topological space $\mathcal{S}$ with a surjective map $\pi: \mathcal{S} \to M$, such that the
following properties are satisfied.</p>
<ol>
<li>$\pi$ is a local homeomorphism, i.e. for any point $s \in \mathcal{S}$, there's a neighbourhood
of $s$ such that $\pi$ restricted to that neighbourhood is a homeomorphism.</li>
<li>$\pi^{-1}(x)$, which we'll denote by $\mathcal{O}_x$, is a $K$-module, for all $x \in
M$. $\mathcal{O}_x$ is called the stalk of $\mathcal{S}$ at $x$.</li>
<li>The module operations on the stalk are continuous, i.e. if we look at the stalk with the subspace
topology, the module operations of addition and scalar multiplication are continuous.</li>
</ol>
<p>Sheaves are in some sense modules parametrized by the space $M$, like vector bundles, but vector
bundles do not satisfy the first condition, unlike sheaves. The simplest example of a sheaf is the
<em>constant sheaf</em>, which is just $M \times V$, where $V$ is a $K$-module with the discrete topology.</p>
<p>Another important example is the sheaf of germs of $C^{\infty}$ functions on a manifold $M$. For
each $x \in M$, a point in $\mathcal{O}_x$ is an equivalence class of functions, the equivalence
relation being that $f \sim g$ if $f$ and $g$ agree on some neighbourhood of $x$. This sheaf
deserves a post of its own, and I shall write about it in the future.</p>
<p>The last example, which will be key to our goal, is the <em>skyscraper sheaf</em>. We'll describe it by
first describing the stalk at each point, and then putting an appropriate topology on it. Fix a
point $x_0 \in M$. The stalk $\mathcal{O}_{x_0}$ at $x_0$ will be $K$ as a module over itself. The
stalk at every other point is the zero module. As a set, our sheaf is the following. $$\mathcal{S}
= K \sqcup \bigsqcup_{x \neq x_0} \{0\}$$ The question is what topology do we put on this space. The
<a href="https://en.wikipedia.org/wiki/Non-Hausdorff_manifold#Line_with_two_origins">line with two origins</a>
provides a hint. What we do is take $|K|$ copies of the space $M$, and if $x \neq x_0$, we identify
all of those $x$'s, otherwise we do nothing. It's not too hard to check that this defines a sheaf
over $M$ (the local homeomorphism property is the hardest to check, and relies on the fact that
points are closed in Hausdorff spaces). In fact, if $M = \mathbb{R}$ and $K = \mathbb{Z}/2$, then
the skyscraper sheaf at $0$ <em>is</em> the line with two origins. The reason this is called the
skyscraper sheaf is that only the stalk at $x_0$ is tall; the stalks everywhere else are flat, which
makes it look like a tall structure in an otherwise flat featureless landscape.</p>
<p>What we're really interested in is a variant of the skyscraper sheaf with two skyscrapers, i.e. the stalks
at the points $x_0$ and $x_1$ are $K$, and the stalks elsewhere are $0$. The topology on this sheaf can be defined
analogously. We'll come back to this example once we've defined sheaf cohomology.</p>
<h3>The category of sheaves of $K$-modules over a space</h3>
<p>Just like in the case of a vector bundles over a manifold $M$, where the <em>right</em> kind of map between
vector bundles is a smooth map that is a linear map on each fibre, the <em>right</em> kind of map between
two sheaves $\mathcal{S}_1$ and $\mathcal{S}_2$ on a space $M$ is a continuous map $f$ such that it
satisfies the following properties.</p>
<ol>
<li>$\pi_2 \circ f = \pi_1$, where $\pi_i$ denotes the projection map of $\mathcal{S}_i$</li>
<li>$f$ restricted to any stalk $\mathcal{O}_x$ is a $K$-module homomorphism.</li>
</ol>
<p>Fixing a space $M$, we get the category of sheaves of $K$-modules over $M$, whose objects are
sheaves, and the morphisms are what we just defined, called sheaf homomorphisms. It follows from the
fact that $\pi$ is a local homeomorphism that even sheaf homomorphisms are local homeomorphisms.
This category turns out to be especially nice, sharing many characteristics with the category of
abelian groups and more generally, the category of $K$-modules, such as maps possessing kernels and
cokernels, and possessing a version of the <a href="https://en.wikipedia.org/wiki/Isomorphism_theorems#First_isomorphism_theorem">First Isomorphism
Theorem</a>. This sort
of category is called an abelian category, and this category is the appropriate category to do
homological algebra in. Coming back to sheaves, the kernel of a sheaf homomorphism $f: \mathcal{S}_1
\to \mathcal{S}_2$ is the set of all points which map to the zero element in the stalk. With a
little bit of work, we can show that the image of $f$ is a sheaf in its own right, and a subsheaf of
$\mathcal{S}_2$, just like the kernel of $f$ is a subsheaf of $\mathcal{S}_1$ (the definition of a
subsheaf is the most obvious one).</p>
<p>With all these definitions in hand, we can talk about exact sequences of sheaves. Consider a
sequence of sheaves and sheaf homomorphisms of the following kind. $$\cdots \xrightarrow{d_{i-2}}
\mathcal{S}_{i-1} \xrightarrow{d_{i-1}} \mathcal{S}_{i} \xrightarrow{d_{i}} \mathcal{S}_{i+1}
\xrightarrow{d_{i+1}} \cdots$$ This sequence is exact if $\mathrm{ker}(d_{i}) =
\mathrm{im}(d_{i-1})$ for every $i$.</p>
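<p>A concrete example: on any space $M$, the stalk-wise short exact sequence of groups $0 \to \mathbb{Z}
\xrightarrow{\times 2} \mathbb{Z} \to \mathbb{Z}/2 \to 0$ gives rise to a short exact sequence of constant
sheaves $$0 \rightarrow M \times \mathbb{Z} \rightarrow M \times \mathbb{Z} \rightarrow M \times
\mathbb{Z}/2 \rightarrow 0$$ where each map is the identity on the $M$ factor; exactness can be checked
one stalk at a time.</p>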
<p>The next thing we look at is the functor $\Gamma$ from the category of sheaves of $K$-modules over
$M$ to the category of $K$-modules. For each sheaf $\mathcal{S}$, the object $\Gamma(\mathcal{S})$
is the module of sections of $\mathcal{S}$. A section of a sheaf $\mathcal{S}$ is a map $s: M \to
\mathcal{S}$ such that $\pi \circ s = \mathrm{id}$. Clearly, we can add two sections, and we can
also multiply them by a scalar; we therefore have a $K$-module. The functor $\Gamma$ acts on
morphisms by post-composition, i.e. $\Gamma(f)(s) = f \circ s$. The important
question to ask here is whether the functor $\Gamma$ is exact, i.e. does it take short exact sequences to
short exact sequences. The answer is no. Consider the following short exact sequence. $$0
\rightarrow \mathcal{S}_1 \xrightarrow{\alpha} \mathcal{S}_2\xrightarrow{\beta} \mathcal{S}_3
\rightarrow 0 \tag{1}$$ If we apply the functor $\Gamma$ to the sequence, we get something that is
not completely exact. $$0 \rightarrow \Gamma(\mathcal{S}_1) \xrightarrow{\Gamma(\alpha)}
\Gamma(\mathcal{S}_2) \xrightarrow{\Gamma(\beta)} \Gamma(\mathcal{S}_3) \rightarrow 0 \tag{2}$$ This
sequence is only exact at $\Gamma(\mathcal{S}_1)$ and $\Gamma(\mathcal{S}_2)$.</p>
<p>Suppose some $s \in \Gamma(\mathcal{S}_1)$ maps to $0$ in $\Gamma(\mathcal{S}_2)$. That tells us
that $\Gamma(\alpha)(s) = 0$. But that by definition means that $\alpha \circ s = 0$. But $\alpha$
is injective, which means $s = 0$. This shows exactness at $\Gamma(\mathcal{S}_1)$.</p>
<p>Showing exactness at $\Gamma(\mathcal{S}_2)$ is a little more involved. Consider an element $s \in
\Gamma(\mathcal{S}_2)$ which gets mapped to the zero section in $\Gamma(\mathcal{S}_3)$. That means
for all $m \in M$, $\beta(s(m)) = 0$. By exactness at $\mathcal{S}_2$, we can find for each $m$, an
element $s'(m)$ of $\mathcal{S}_1$ such that $\alpha(s'(m)) = s(m)$. Furthermore, because the
original short exact sequence is exact at $\mathcal{S}_1$, the element $s'(m)$ is uniquely defined
(this is where the argument fails to work for $\Gamma(\mathcal{S}_3)$). All we need to show now is
that the map $m \mapsto s'(m)$ is a continuous map. This is where we use the fact that sheaf
homomorphisms are local homeomorphisms. For any $m$, pick a small enough neighbourhood $U$ around
$s'(m)$ such that $\alpha$ restricted to $U$ is a homeomorphism onto its image. Then $s'^{-1}(U)$ is
given by $s^{-1}(\alpha(U))$, which is open since $s$ is a continuous section.</p>
<p>Notice that the exactness of sequence $(1)$ is a purely local property; it suffices to check whether
the sequence on each stalk is exact. On the other hand, showing exactness at $\Gamma(\mathcal{S}_3)$
would be a global property. This is because given any section $s \in \Gamma(\mathcal{S}_3)$, the
best we can do is construct sections $s_U$ on open subsets $U$ of $M$. It might so happen that these
sections defined on different subsets of $M$ cannot be patched together consistently to get a
continuous section. The cohomology of the sheaf will measure how badly the functor $\Gamma$ fails to
be exact; to be more precise, the cohomology will tell us how to extend sequence $(2)$ to get an exact
sequence. We'll leave the precise details of this for a later post, and satisfy ourselves with an
example of when exactness fails to happen at $\Gamma(\mathcal{S}_3)$.</p>
<p>To show this, we will exhibit a surjective sheaf homomorphism $f$ such that $\Gamma(f)$ is not a
surjective module map. Consider a connected space $M$, and let $\mathcal{S}_1$ be the constant sheaf
on $M$. Recall that this means $\mathcal{S}_1$ is $M \times K$, with the discrete topology on
$K$. Let $\mathcal{S}_2$ be the skyscraper sheaf on $M$ with two skyscrapers, which means the stalk
is $K$ at points $x_0$ and $x_1$, and zero otherwise. On the stalk at a point which is not $x_0$ or
$x_1$, the homomorphism is obviously the zero homomorphism. On the stalks at $x_0$ and $x_1$, we let
the homomorphism be the identity homomorphism. It's clear that this sheaf homomorphism, call it $f$,
is surjective. But observe that $\Gamma(\mathcal{S}_1) = K$. That's because we picked $M$ to be a
connected space, which means any section must be a constant section. On the other hand,
$\Gamma(\mathcal{S}_2) = K \oplus K$, since the section can take any value independently at $x_0$
and $x_1$. This means $\Gamma(f)$ is a map from $K$ to $K \oplus K$, which cannot be surjective in
general.</p>
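<p>Concretely, a section of $\mathcal{S}_1$ is a constant map $m \mapsto (m, k)$ for some fixed $k \in K$,
and applying $f$ to it yields the section of $\mathcal{S}_2$ taking the value $k$ at both $x_0$ and $x_1$.
So under the identifications above, $\Gamma(f): K \to K \oplus K$ is the diagonal map $k \mapsto (k, k)$,
which misses every element $(k, k')$ with $k \neq k'$.</p>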
<p>This tells us that exactness at $\Gamma(\mathcal{S}_3)$ is a global property, and the cohomology
measures (in a loose sense) how the local property of exactness of $(1)$ fails to translate to
exactness of $(2)$.</p>
<p>ADDENDUM: I will add links to similar expositions whenever I find them.</p>
<ol>
<li>Čech cohomology and the Mittag-Leffler problem: The Čech cohomology determines
whether meromorphic functions defined on small open sets can be patched together to
get a globally defined meromorphic function satisfying certain properties.
(<a href="https://toperkin.mysite.syr.edu/talks/sheaves_and_more_cohomology.pdf">Link</a> to article)</li>
</ol>