journal.pbio.0020424(2).pdf

(816 KB) Pobierz
Open access, freely available online
PLoS
BIOLOGY
Algorithmic Self-Assembly
of DNA Sierpinski Triangles
Paul W. K. Rothemund
1,2
, Nick Papadakis
2
, Erik Winfree
1,2
*
1
Computation and Neural Systems, California Institute of Technology, Pasadena, California, United States of America,
2
Computer Science, California Institute of
Technology, Pasadena, California, United States of America
Algorithms and information, fundamental to technological and biological organization, are also an essential aspect of
many elementary physical phenomena, such as molecular self-assembly. Here we report the molecular realization,
using two-dimensional self-assembly of DNA tiles, of a cellular automaton whose update rule computes the binary
function XOR and thus fabricates a fractal pattern—a Sierpinski triangle—as it grows. To achieve this, abstract tiles
were translated into DNA tiles based on double-crossover motifs. Serving as input for the computation, long single-
stranded DNA molecules were used to nucleate growth of tiles into algorithmic crystals. For both of two independent
molecular realizations, atomic force microscopy revealed recognizable Sierpinski triangles containing 100–200 correct
tiles. Error rates during assembly appear to range from 1% to 10%. Although imperfect, the growth of Sierpinski
triangles demonstrates all the necessary mechanisms for the molecular implementation of arbitrary cellular automata.
This shows that engineered DNA self-assembly can be treated as a Turing-universal biomolecular system, capable of
implementing any desired algorithm for computation or construction tasks.
Citation: Rothemund PWK, Papadakis N, Winfree E (2004) Algorithmic self-assembly of DNA Sierpinski triangles. PLoS Biol 2(12): e424.
Introduction
How is complex organization produced and maintained by
physical processes? One may look to biology, where we nd
the most sophisticated organization of matter, often spanning
more than 24 orders of magnitude from component
molecules (0.1 attograms) to entire organism (100 kilograms).
This organization is information-based: DNA sequences
rened by evolution encode both the components and the
processes that guide their development into an organism—
the developmental program. For a language to describe this
carefully orchestrated organization, it is tempting to turn to
computer science, where the concepts of programming
languages, data structures, and algorithms are used to specify
complex organization of information and behavior. Indeed,
the importance of universal computation for autonomous
fabrication tasks was recognized in von Neumann’s seminal
work on self-reproducing automata, where he postulated a
universal constructor
that, by reading an input tape specifying
an algorithm for what to build, could carry out the commands
necessary to construct an arbitrary object (von Neumann
1966). If algorithmic concepts can be successfully adapted to
the molecular context, the algorithm would join energy and
entropy as essential concepts for understanding how physical
processes create order. Unfortunately, the study of molecular
algorithms has been hampered by the lack of suitable physical
systems on which to hone such a theory: nature provides us
with elementary chemical reactions too simple to program,
full-blown life too complex to use as a model system, and few
systems in between. This gap may be explored by synthesizing
programmable biochemical systems in vitro, where we can
implement and study a variety of molecular algorithms
ranging gradually from simple to complex.
Biomolecular self-assembly is particularly attractive for the
exploration of molecular algorithms that control nanofabri-
cation tasks. Attesting to its power, self-assembly is used
pervasively in biology to create such structures as virus capsids,
PLoS Biology | www.plosbiology.org
2041
microtubules, and agella. In each case, the binding inter-
actions between a small number of protein species is sufcient
to dictate the form of the nal structure, often via a complex
sequence of cooperative assembly steps. This can be viewed
loosely as a form of programmable nanofabrication, where the
program is the set of molecular species involved. For synthetic
approaches, Seeman (1982, 2003) has demonstrated that DNA
provides an alternative to protein that can be readily
programmed by Watson–Crick complementarity. A seminal
paper by Adleman (1994) used one-dimensional (1D) DNA self-
assembly to operate as a nite-state machine, establishing the
rst experimental connection between DNA self-assembly and
computation. This work inspired a theoretical proposal
(Winfree 1996) that builds on Wang’s (1961, 1962) embedding
of computation in geometrical tilings to show that two-
dimensional (2D) self-assembly of DNA can perform Turing-
universal computation—which implies that
any
algorithm can
in principle be embedded in, and guide, a potentially
aperiodic crystallization process. In this
‘‘algorithmic
self-
assembly’’ paradigm, a set of molecular Wang tiles is viewed as
the program for a particular computation or molecular
fabrication task (Reif 1999; Rothemund and Winfree 2000;
Received September 14, 2004; Accepted October 5, 2004; Published December
7, 2004
DOI: 10.1371/journal.pbio.0020424
Copyright:
Ó
2004 Rothemund et al.This is an open-access article distributed
under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the
original work is properly cited.
Abbreviations: AFM, atomic force microscopy; aTAM, abstract Tile Assembly Model;
1D, one-dimensional; 2D, two-dimensional; DNA, deoxyribonucleic acid; DAE-E,
double-crossover, antiparallel, even intramolecular spacing, even intermolecular
spacing; DAO-E, double-crossover, antiparallel, odd intramolecular spacing, even
intermolecular spacing; kTAM, kinetic Tile Assembly Model; PCR, polymerase chain
reaction; RAM, random-access memory; XOR, exclusive or
Academic Editor: Anne Condon, University of British Columbia
*To whom correspondence should be addressed. E-mail:winfree@caltech.edu
December 2004 | Volume 2 | Issue 12 | e424
Algorithmic Self-Assembly of DNA
Adleman et al. 2001). (This framework differs from previous
approaches relating tiling theory to crystalline ground-states
[Radin 1985] in that kinetic phenomena are essential here.)
Whereas 1D algorithmic self-assembly offers limited computa-
tional power (Winfree et al. 1998b) and has been experimen-
tally demonstrated (Adleman 1994; Mao et al. 2000), 2D
algorithmic self-assembly offers not only new capabilities for
computation and construction, but also presents a new range
of physical phenomena and experimental challenges as well.
A natural Turing-universal model of computation that can
be implemented by 2D algorithmic self-assembly is the class
of 1D cellular automata. A simple but interesting choice for
the local cellular automaton rule is the exclusive–or (XOR)
function: at each time step, each cell is computed as the XOR
of its two neighbors. Beginning with a row of all ‘0’s
punctuated by a single central ‘1,’ snapshots of the cellular
automaton’s state at successive time steps may be stacked one
on top of the other to produce a space–time history identical
to Pascal’s triangle (Bondarenko 1993) modulo 2 (Figure 1A,
left), which is a discrete form of Sierpinski’s fractal triangle
(Figure 1A, right). To represent this cellular automaton as a
tiling, each local context present in the space–time history
must have a corresponding Wang tile whose shape represents
the input and output occurring at that location (Figure 1B).
Thus, we need four tiles, one for each entry of the truth table
for XOR, and a linear input row representing the initial state
of the cellular automaton (Figure 1C). Given these tiles and
the input row there is a unique way to tile the upper half-
plane without mismatches or missing tiles—the Sierpinski
Tiling—which reproduces the cellular automaton’s space–
time history (Figure 1D).
Whereas execution of a cellular automaton occurs perfectly
and synchronously, molecular self-assembly is asynchronous
and may have many types of errors. To be successful, an
implementation of cellular automata by molecular tiling must
address four challenges: (1) The abstract tiles must be
translated into molecules (molecular tiles) that readily form
2D crystals. (2) Molecular tiles must be programmed with
specic binding domains that match the logic of the chosen
abstract tiles. (3) The binding of molecular tiles must be
sufciently cooperative to enforce the correct order of
assembly and prevent errors. (4) Assembly of molecular tiles
must occur on a specied nucleating structure, and spurious
nucleation must be suppressed. These properties are neces-
sary and sufcient for implementing not only the XOR
cellular automaton, but also any other 1D cellular automaton.
All four have been shown individually: several types of DNA
Wang tiles have been designed and shown to grow into
micron-scale 2D periodic crystals (Winfree et al. 1998a; Mao
et al. 1999; LaBean et al. 2000b); the interactions between
these tiles can be programmed by sequence-specic hybrid-
ization (Winfree et al. 1998a; Mao et al. 2000); cooperative
binding of multiple domains ensures specicity—the right
tile attaches in the right place (Winfree et al. 1998b; Mao et al.
2000); and input to the self-assembly process can be provided
by a single-stranded template (LaBean et al. 2000a; Yan et al.
2003a). Here we demonstrate, via self-assembly of Sierpinski
triangles, that all four challenges can be simultaneously
overcome, thus establishing all the mechanisms necessary to
implement arbitrary cellular automata. The Sierpinski tiling,
then, gives rise to a new type of aperiodic crystal—an
algorithmic crystal.
PLoS Biology | www.plosbiology.org
2042
Results
Modeling Tile-Based Self-Assembly
Preventing the types of errors mentioned above may seem
impossible. For example, if a single binding domain is strong
enough to hold a tile in place (red arrows in Figure 1C), then
one would expect roughly 33% of tiles to mismatch with tiles
in the layer below. Simulations of self-assembly shed light on
how to avoid such dire circumstances. We use two levels of
abstraction that isolate and address critical issues for the
design and analysis of our algorithmic self-assembly experi-
ments. How crystal morphology and patterning can be
programmed by tile design in an inherently asynchronous
assembly process is addressed by the abstract Tile Assembly
Model (aTAM) (Winfree 1996, 1998a). To explore how
physical parameters, such as tile concentration and temper-
ature, affect crystal growth and inuence error rates, we use
the kinetic Tile Assembly Model (kTAM) based on reversible
tile association and dissociation rates (Winfree 1998a).
Control over the order of assembly is obtained by
exploiting the cooperativity of binding. The aTAM models
cooperativity via a threshold,
s,
representing the number of
bonds that must be made for an association event to be
thermodynamically stable: a tile may be added to a crystal if
at least
s
binding domains match the existing crystal. Black
arrows in Figure 1C indicate four potential association events
that could occur at
s
¼
2; red arrows indicate two additional
association events that would be permitted at
s
¼
1, but not at
s
¼
2. Simulation of cellular automata is designed to work at
s
¼
2. Isolated tiles cannot associate at
s
¼
2 and so growth and
computation must begin with a preformed nucleating
structure (Figure 1C, blue) that represents the input to the
computation. Importantly, at
s
¼
2 no tile may be added until
both
preceding tiles are already present, guaranteeing a
deterministic outcome despite the asynchronous order of
events. Thus, assembly from an input row containing a
solitary ‘1’ domain produces the Sierpinski triangle pattern
(Figures 1D and 2A) regardless of the order in which
permitted associations occur. If a small number of additional
s
¼
1 associations are permitted to occur, then mismatches
between neighboring tiles (mismatch errors) may result. In
this case, or if there are several ‘1’s in the input row, the
resulting pattern can appear to be qualitatively different:
owing to propagation of information and the linearity of
XOR, the resulting pattern is the superposition of Sierpinski
triangles initiated at input ‘1’s and at mismatch error sites
(see Figure 1E).
The rate at which such errors occur can be understood
using the kTAM. In this model, all tiles (regardless of how well
they match) may associate at a given site at a rate,
r
f
,
proportional to their concentration:
r
f
¼
k½tile
¼
ke
ÀG
mc
,
where
k
is a forward rate constant and
G
mc
.
0 is the
nondimensionalized entropy lost due to association—thus it
represents the monomer concentration. Dissociation rates
depend on how many binding domains match correctly: a tile
with
b
correctly matching binding domains has a dissociation
rate given by
r
r;b
¼
ke
ÀbG
se
, where
G
se
.
0 is the non-
dimensionalized free energy for a single binding domain—
thus it represents the sticky end strength.
G
se
decreases with
increased temperature. Thus, if
G
se
,
G
mc
,
2G
se
, a reaction
wherein the tile matches at a single domain would have
r
f
,
r
r,1
and thus would be thermodynamically unfavorable,
December 2004 | Volume 2 | Issue 12 | e424
Algorithmic Self-Assembly of DNA
Figure 1.
The XOR Cellular Automaton and
Its Implementation by Tile-Based Self-
Assembly
(A) Left: three time steps of its execution
drawn as a space–time history. Cells
update synchronously according to
XOR by the equation shown. Cells at
even time steps are interleaved with
those at odd time steps; arrows show
propagation of information. Right: the
Sierpinski triangle.
(B) Translating the space–time history
into a tiling. For each possible input
pair, we generate a tile T-xy that bears
the inputs represented as shapes on the
lower half of each side and the output as
shapes duplicated on the top half of each
side.
(C) The four Sierpinski rule tiles, T-00,
T-11, T-01, and T-10, represent the four
entries of the truth table for XOR:
0
È
0
¼
0, 1
È
1
¼
0, 0
È
1
¼
1, a n d
1
È
0
¼
1. Lower binding domains on
the sides of tiles match input from the
layer below; upper binding domains
provide output to both neighbors on
the layer above. Semicircular domains
represent ‘0’ and rectangular domains,
‘1’. Tiles that output ‘0’ (T-00 and T-11)
are gray, and we refer to them as ‘0’ tiles.
Tiles that output ‘1’ (T-01 and T-10) are
white and are referred to as ‘1’ tiles.
Initial conditions for the computation
are provided by a nucleating structure
(blue). Red asterisks indicate sites on the
nucleating structure that bear a ‘1’
binding domain; elsewhere, sites have
all ‘0’ binding domains. Black arrows
indicate associations that would form
two bonds; red arrows, associations that
would form one bond.
(D) Error-free growth results in the
Sierpinski pattern.
(E) Error-prone growth from a nucleat-
ing structure with three ‘1’ domains. Red
crosses indicate four mismatch errors.
DOI: 10.1371/journal.pbio.0020424.g001
while a reaction wherein the tile matches at two domains
would have
r
f
.
r
r,2
and thus would be thermodynamically
favorable. This model is a reasonable rst-order approxima-
tion for the tile-based self-assembly of single crystals. For
G
mc
2G
se
, as
G
mc
and
G
se
become arbitrarily large, the
s
¼
2 aTAM
is approximated arbitrarily well, and error rates go to zero—
concomitantly, assembly speed goes to zero (Winfree 1998a).
For ranges of
G
mc
and
G
se
compatible with current exper-
imental conditions, assuming thermodynamic and kinetic
parameters extrapolated from the literature of DNA duplex
hybridization (Bloomeld et al. 2000), this model (Figure S1)
predicts mismatch error rates between 0.1% and 1.0%
(Figure 2B).
The effects of non-idealities can also be explored in this
model. For example, Figure 2C shows growth when the
concentrations of the T-00 and T-11 tiles are twice that of the
T-01 and T-10 tiles, and the nucleating structure grows slowly
from special nucleating tiles rather than being preformed.
Under this condition there is a preferential association of ‘0’s
on the facets of the growing crystal, causing characteristic
errors that terminate Sierpinski triangles at corners and
result in large all-zero patches (Figures 3A and S2). The
mechanism responsible for these errors appears to be
PLoS Biology | www.plosbiology.org
2043
preferential nucleation of T-00 tiles on all-zero facets, due
to their higher concentration (Figure S3). If nucleation occurs
on an all-zero facet both above and below a ‘1’ tile, correct
growth from the ‘1’ will be sandwiched between ‘0’s and
therefore further errors will be forced (Figure 3B). The
further errors could be either (1) termination of the
Sierpinski triangle by addition of a mismatched ‘0’ tile at
the corner site, or (2) sideways propagation creating a new
small triangle by addition of a mismatched ‘1’ tile on the facet
below the corner site (arrow in Figure 3A). Thus, slight
quantitative variations in the model parameters can lead to
striking qualitative differences in the observed error mor-
phologies, which are effectively never seen under growth
conditions with equimolar tile concentrations or with
preformed borders (see Figure S2).
The kTAM also provides insights into a second kind of
error, the spontaneous 2D nucleation of untemplated crystals
in the absence of the nucleating structure. For
G
mc
2G
se
,
which corresponds to the melting temperature of the crystals,
such untemplated nucleation is inhibited by a kinetic
barrier—the existence of a critical nucleus size (Davey and
Garside 2000) that decreases with increasing supersaturation.
The growth rate of untemplated crystals also increases with
December 2004 | Volume 2 | Issue 12 | e424
Algorithmic Self-Assembly of DNA
Figure 2.
Typical kTAM Simulation Results
(A) A roughly 130
3
70 subregion of an error-free templated crystal.
(B) A subregion with 10 mismatch errors (0.1%), shown in red (both false ‘0’s and false ‘1’s). Grown at
G
mc
¼
17,
G
se
¼
8.8. Large all-zero patches
near the template row are due to intact Sierpinski pattern; for simulations with these parameters, asymptotically only approximately 1% of T-00
tiles are in all-zero patches containing more than 90 tiles (referred to as large patches).
(C) A subregion from a crystal grown with the T-00 and T-11 tiles at doubled concentration, on a slowly growing nucleating row. Mismatch errors
(43 of them, i.e., 0.3%, during growth at
G
mc
¼
17,
G
se
¼
8.6) characteristically terminate the Sierpinski pattern at corners. Asymptotically,
approximately 18% of T-00 tiles are in large patches.
(D) An untemplated crystal with roughly 4000 tiles and no errors. Inset: The largest subregion of a perfect Sierpinski pattern is small.
(E) An untemplated crystal with several errors, grown at
G
mc
¼
17,
G
se
¼
10.4. Note that growth in the reverse direction is more error-prone. Only
approximately 1% of T-00 tiles are in large patches.
(F) An untemplated crystal with few errors, grown at
G
mc
¼
17,
G
se
¼
8.6, with T-00 and T-11 at doubled stoichiometry. Note the large perfect
subregion. Simulation was initiated by a preformed seed larger than the critical nucleus size (roughly 100 tiles). For these simulation parameters,
approximately 25% of T-00 tiles are in large patches. According to the approximations used in Winfree (1998a),
G
mc
¼
17 corresponds to 0.8
lM,
G
se
¼
8.5 corresponds to 41.8
8C,
and
G
se
¼
10.4 corresponds to 32.7
8C.
The black outline around the crystals is for clarity; it does not represent
tiles.
DOI: 10.1371/journal.pbio.0020424.g002
supersaturation since their growth occurs by spontaneous 1D
nucleation of a new layer of tiles on any of four facets. Via any
single binding domain, there are always two tiles that can
bind, so such nucleation must effectively invent a new bit of
information. This bit may be propagated quickly forward or
sideways (wherein tiles attach by one input and one output
domain) to complete the layer without error according to the
logic of XOR. Consequently, such crystals have none of the
qualitative appearance of Sierpinski triangles even though
they may contain no mismatched tiles (see Figure 2D). If
G
se
is
increased, corresponding to lowering the temperature,
nucleation occurs more rapidly but errors are more frequent.
Backward growth, in which tiles attach to a crystal by both of
their output domains, is especially error-prone since every
one of these associations involves the invention of informa-
tion (Figure 3C). Whenever two backward-growing domains
meet and disagree on the information that they have
invented, growth can only proceed via an error. Under fast
growth conditions, signicantly below the melting temper-
ature as in Figure 2E, this gives rise to higher error rates in
the reverse growth direction. Near the melting temperature,
however, this effect is insignicant. The most noticeable
effect for untemplated crystals is due to the non-ideality
discussed above (doubling the relative concentration of T-00
PLoS Biology | www.plosbiology.org
2044
and T-11 tiles): the statistical preference for all-zero patches
actually
increases
the frequency and size of perfect Sierpinski
patterns (see Figure 2F).
These simulation studies suggest that all three difculties
(asynchronous association of tiles, mismatch errors, and
untemplated nucleation) in principle can be controlled by
slowing down the growth processes, making experimental
investigations the appropriate next step.
Design and Preparation of DNA Tiles
Abstract Wang tiles are implemented as DNA tiles
according to the scheme described earlier (Winfree et al.
1998a): each molecular Wang tile is a DNA double-crossover
molecule (Fu and Seeman 1993) with four sticky-ends (5-base
single-stranded overhangs) that serve as the programmable
binding domains. We rendered the four Sierpinski rule tiles
using two types of double-crossover molecule, known as
DAE-E and DAO-E molecules (Winfree 1996), resulting in two
independent molecular implementations (Figure 4, sequences
are as given in Figures S4–S7). The DAE-E Sierpinski tile set
(Figure 4A) consists of four molecular tiles, each composed of
ve strands whose sequences were designed to minimize the
potential for forming alternative structures (Seeman 1990), as
conrmed by non-denaturing gel electrophoresis (Figure S8).
December 2004 | Volume 2 | Issue 12 | e424
Algorithmic Self-Assembly of DNA
Figure 3.
Simulations with Slow Border
Growth and T-00 and T-11 at Doubled
Concentrations
(A) A common error pattern: termina-
tion of triangles at corners.
(B) An observed mechanism leading to
termination or sideways extension of
triangles: preferential nucleation of
T-00 on facets.
(C) Forward and sideways growth is
deterministic: at sites presenting two
binding domains, there is always a
unique tile that can form exactly two
bonds. Backward growth is non-deter-
ministic: at sites where both binding
domains agree (e.g., green arrows), there
are two possible tiles that can make two
bonds (either
fT-10,
T-01g, or
fT-00,
T-11g). At sites where the available
binding domains disagree (e.g., red ar-
rows), there is no tile that can associate
to form two bonds. Since only the output
type of tiles are shown, it is impossible to
tell from these gures which backward
growth sites present agreeing or dis-
agreeing binding domains.
DOI: 10.1371/journal.pbio.0020424.g003
Since untemplated crystals were not expected to produce
recognizable Sierpinski triangles, it was necessary to create a
proper nucleating structure to provide the initial input for
the algorithmic self-assembly. Previous work using DNA tiles
to self-assemble an initial boundary had proven to be difcult
(Schulman et al. 2004), so in this work we took an alternative
approach of using assembly PCR (Stemmer et al. 1995) to
create a long single-stranded molecule which could serve as a
scaffold (LaBean et al. 2000a; Yan et al. 2003a) for the
assembly of a row of input tiles (Figures 4A and S9–S11).
Because this nucleating strand serves as the bottom of these
tiles, only four strands are needed to assemble the input tiles,
and an additional capping strand is used to form a double-
helix between input tiles on the nucleating strand. By doping
the assembly PCR mix with a small fraction of the strands
coding for an input tile outputting a single ‘1,’ we ensure that
each nucleating structure contains a few randomly located
sites from which a Sierpinski triangle should grow.
The DAO-E Sierpinski tile set (Figure 4B) consists of six
molecular tiles, due to peculiarities of the DAO-E motif. First,
consideration of the 59 and 39 orientation of strands—
particularly the fact that the sticky ends at the top and
bottom of a DAO-E tile have opposite polarity—demands
that each tile binds only to
‘‘upside-down’’
neighbors,
resulting in layers of tiles with alternating orientation, which
we refer to here as R-type and S-type tiles. Furthermore, the
sugar–phosphate backbone of the DAO-E tiles has a dyad
symmetry axis, implying that the R-01 and S-01 tiles each can
play the roles of both the T-01 and T-10 tiles. Likewise, the
R-00, R-11, S-00, and S-11 tiles can each bind in two
orientations in a site where both inputs match.
PLoS Biology | www.plosbiology.org
2045
In order for the nucleating structure for the DAO-E lattice
to assemble onto a long PCR-generated nucleating strand, the
tiles on the input row must be of the DAE-O variant. Further,
we simplied the construction so that all nucleating strands
contain the same repetitive sequence, but the input tile
strands are doped with a fraction of strands containing a ‘1’
sticky-end, and again the nucleating structure contains a few
randomly located sites from which a Sierpinski triangle
should grow.
Self-Assembly of DNA Sierpinski Triangles
In principle, two approaches can be taken for initiating
algorithmic self-assembly of DNA tiles. In the preformed tile
approach, each tile is prepared separately by mixing a
stoichiometric amount of each component strand in the
hybridization buffer and then annealing from 90
8C
to room
temperature over the course of several hours. The nucleating
structure is similarly prepared by annealing the nucleating
strand with input tile and capping strands. Then the rule tiles
and nucleating structure are mixed together at a temperature
appropriate for crystal growth. In the bulk annealing
approach, the nucleating strand, the capping and input tile
strands, and the strands for all rule tiles are initially mixed
together and then annealed. Since, at the concentrations we
use, the tiles themselves have melting temperatures between
roughly 60
8C
and 70
8C
while the crystals have a melting
temperature within a few degrees of 40
8C
(Figure S12),
during annealing the tiles themselves will rst form, and only
later will the fully formed tiles assemble into crystals,
presumably growing from the nucleating structure prior to
overcoming the barrier to spontaneous nucleation. Both
approaches work, but because of the convenience of the bulk
December 2004 | Volume 2 | Issue 12 | e424
Zgłoś jeśli naruszono regulamin