Saturday, March 28, 2015

Plasmids 101: Protein tags

Posted by Eric J. Perkins | Dec 11, 2014 11:26:08 AM
    
Protein tags are usually small peptides fused to a recombinant protein. As depicted in the accompanying cartoon, they have a multitude of uses, including (but not limited to) purification, detection, solubilization, localization, and protection from proteases. Plasmids 101 has already covered GFP and its related fluorescent proteins, which are sometimes used as tags for detection; however, those are just one (admittedly large) class of common fusion protein tags. Biochemists and molecular biologists who need to overexpress and purify proteins can face any number of technical challenges depending on their protein of interest. Over several decades of addressing these challenges, researchers have amassed a considerable molecular toolbox of tags and fusion proteins to aid in the expression and purification of recombinant proteins.

Tags for Stability and Solubility

What are some of the hurdles to overexpressing a recombinant protein? It is generally not in a cell’s best interest to overexpress a protein: energy and cellular resources are spent making something the cell doesn’t need. Eukaryotes and some bacteria deploy proteasomes to degrade what the cell may consider junk protein. Although a number of chemical and peptide-based proteasome inhibitors exist, glutathione S-transferase (GST), which can be fused to recombinant proteins for one-step purification with glutathione, can also protect against proteolysis.
That’s one form of instability. Prokaryotes can also have a hard time folding eukaryotic proteins. You can get your bacteria to produce massive amounts of protein, but if it’s not folded correctly, there’s no point in crystallizing it or testing its function. Small ubiquitin-related modifier (SUMO) can help with folding and stabilization, as can maltose-binding protein (MBP). Overexpression can also lead to insolubility, and aggregated protein is not useful protein. MBP tags can help with solubility issues, but scientists may also choose smaller fusion partners, such as thioredoxin A (TrxA), which improves disulfide bond formation and thereby helps keep the protein soluble.

Tags for Affinity and Purification

An affinity tag, generally a relatively short sequence of amino acids, is essentially a molecular leash for your protein. If you’re working with an uncharacterized protein, or one for which a good antibody has not been developed (and a commercially available antibody is not necessarily a good one), then your first step toward detecting, immunoprecipitating, or purifying that protein may be to fuse an affinity tag to it. The FLAG, hemagglutinin (HA), and c-myc tags have been the workhorses of the affinity tag world for years, and which one to use depends on your application (see Table 1 below). The antibodies available for these tags really are good and can be used for western blots, immunoprecipitation (IP), and affinity purification.
Arguably the simplest affinity tag is the polyhistidine (His) tag. It is small and unlikely to affect function, and His-tagged proteins can be purified by metal-affinity chromatography, usually on a Ni²⁺ column. Like other affinity tags, a His tag can be fused to either the N- or C-terminus of a protein. Unlike other epitope tags, which grow quickly in size when doubled or tripled, a polyhistidine tract can be lengthened without greatly increasing the size of the tag.
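To put numbers on that size difference: each extra histidine adds about 0.14 kDa, whereas each extra copy of an epitope such as FLAG adds about 1 kDa. The following is a minimal Python sketch that computes average peptide masses from commonly tabulated average residue masses, ignoring any modifications. The naive triple FLAG repeat is a hypothetical sequence for comparison. Commercial 3xFLAG sequences differ.

```python
# Average amino acid residue masses in daltons (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # added once for the free N- and C-termini

def peptide_mass_kda(seq: str) -> float:
    """Average mass of an unmodified linear peptide, in kDa."""
    return (sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER) / 1000

for name, seq in [
    ("His6", "H" * 6),
    ("His10", "H" * 10),
    ("FLAG", "DYKDDDDK"),
    ("FLAG x3", "DYKDDDDK" * 3),  # hypothetical naive repeat, not commercial 3xFLAG
]:
    print(f"{name:8s}{len(seq):3d} aa  {peptide_mass_kda(seq):.2f} kDa")
```

Going from His6 (0.84 kDa) to His10 (1.39 kDa) adds about 0.55 kDa. Tripling FLAG (1.01 kDa to 3.00 kDa) adds about 2 kDa.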

Table 1: Common protein tags

Tag | Sequence | Mass (kDa) | Function | Notes
CBP | KRRWKKNFIAVSAANRFKKISSSG | 4 | Affinity and Purification | Binding and elution steps use very mild buffer conditions
FLAG | DYKDDDD or DYKDDDDK or DYKDDDK | 1 | Affinity and Purification | Good for antibody-based purification; has an inherent enterokinase cleavage site
GST | Large protein | 26 | Purification and Stability | Good for purification with glutathione; protects against proteolysis, but may reduce solubility
HA | YPYDVPDYA or YAYDVPDYA or YDVPDYASL | 1.1 | Affinity | Frequently used for western blots, IP, co-IP, IF, and flow cytometry; can occasionally interfere with protein folding
HBH | HHHHHHAGKA GEGEIPAPLA GTVSKILVKE GDTVKAGQTV LVLEAMKMET EINAPTDGKV EKVLVKERDA VQGGQGLIKI GVHHHHHH | 9 | Combo | Consists of a bacterially derived in vivo biotinylation signal peptide (Bio) flanked by hexahistidine motifs (6xHis)
MBP | Large protein | 40 | Solubility and Purification | Can improve solubility and folding of eukaryotic proteins in prokaryotes; single-step purification with amylose, but wicked huge
Myc | EQKLISEEDL | 1.2 | Affinity | Frequently used for western blots, IP, co-IP, IF, and flow cytometry, but rarely used for purification because elution requires low pH
Poly-His | HHHHHH | 0.8 | Affinity and Purification | Very small; rarely affects function
S-tag | KETAAAKFERQHMDS | 1.8 | Solubility and Affinity | Abundance of charged and polar residues improves solubility; good for antibody-based detection
SUMO | ~100-amino-acid protein | 12 | Stability | At the N-terminus, promotes folding and structural integrity; cleavable. Not great for purification; too readily cleaved in eukaryotes
TAP | GRRIPGLINP WKRRWKKNFI AVSAANRFKK ISSSGALDYD IPTTASENLY FQGEFGLAQH DEAVDNKFNK EQQNAFYEIL HLPNLNEEQR NAFIQSLKDD PSQSANLLAE AKKLNDAQAP KVDNKFNKEQ QNAFYEILHL PNLNEEQRNA FIQSLKDDPS QSANLLAEAK KLNDAQAPKV DANHQ | 21 | Combo | See text
TRX | MSDKIIHLTD DSFDTDVLKA DGAILVDFWA EWCGPCKMIA PILDEIADEY QGKLTVAKLN IDQNPGTAPK YGIRGIPTLL LFKNGEVAAT KVGALSKGQL KEFLDANLAG SGSGHMHHHH HHSSGLVPRG | 12 | Solubility | Assists in proper folding
V5 | GKPIPNPLLGLDST | 1.4 | Affinity and Purification | Good for antibody-based purification
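When checking a cloned construct, it can help to confirm which of the short tags in Table 1 are actually present in the translated sequence. The following is a minimal Python sketch that uses regular expressions for a few canonical tag sequences from the table. The alternative FLAG and HA forms are left out. The construct is hypothetical: Met, His6, a TEV site, a placeholder protein, a GS linker, and a C-terminal Myc tag.

```python
import re

# Canonical forms of a few short tags from Table 1.
TAG_PATTERNS = {
    "His": r"H{6,}",  # poly-His: six or more consecutive histidines
    "FLAG": r"DYKDDDDK",
    "HA": r"YPYDVPDYA",
    "Myc": r"EQKLISEEDL",
    "S-tag": r"KETAAAKFERQHMDS",
    "V5": r"GKPIPNPLLGLDST",
}

def find_tags(protein: str) -> list[tuple[str, int, int]]:
    """Return (tag, start, end) hits, 0-based and end-exclusive, sorted by start."""
    hits = []
    for name, pattern in TAG_PATTERNS.items():
        for m in re.finditer(pattern, protein.upper()):
            hits.append((name, m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical construct: Met + His6 + TEV site + placeholder protein + GS + Myc.
construct = "M" + "HHHHHH" + "ENLYFQS" + "ASTKLLVAGQ" + "GS" + "EQKLISEEDL"
print(find_tags(construct))  # [('His', 1, 7), ('Myc', 26, 36)]
```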

Combo and Cleavage Tags

Frequently, a single tag is not enough. What if you need one tag to increase solubility and another for purification? Or you want to combine a fluorophore with a tag that localizes your protein to the nucleus? Or you want multiple rounds of purification to get your protein as pure as possible? Vectors that offer different combinations of tags are readily available, and although adding too many tags and fusion proteins to your protein of interest would eventually get ridiculous (you generally don’t want more tag than protein), 2-3 tags is increasingly common. Tandem affinity purification (TAP) once referred specifically to a combo tag composed of a calmodulin-binding peptide (CBP), a TEV cleavage site (more on that in a moment), and two Protein A IgG-binding domains. TAP has since come to encompass several other tag combinations, though these frequently still include at least one element of the original TAP tag. The terms dual-labeling and dual-tagging are also used. Because they are small and easy to add to a purification scheme, His tags are frequently combined with other tags for dual-labeling.
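The "more tag than protein" rule of thumb is easy to check using the masses in Table 1. The following is a minimal Python sketch. Tag masses come from Table 1, and the 30 kDa protein of interest is a hypothetical value.

```python
def tag_fraction(parts: list[tuple[str, float, bool]]) -> float:
    """Fraction of a fusion construct's mass contributed by tags.

    parts: (name, mass_kda, is_tag) for each component of the construct.
    """
    total = sum(mass for _, mass, _ in parts)
    tags = sum(mass for _, mass, is_tag in parts if is_tag)
    return tags / total

# Tag masses from Table 1; the 30 kDa protein of interest is hypothetical.
his_mbp = [("His", 0.8, True), ("MBP", 40.0, True), ("protein of interest", 30.0, False)]
his_sumo = [("His", 0.8, True), ("SUMO", 12.0, True), ("protein of interest", 30.0, False)]

for label, construct in [("His-MBP", his_mbp), ("His-SUMO", his_sumo)]:
    frac = tag_fraction(construct)
    verdict = "more tag than protein" if frac > 0.5 else "ok"
    print(f"{label}: {frac:.0%} tag ({verdict})")
```

For this hypothetical 30 kDa protein, the His-MBP fusion comes out at about 58% tag, while His-SUMO is about 30%.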
The problem with all these tags is that many of them serve a one-time purpose, and you don’t necessarily want them to stick around after that purpose has been served. Here, proteases can be your friend rather than your enemy. Two common tags (SUMO and FLAG) are cleaved by specific proteases without requiring the addition of an independent cleavage recognition site. SUMO tags cannot be used in eukaryotes because endogenous SUMO proteases would already cleave them, but they are convenient for purified protein, since SUMO protease removes the tag in the same manner as it would in the context of a cell. FLAG tags can be cleaved by enterokinase, which recognizes DDDDK^X and cleaves after the lysine; the efficiency of this cleavage depends on the identity of X.
A number of other proteases are available, but to use them effectively, scientists must incorporate the corresponding recognition sites into their protein tags. One of the best optimized is the tobacco etch virus (TEV) protease. A TEV protease cleavage site is frequently placed between two tags used for two rounds of purification, with the cleavage reaction performed between column runs. The TEV protease itself, carrying various mutations that increase its stability and activity, can be readily purified using plasmids found in this paper (available at Addgene).

Table 2: Protease recognition sites commonly used with tags

Protease | Recognition site | Notes
TEV | ENLYFQS | Cleaves between the Gln and Ser residues
Thrombin | LVPRGS | Cleaves between the Arg and Gly residues
PreScission | LEVLFQGP | Cleaves between the Gln and Gly residues
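Each protease in Table 2 cuts at a fixed position within its site, so the products of a cleavage reaction can be predicted from the construct sequence. The following is a minimal Python sketch that uses exact string matching only. Real proteases tolerate some variation around their sites, and this sketch ignores that. The enterokinase entry applies the DDDDK^X rule from the text, and the His-TEV construct is hypothetical.

```python
# Recognition site and cut offset (residues from the start of the site).
PROTEASE_SITES = {
    "TEV": ("ENLYFQS", 6),           # ENLYFQ^S
    "thrombin": ("LVPRGS", 4),       # LVPR^GS
    "PreScission": ("LEVLFQGP", 6),  # LEVLFQ^GP
    "enterokinase": ("DDDDK", 5),    # DDDDK^X, rule from the text
}

def digest(protein: str, protease: str) -> list[str]:
    """Split a sequence at every exact match of the protease's site, N- to C-terminal."""
    site, offset = PROTEASE_SITES[protease]
    fragments, start = [], 0
    pos = protein.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(protein[start:cut])
        start = cut
        pos = protein.find(site, pos + 1)
    fragments.append(protein[start:])
    return fragments

# Hypothetical construct: Met + His6 + TEV site + placeholder protein.
construct = "M" + "HHHHHH" + "ENLYFQS" + "ASTKLLVAGQ"
print(digest(construct, "TEV"))       # ['MHHHHHHENLYFQ', 'SASTKLLVAGQ']
print(digest(construct, "thrombin"))  # no thrombin site: one uncut fragment
```

The released protein keeps the serine from the TEV site at its new N-terminus. It is worth checking whether that extra residue matters for your application.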

This article is not a comprehensive guide to all tags, but rather a quick overview of why scientists use tags, with a few time-tested tags and fusion proteins as examples. The tables list more common tags than are described in the post, categorized to help you assess their function. More detailed information and some protocols can be found in the references provided.

Tuesday, March 10, 2015

E. coli Strains for Protein Expression

Many challenges can arise when over-expressing a foreign protein in E. coli. We will review the potential pitfalls of recombinant protein expression and some of the most popular commercial strains designed to avoid them.
Why do I need an expression strain?
Expression from high-copy-number plasmids with powerful promoters greatly exceeds that of any native host protein, consuming valuable cellular resources and slowing growth. Additionally, some protein products may be toxic to the host when expressed, particularly those that are insoluble, act on DNA, or are enzymatically active. For this reason, recombinant proteins are typically expressed in E. coli strains engineered to accommodate high protein loads, using inducible promoter systems (discussed later). In addition to the basic genotypes outlined below, specialized strains are available that confer greater transcriptional control, assist with proper protein folding, or compensate for suboptimal codon usage (Table 1).
A few mutations common to all or most expression strains help them accommodate high protein levels:
  • ompT: Strains harboring this mutation are deficient in outer membrane protease VII, which reduces proteolysis of the expressed recombinant proteins.
  • lon protease: Strains in which this gene is completely deleted (designated lon or Δlon) similarly show reduced proteolysis of the expressed proteins.
  • hsdSB (rB- mB-): These strains have an inactivated native restriction/methylation system. This means the strain can neither restrict nor methylate DNA.
  • dcm: Similarly, strains with this mutation are unable to methylate cytosine within a particular sequence (the internal C of CCWGG).
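By convention, a gene name listed in an E. coli genotype denotes a mutant allele. That means checking a strain for the mutations above is essentially a text search. The following is a minimal Python sketch that uses genotypes copied from Table 1 below. The helper name `missing_markers` is hypothetical. `hsdS` is matched as a prefix so that hsdSB counts.

```python
import re

# Mutations listed above as common to most expression strains.
KEY_MARKERS = ("ompT", "lon", "hsdS", "dcm")

def missing_markers(genotype: str) -> list[str]:
    """Return the key markers absent from a genotype string.

    A marker counts as present if it is not preceded by a letter, so "hsdS"
    matches "hsdSB(rB- mB-)" and "lon" matches "[lon]" or "Δlon".
    """
    return [m for m in KEY_MARKERS
            if not re.search(rf"(?<![A-Za-z]){re.escape(m)}", genotype)]

# Genotypes copied from Table 1 below.
GENOTYPES = {
    "BL21 (DE3)": "F- ompT lon hsdSB(rB- mB-) gal dcm (DE3)",
    "Rosetta2 (DE3)": "F- ompT hsdSB(rB- mB-) gal dcm (DE3) pRARE2 (CamR)",
    "HMS174 (DE3)": "F- recA1 hsdR(rK12- mK12+) (DE3) (RifR)",
}

for strain, genotype in GENOTYPES.items():
    print(strain, missing_markers(genotype) or "all present")
```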
Table 1: E. coli Expression Strains
Note: All strains are derived from the E. coli B strain, except those marked **, which are K-12 derivatives.
Strain | Resistance | Key Features | Genotype | Use
BL21 (DE3) | None | Basic IPTG-inducible strain containing T7 RNAP (DE3) | F- ompT lon hsdSB(rB- mB-) gal dcm (DE3) | General protein expression
BL21 (DE3) pLysS* | Chloramphenicol (pLysS) | pLysS expresses T7 lysozyme to reduce basal expression levels; expression vector cannot have a p15A origin of replication | F- ompT lon hsdSB(rB- mB-) gal dcm (DE3) pLysS (CamR) | Expression of toxic proteins
BL21 (DE3) pLysE* | Chloramphenicol (pLysE) | pLysE gives higher T7 lysozyme expression than pLysS; expression vector cannot have a p15A origin of replication | F- ompT lon hsdSB(rB- mB-) gal dcm (DE3) pLysE (CamR) | Expression of toxic proteins
BL21 Star (DE3) | None | Lacks functional RNase E, which results in longer transcript half-life | F- ompT lon hsdSB(rB- mB-) gal dcm rne131 (DE3) | General expression; not recommended for toxic proteins
BL21-AI | Tetracycline | Arabinose-inducible expression of T7 RNAP; IPTG may still be required for expression | F- ompT lon hsdSB(rB- mB-) gal dcm araB::T7RNAP-tetA | General protein expression
BLR (DE3) | Tetracycline | RecA-deficient; best for plasmids with repetitive sequences | F- ompT lon hsdSB(rB- mB-) gal dcm (DE3) Δ(srl-recA)306::Tn10 (TetR) | Expression of unstable proteins
HMS174 (DE3)** | Rifampicin | RecA-deficient; allows cloning and expression in the same strain | F- recA1 hsdR(rK12- mK12+) (DE3) (RifR) | Expression of unstable proteins
Tuner (DE3) | None | Contains a mutated lac permease, which allows linear control of expression | F- ompT lon hsdSB(rB- mB-) gal dcm lacY1 (DE3) | Expression of toxic or insoluble proteins
Origami2 (DE3)** | Streptomycin and Tetracycline | Mutations in thioredoxin reductase (trxB) and glutathione reductase (gor) promote disulfide bond formation in the cytoplasm to facilitate proper folding; may increase multimer formation | Δ(ara-leu)7697 ΔlacX74 ΔphoA PvuII phoR araD139 ahpC galE galK rpsL F′[lac+ lacIq pro] (DE3) gor522::Tn10 trxB (StrR, TetR) | Expression of insoluble proteins
Rosetta2 (DE3)* | Chloramphenicol (pRARE2) | Good for “universal” translation; contains 7 additional tRNAs for rare codons not normally used in E. coli; expression vector cannot have a p15A origin of replication | F- ompT hsdSB(rB- mB-) gal dcm (DE3) pRARE2 (CamR) | Expression of eukaryotic proteins
Lemo21 (DE3)* | Chloramphenicol (pLemo) | Rhamnose-tunable expression of T7 lysozyme from pLemo tunes T7 RNAP activity, alleviating inclusion body formation; expression vector cannot have a p15A origin of replication | fhuA2 [lon] ompT gal (λ DE3) [dcm] ∆hsdS/ pLemo (CamR) | Expression of toxic, insoluble, or membrane proteins
T7 Express | None | IPTG-inducible expression of T7 RNAP from the genome; does not restrict methylated DNA | fhuA2 lacZ::T7 gene1 [lon] ompT gal sulA11 R(mcr-73::miniTn10--TetS)2 [dcm] R(zgb-210::Tn10--TetS) | General protein expression
M15 (pREP4)*, ** | Kanamycin (pREP4) | The phage T5 promoter (found on pQE and similar vectors) is repressed by lac repressor supplied in trans from pREP4 and induced by IPTG; expression vector cannot have a p15A origin of replication | F-, Φ80ΔlacM15, thi, lac-, mtl-, recA+, KmR | Expression of toxic proteins
* Denotes the presence of an additional plasmid; make sure to maintain it by growing on appropriate selective media. Note: Purifying your expression plasmid from these strains is not recommended, as these auxiliary plasmids may be co-isolated during the prep.
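Shortlisting strains from the Use column of Table 1 can be sketched as a keyword filter. The following is a minimal Python sketch over a subset of the table. The function `strains_for` and its `avoid_helper` option are hypothetical. `avoid_helper` skips strains that carry an auxiliary plasmid, following the note above about maintaining and co-purifying those plasmids. Matching is a naive substring search, so a phrase like "not recommended for toxic proteins" would also match "toxic". That is why such strains are left out of the subset.

```python
# Subset of Table 1: strain -> (Use column, auxiliary plasmid or None).
STRAINS = {
    "BL21 (DE3)": ("General protein expression", None),
    "BL21 (DE3) pLysS": ("Expression of toxic proteins", "pLysS"),
    "BL21 (DE3) pLysE": ("Expression of toxic proteins", "pLysE"),
    "Tuner (DE3)": ("Expression of toxic or insoluble proteins", None),
    "Origami2 (DE3)": ("Expression of insoluble proteins", None),
    "Rosetta2 (DE3)": ("Expression of eukaryotic proteins", "pRARE2"),
    "Lemo21 (DE3)": ("Expression of toxic, insoluble, or membrane proteins", "pLemo"),
}

def strains_for(need: str, avoid_helper: bool = False) -> list[str]:
    """Strains whose Use column mentions `need` (case-insensitive substring match)."""
    hits = []
    for name, (use, helper) in STRAINS.items():
        if need.lower() in use.lower() and not (avoid_helper and helper):
            hits.append(name)
    return hits

print(strains_for("toxic"))                     # pLysS, pLysE, Tuner, Lemo21
print(strains_for("toxic", avoid_helper=True))  # ['Tuner (DE3)']
```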
How does inducible expression work?
As mentioned above, many expression plasmids use inducible promoters, which remain inactive until an inducer such as IPTG is added to the growth medium. Induction timing is important, as you typically want your cells to have first reached an appropriate density. Cells in the exponential growth phase are alive and healthy, which makes them ideal for protein expression. If you wait too long to induce, your culture will have started accumulating dead cells; conversely, if you induce too early, there are not enough cells in the culture to make much protein.
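Estimating when a culture will be ready to induce is simple arithmetic if you assume ideal exponential growth, OD(t) = OD₀ · 2^(t/t_d), where t_d is the doubling time. The following is a minimal Python sketch that solves for t. The starting OD600, target OD600, and 30-minute doubling time in the example are hypothetical values. Real cultures also have a lag phase and slow down as they approach stationary phase, which this ignores.

```python
import math

def minutes_to_target(od_start: float, od_target: float, doubling_min: float) -> float:
    """Minutes for an ideally exponential culture to grow from od_start to od_target.

    Solves od_target = od_start * 2 ** (t / doubling_min) for t. Ignores lag
    phase and the slowdown as the culture nears stationary phase.
    """
    if not 0 < od_start < od_target:
        raise ValueError("expected 0 < od_start < od_target")
    return doubling_min * math.log2(od_target / od_start)

# Hypothetical: start at OD600 0.05, induce at OD600 0.6, 30-minute doubling time.
print(f"{minutes_to_target(0.05, 0.6, 30):.0f} min")  # 108 min
```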
The DE3 lysogen/T7 promoter combination is the most popular induction system. The DE3 lysogen expresses T7 RNA polymerase (RNAP) from the bacterial genome under the control of the lac repressor, which releases its hold when IPTG is added. T7 RNAP is then available to transcribe the gene of interest from a T7 promoter on the plasmid. Many commercial strains carry the DE3 lysogen, as indicated by “(DE3)” in the strain name. In contrast, other strains such as M15 (pREP4) use the lac repressor to act directly on the expression plasmid, repressing transcription from a hybrid promoter.
Although the DE3/T7 RNAP system works well for most experiments, the lac promoter can “leak,” meaning that a low level of expression occurs even without IPTG. This is mostly a problem for toxic protein products, which can prevent the culture from reaching the desired density within a reasonable time frame. For these cases, some strains carry an additional measure of control such as a pLys plasmid, which expresses T7 lysozyme, a natural inhibitor of T7 RNAP, to suppress basal T7 expression. The pLys plasmid contains a chloramphenicol resistance cassette for positive selection and a p15A origin of replication, making it incompatible with other p15A plasmids. pLys comes in two flavors, pLysS and pLysE; the latter provides tighter control of basal expression.
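The incompatibility rule comes down to a simple check: the expression vector's origin must differ from the origin of any helper plasmid in the strain, because two plasmids with the same origin cannot be stably maintained together. The following is a minimal Python sketch. The helper-plasmid origins come from the strain table, where each helper's strain is noted as excluding p15A expression vectors. The origin labels passed in, such as "pBR322", are assumed names.

```python
# Helper plasmids from the strain table; each is noted there as excluding p15A vectors.
HELPER_ORIGIN = {
    "pLysS": "p15A",
    "pLysE": "p15A",
    "pRARE2": "p15A",
    "pLemo": "p15A",
    "pREP4": "p15A",
}

def origin_conflicts(vector_origin: str, helpers: list[str]) -> list[str]:
    """Return the helper plasmids that share the expression vector's origin.

    An unknown helper name raises KeyError rather than silently passing.
    """
    return [h for h in helpers if HELPER_ORIGIN[h] == vector_origin]

print(origin_conflicts("p15A", ["pLysS"]))    # ['pLysS']
print(origin_conflicts("pBR322", ["pLysS"]))  # []
```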
What if I don't see protein overexpression?
The strains described above should generate sufficient expression for most purposes, but what do you do when you’ve tried a common strain and don’t get the desired level of (or any) protein expression? Low expression can result from a variety of causes, so fear not: a few simple troubleshooting measures can help get you back on track:

  • Compatibility: Double-check your plasmid backbone and expression strain to make sure they are compatible. For example, an arabinose-inducible plasmid will not express in an IPTG-induction strain, nor is a p15A plasmid compatible with a pLys strain. Your strain may require additional antibiotic selection or a special growth medium; if your plasmid is low-copy, consider reducing the antibiotic concentration.
  • Growth Temperature: Analyze your expression conditions by setting up a small-scale expression experiment to test variables such as temperature, time, and media conditions. Many recombinant proteins express better at 30°C or room temperature. To do this, grow your culture to the desired density at 37°C, then lower the temperature or move the culture to a bench-top shaker 10-20 minutes before adding the inducer.
  • Growth Media: Changing media is tricky, because there can be a trade-off between growth rate and protein quality. For many proteins, a rich medium such as TB or 2XYT is optimal because of the high cell density it supports; however, a minimal medium based on M9 salts may be preferable if the protein product is secreted into the medium or if slow expression is required due to solubility concerns.
  • Insoluble and Secreted Proteins: The most common purification protocols are designed for soluble, cytosolic protein products, but this is not always achievable. Proteins that contain hydrophobic regions or multiple disulfide bonds may aggregate and become insoluble. These insoluble aggregates of misfolded protein are known as inclusion bodies and can be recovered and purified using a special protocol. Alternatively, reducing the inducer concentration or adding an affinity tag such as GST may help with solubility issues.
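The small-scale expression screen suggested under Growth Temperature and Growth Media amounts to testing every combination of a few variables. The following is a minimal Python sketch that enumerates such a test matrix. The specific temperatures, media, and IPTG concentrations are hypothetical placeholders, not recommendations from this post. In particular, 25 °C stands in for "room temperature".

```python
from itertools import product

# Hypothetical screening variables; substitute values suited to your protein.
TEMPERATURES_C = (37, 30, 25)          # 25 stands in for "room temperature"
MEDIA = ("TB", "2XYT", "M9 minimal")   # rich vs. minimal media
IPTG_MM = (0.1, 1.0)                   # inducer concentrations to compare

def expression_screen() -> list[tuple[int, str, float]]:
    """Return every (temperature, medium, IPTG mM) combination."""
    return list(product(TEMPERATURES_C, MEDIA, IPTG_MM))

conditions = expression_screen()
for i, (temp, medium, iptg) in enumerate(conditions, start=1):
    print(f"culture {i:2d}: {temp} °C, {medium}, {iptg} mM IPTG")
```

Even this small matrix gives 18 cultures. A first pass might fix one variable, such as the medium, and screen the rest.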