Basics: How mass spectrometry is used to identify and characterise proteins

Mar 27, 2020 | Basics

Mass spectrometry can be used to identify and characterise proteins in a number of ways, but Bio-MS uses the most common method, which relies on fragmentation data from peptides. As well as scanning the masses and intensities of the ions present, a mass spectrometer can select all the molecules of a specific ion, smash them against neutral gas molecules and scan out the pieces produced, a process called fragmentation. This is the principle of tandem mass spectrometry, and these fragmentation patterns are the basic data used for identification and characterisation.
 
As the fragmentation patterns from large, multiply charged ions can be very complex, difficult to interpret and of low intensity, we need to ensure that the molecules used for identification are relatively small: less than 3 kDa, and preferably in the range 700 Da to 2 kDa. This is why we enzymatically digest proteins into peptides of approximately the right size. Using peptide information to infer protein information is often referred to as bottom-up proteomics. The most popular enzyme is trypsin, which cleaves at arginines and lysines; these happen to be distributed in most proteins at about the right spacing. Having an arginine or lysine at the C-terminal end of each peptide also provides a strong positive charge at that end, which helps make the fragmentation pattern clear and predictable. Peptides that are much bigger or smaller than the preferred range will either not be selected by the mass spectrometer or will produce poor data, so those areas of the protein will not be matched or identified. Some peptides may also not be seen because they are insoluble or do not ionise well, so typically only about 40% of a protein is seen using a trypsin digest.
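
To make the size window concrete, here is a minimal sketch of an in-silico tryptic digest that keeps only the peptides falling in the preferred 700 Da to 2 kDa range and reports how much of the protein they cover. It is a simplification, not a description of any particular software: it cuts after every K or R, ignores missed cleavages and the commonly applied proline exception, and uses standard monoisotopic residue masses. The function names and the default window arguments are illustrative assumptions.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per intact peptide


def tryptic_digest(protein):
    """Cut after every K or R (simplified rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def observable_peptides(protein, low=700.0, high=2000.0):
    """Peptides whose mass falls inside the preferred window."""
    return [p for p in tryptic_digest(protein) if low <= peptide_mass(p) <= high]


def coverage(protein, low=700.0, high=2000.0):
    """Fraction of residues that sit in a peptide of usable size."""
    seen = sum(len(p) for p in observable_peptides(protein, low, high))
    return seen / len(protein)


if __name__ == "__main__":
    toy_protein = "GKAAAAAAAAAAK"  # hypothetical sequence, for illustration only
    for pep in tryptic_digest(toy_protein):
        print(pep, round(peptide_mass(pep), 4))
    print("coverage:", round(coverage(toy_protein), 2))
```

In this toy sequence the short peptide falls below the window, so its residues count as unseen. Real coverage is also reduced by solubility and ionisation, which this sketch does not model.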
 
If you are interested in a specific part of a protein and trypsin does not produce a peptide of the right size from that area, you can try other enzymes. Ideally enzymes would cleave exactly as predicted, but many do not work as specified. The enzymes Bio-MS has found to be most reliable are trypsin, Lys-C, Arg-C, Asp-N and Glu-C. The last two do not leave an arginine or lysine at the end of a peptide, so the quality of the fragmentation patterns is generally lower. If the mixture is relatively simple, you can use a low-specificity enzyme such as elastase to produce a complicated mixture of overlapping peptides. Although the quality of the data and analysis is poorer than with well-defined enzymes, intensive data analysis can lead to much more of the protein being seen (>90%).
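
The difference between these enzymes can be sketched as a small table of cleavage rules. The rules below are the commonly quoted textbook ones and are deliberately simplified (for example, Glu-C can also cleave after D depending on the buffer, which is ignored here). The `ENZYME_RULES` table and function names are assumptions made for illustration.

```python
# Simplified cleavage rules: (residues recognised, side of the residue that is cut).
# "C" means the bond after the residue is cut, "N" means the bond before it.
ENZYME_RULES = {
    "trypsin": ("KR", "C"),
    "lys-c": ("K", "C"),
    "arg-c": ("R", "C"),
    "asp-n": ("D", "N"),
    "glu-c": ("E", "C"),
}


def digest(protein, enzyme):
    """Split a protein sequence according to one enzyme's simplified rule."""
    residues, side = ENZYME_RULES[enzyme]
    cut_points = [
        (i + 1 if side == "C" else i)
        for i, aa in enumerate(protein)
        if aa in residues
    ]
    peptides, start = [], 0
    for cut in cut_points:
        # Skip cuts at the very start or end, which would give empty peptides.
        if start < cut < len(protein):
            peptides.append(protein[start:cut])
            start = cut
    peptides.append(protein[start:])
    return peptides


def ends_with_basic_residue(peptide):
    """True if the peptide carries K or R at its C-terminal end."""
    return peptide[-1] in "KR"


if __name__ == "__main__":
    toy_protein = "MADKLEGRSDTEKAR"  # hypothetical sequence, for illustration only
    for enzyme in ENZYME_RULES:
        peptides = digest(toy_protein, enzyme)
        basic = sum(ends_with_basic_residue(p) for p in peptides)
        print(f"{enzyme:8s} {peptides}  basic C-termini: {basic}/{len(peptides)}")
```

Running it on a sequence of interest shows which enzyme gives a suitably sized peptide over a given region, and how many of the resulting peptides keep a basic residue at the C-terminal end.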
 
The first stage of analysing fragmentation data is to create a peak list file that summarises all the fragmentation spectra produced and links each one to the mass of the peptide, as determined by the mass spectrometer before fragmentation was performed. This peak list file can then be sent to a search engine for identification; the search engine used by Bio-MS is Mascot. The search engine takes databases of known protein sequences and calculates all the peptide sequences that the specified enzyme could produce, together with their masses. For each fragmented peptide in a peak list file, the search engine then determines which theoretical peptides match the experimental mass of the fragmented peptide within the mass accuracy of the mass spectrometer. It then creates a theoretical fragmentation spectrum for each candidate and compares the experimental fragmentation pattern to each one, giving each a score. The more high-scoring matches link back to the same protein sequence, the more confident we are that the protein is present. The overall confidence of a protein match is related to both the number of different parts of the protein that have been matched and the quality of the matches made. Using trypsin, we have found that a single peptide match is suggestive, a two-peptide match is probable, and three or more different peptides identified from the same protein is a conclusive match. It is important to note that this process is based on pattern matching, NOT peptide sequencing.
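
The search steps described above can be sketched in a few lines. This is only an illustration of the pattern-matching idea, not how Mascot works internally: the score here is a plain count of matched fragment peaks rather than a probability-based score, only singly charged b and y ions are generated, and the tolerances, function names and toy database are all assumptions.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def tryptic_digest(protein):
    """Cut after every K or R (simplified, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fragment_ions(peptide):
    """Theoretical singly charged b and y ion m/z values."""
    ions = []
    for i in range(1, len(peptide)):
        b = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        y = sum(RESIDUE_MASS[aa] for aa in peptide[i:]) + WATER + PROTON
        ions.extend([b, y])
    return ions


def score(peptide, peaks, tol=0.02):
    """Number of theoretical ions that have an experimental peak within tol."""
    return sum(any(abs(ion - p) <= tol for p in peaks) for ion in fragment_ions(peptide))


def search(precursor_mass, peaks, database, precursor_tol=0.01):
    """Rank database peptides that match the precursor mass by fragment score."""
    hits = []
    for name, sequence in database.items():
        for pep in tryptic_digest(sequence):
            if abs(peptide_mass(pep) - precursor_mass) <= precursor_tol:
                hits.append((score(pep, peaks), pep, name))
    return sorted(hits, reverse=True)


def confidence_label(n_distinct_peptides):
    """Rule of thumb from the text for trypsin digests."""
    if n_distinct_peptides >= 3:
        return "conclusive"
    return {0: "no match", 1: "suggestive", 2: "probable"}[n_distinct_peptides]


if __name__ == "__main__":
    # Hypothetical two-protein database; GASK and AGSK have identical masses.
    toy_db = {"protA": "MKGASKLLR", "protB": "MKAGSKVVR"}
    spectrum = fragment_ions("GASK")  # simulated, noise-free spectrum
    for hit in search(peptide_mass("GASK"), spectrum, toy_db):
        print(hit)
```

The two candidate peptides in the toy database have the same mass, so the precursor alone cannot separate them; only the fragmentation pattern does, which is the point of the scoring step.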
 
When looking for modifications such as phosphorylation, the search engine can be told to consider the possibility of the modification on specific residues (S, T and Y for phosphorylation). The search engine will then also calculate all possible combinations of the specified modifications on each peptide and create a theoretical pattern for each one. If the pattern is matched and the fragmentation pattern includes data from the region where the modification is positioned, you can be confident that the peptide was modified and that the modification is in the location specified. However, if there is no fragmentation data in that region, or it is not specific enough to distinguish between a number of alternative modified forms, then the match can show that the peptide was modified without showing where the modification occurred. Expert manual inspection of the pattern match may be required for the highest confidence in the result.
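
As a rough illustration of why modification searches grow quickly and why site localisation depends on fragment coverage, the sketch below enumerates every combination of phosphorylation on S, T and Y in a peptide. It then lists which backbone cleavages would be needed to tell two single-site forms apart. The mass shift is the standard monoisotopic value for phosphorylation (+79.96633 Da); the function names and example peptide are hypothetical.

```python
from itertools import combinations

PHOSPHO = 79.96633  # monoisotopic mass added by one phosphorylation (HPO3), in Da


def candidate_sites(peptide, residues="STY"):
    """0-based positions of residues that could carry the modification."""
    return [i for i, aa in enumerate(peptide) if aa in residues]


def modified_forms(peptide, max_mods=None):
    """Every combination of modified sites, with its total mass shift.

    The unmodified form is included as the empty combination.
    """
    sites = candidate_sites(peptide)
    limit = len(sites) if max_mods is None else min(max_mods, len(sites))
    forms = []
    for n in range(limit + 1):
        for combo in combinations(sites, n):
            forms.append((combo, round(n * PHOSPHO, 5)))
    return forms


def distinguishing_cleavages(site_a, site_b):
    """Cleavage positions whose fragments differ between two single-site forms.

    Position k means the backbone is cut after residue k (0-based), so the
    b ion holds residues 0..k. The two forms only differ in fragments where
    exactly one of the sites is on the b side, i.e. min(site) <= k < max(site).
    """
    low, high = sorted((site_a, site_b))
    return list(range(low, high))


if __name__ == "__main__":
    peptide = "ASGTPYK"  # hypothetical peptide with three possible sites
    for sites, shift in modified_forms(peptide):
        print(sites, shift)
    print(distinguishing_cleavages(1, 3))
```

If none of the cleavages returned by `distinguishing_cleavages` produce observed fragment ions, the two forms give the same matched peaks and the site cannot be assigned from the data alone.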
 
Due to the nature of pattern matching, it is easy to have both false negatives and false positives in a dataset, so it is often useful to validate the matches using an independent process. Bio-MS uses a piece of software called Scaffold to both validate and distribute results produced within the facility, as it can show the results from multiple samples together in a clear and easy-to-understand way.
