Shared Flashcard Set

Details

Bioinfo I final
Questions in course guide
90
Other
Undergraduate 4
12/02/2012

Cards

Term
Which NCBI database archives, distributes and supports submission of data that correlate genomic characteristics with observable traits and is a designated NIH repository for genome-wide association study (GWAS) results? A) dbGaP B) dbSNP C) dbVar D) OMIM
Definition
dbGaP
Term

Generating protein alignments can generally be more informative than DNA alignments because:

A) There are 20 AA vs. 4 bases and many AA share related biophysical properties

B) Codons are degenerate and many DNA mutations do not alter the AA

C) Protein sequences offer a longer "look-back" time

D) All of the above

Definition
All of the above
Term

Orthologs are defined as:

A) Homologous sequences in different species that share an ancestral gene

B) Homologous sequences that share little AA identity but share great structural similarity

C) Homologous sequences in the same species that arose through duplication

D) Homologous sequences in the same species which have similar and often redundant functions

Definition
Homologous sequences in different species that share an ancestral gene.
Term

The extent to which two nucleotide or protein sequences are related is defined as:

A) Identity

B) Similarity

C) Conservation

D) Homology

Definition
Similarity (identity + conservation)
Term

Which of the following AA is least mutable according to the PAM scoring matrix?

A) Alanine

B) Glutathione

C) Methionine

D) Cysteine

Definition
Cysteine
Term

Affine gaps refer to:

A) System of assessing gap penalties where presence of a gap is assigned more significance than the length of the gap.

B) System of gap penalties whereby gaps are deleted in pairwise comparisons between sequences

C) System of gap penalties whereby gaps are adjusted according to pairwise comparisons in MSA

D) System of gap penalties whereby presence of a gap is assigned the same score as the length of the gap

Definition
System of assessing gap penalties where presence of a gap is assigned more significance than the length of the gap.
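Note: a minimal Python sketch of an affine gap penalty, under one common convention (opening cost plus a per-position extension cost). The function name and the values 11 and 1 are illustrative assumptions, not taken from the course guide.

```python
def affine_gap_penalty(length, gap_open=11, gap_extend=1):
    """Penalty for a single gap spanning `length` positions.

    Assumed convention: one opening cost plus an extension cost per gapped position.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * length


# The presence of a gap matters more than its length:
# one gap of length 4 costs less than four separate gaps of length 1.
one_long_gap = affine_gap_penalty(4)
four_short_gaps = 4 * affine_gap_penalty(1)
print(one_long_gap, four_short_gaps)
```

Under these assumed values, a single long gap is penalized much less than several short ones, which is the behavior the card describes.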
Term

The PAM250 matrix is defined as having an evolutionary divergence in which what percentage of amino acids between two homologous sequences have changed over time?

A) 1%

B) 20%

C) 80%

D) 250%

Definition
80%
Term

PAM Matrices:

A) Are based upon global alignments of closely related proteins

B) Were originally derived from 34 protein superfamilies

C) Define the score of two aligned residues i,j as 10 times the log of how likely it is to observe these two residues divided by the background probability of finding these AA by chance.

D) All of the above

Definition
All of the above.
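Note: option C can be sketched as a log-odds calculation. This is a minimal Python sketch; the function name and the example probabilities are hypothetical, and real PAM matrices are built from Dayhoff's mutation probability tables rather than from hand-picked numbers.

```python
import math


def log_odds_score(observed_prob, background_prob):
    """10 * log10(observed / background), rounded to the nearest integer."""
    return round(10 * math.log10(observed_prob / background_prob))


# A pairing seen 10x more often than chance scores +10,
# one seen as often as chance scores 0, and one seen 10x less often scores -10.
print(log_odds_score(0.1, 0.01), log_odds_score(0.01, 0.01), log_odds_score(0.001, 0.01))
```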
Term

Which of the following sentences best describes the difference between a global alignment and a local alignment between two sequences?

A) Global alignment is usually used for DNA, while local alignment is usually used for protein sequences.

B) Global alignment has gaps, while local alignment does not have gaps

C) Global alignment finds the global max, while the local alignment finds the local max.

D) Global alignment aligns the whole sequence, while local alignment finds the best sub-sequence that aligns

Definition
Global alignment aligns the whole sequence while local alignment finds the best sub-sequence that aligns
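Note: the difference can be seen in a small dynamic programming sketch. This Python example assumes a simple scoring scheme (match +1, mismatch -1, linear gap -2) chosen only for illustration. With `local=False` it follows the Needleman-Wunsch recurrence, and with `local=True` it follows the Smith-Waterman recurrence.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of strings a and b (global by default, local if requested)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                # Local alignment may restart anywhere, so scores never drop below 0.
                score = max(score, 0)
                best = max(best, score)
            H[i][j] = score
    # Global: score of aligning both full sequences. Local: best-scoring sub-alignment.
    return best if local else H[-1][-1]


print(alignment_score("GGGACGTCCC", "ACGT", local=True))
print(alignment_score("GGGACGTCCC", "ACGT", local=False))
```

The local score rewards only the shared ACGT core, while the global score is pulled down by the gaps needed to span the whole longer sequence.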
Term

In Dayhoff's original protein superfamilies, which were used to compute the PAM matrix, which protein family had the lowest rate of point-accepted-mutations per 100 million years?

A) Ig Kappa

B) Ubiquitin

C) Lysozyme

D) Insulin

Definition
Ubiquitin
Term

You have two distantly related proteins. Which BLOSUM or PAM is best to use to compare them?

A) BLOSUM 45 and PAM 250

B) BLOSUM 45 and PAM 10

C) BLOSUM 80 and PAM 250

D) BLOSUM 80 and PAM 10

Definition
BLOSUM 45 and PAM 250
Term

A fundamental difference between PAM and BLOSUM matrices is that:

A) BLOSUM matrices are based on local alignments while PAMs are derived from global alignments

B) BLOSUM matrices are based on global alignments while PAMs are derived from local alignments

C) BLOSUM cannot be used for aligning very distantly related proteins (<50%) whereas PAMS are best used when aligning distantly related proteins

D) BLOSUM cannot be used in generating MSAs whereas PAMS are best used when generating MSAs.

Definition
BLOSUM matrices are based on local alignments while PAMs are derived from global alignments
Term

Two proteins that share 30% AA identity are 30% homologous:

A) true

B) false

Definition
False
Term

You have a reasonably short, typical, dsDNA sequence. Basically, how many proteins can it potentially encode?

A) 2

B) 1

C) 3

D) 6

Definition
6
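Note: the six possibilities are the three reading frames on each strand. A minimal Python sketch that splits a sequence into codons for all six frames; the function names and the example sequence are illustrative, and only unambiguous A/C/G/T input is assumed.

```python
def reverse_complement(dna):
    """Reverse complement of an uppercase A/C/G/T string."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(dna))


def six_frames(dna):
    """Codon lists for frames +1, +2, +3 and then -1, -2, -3."""
    dna = dna.upper()
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            sub = strand[offset:]
            # Keep only complete codons; trailing 1-2 bases are dropped.
            frames.append([sub[i:i + 3] for i in range(0, len(sub) - 2, 3)])
    return frames


for frame in six_frames("ATGGCCTAA"):
    print(frame)
```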
Term

You have a typical stretch of 300 NT long DNA sequence containing both the start and stop codons. Upon translating the sequence how many AA should you see?

A) 100

B) 99

C) 98

D) 150

Definition
99
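Note: the arithmetic behind this answer, as a small Python sketch. It assumes the 300 nt span is exactly one open reading frame, running from the start codon through the stop codon; the function name is hypothetical.

```python
def protein_length(orf_nt):
    """Amino acids encoded by an ORF that includes its start and stop codons."""
    if orf_nt < 6 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 and hold a start and a stop codon")
    codons = orf_nt // 3
    # The start codon encodes methionine, but the stop codon encodes no amino acid.
    return codons - 1


print(protein_length(300))
```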
Term

You have a DNA sequence. You want to know which protein in the main protein database (nr, the nonredundant database) is most similar to some protein encoded by your DNA. Which program should you use?

A) BlastN

B) BlastP

C) BlastX

D) tBlastN

Definition
BlastX
Term

Which output from a BLAST search provides an estimate of the number of false positives?

A) E value

B) Bit score

C) Percent identity

D) Percent positives

Definition
E value
Term

You can limit a BLAST search using any Entrez term. For example, you can limit the results to those containing a specific researcher's name:

A) TRUE

B) FALSE

Definition
TRUE
Term

An extreme value distribution:

A) Describes the distribution of scores from a query against a database

B) Has a larger total area than a normal distribution

C) Is symmetric

D) Has a shape that is described by two constants (the means and a decay constant)

Definition
Describes the distribution of scores from a query against a database 
Term

As the E value of a BLAST search becomes smaller:

A) The value K (a search space parameter) also becomes smaller

B) the score tends to be larger

C) the probability P tends to be larger

D) the extreme value distribution becomes less skewed

Definition
The score tends to be larger
Term

The BLAST algorithm compiles a list of "words", typically of three AA (for a protein search). Words at or above a threshold value T are defined as:

A) "Hits" and are used to scan a database for exact matches that may then be extended

B) "Hits" and are used to scan a database for exact or partial matches that may then be extended

C) Hits and are aligned to each other

D) Hits and are reported as raw scores

Definition
"Hits" and are used to scan a database for exact or partial matches that may then be extended
Term

If you want literature info, what is the best website to visit?

A) OMIM

B) Entrez

C) PubMed

D) PROSITE

Definition
PubMed 
Term

Which of the following methods allows incorporation of 3-D protein structure within a multiple sequence alignment?

A) CLUSTALW

B) MUSCLE

C) EXPRESSO

D) MAFFT

Definition
EXPRESSO
Term

Benchmarking refers to:

A) Making a set of MSAs from closely related proteins that form a trusted alignment

B) Making a set of MSAs from proteins that have had their tertiary structure determined, thus allowing the MSA to be validated based on structural criteria

C) Making a set of MSAs, with an algorithm, that are subsequently employed to refine tertiary structure predictions

D) Making a set of MSAs from proteins that are known, based on structural criteria, to be members of distinct protein families.

Definition

Making a set of MSAs from proteins that have had their tertiary structure determined, thus allowing the MSA to be validated based on structural criteria.

Term

Why doesn't CLUSTALW (a program that employs the Feng and Doolittle progressive sequence alignment algorithm) report expect values?

A) CLUSTALW does report expect values

B) CLUSTALW uses global alignments for which E value statistics are not available

C) CLUSTALW uses local alignments for which E value stats are not available

D) E value stats are not relevant to MSAs.

Definition
CLUSTALW uses global alignments for which E value statistics are not available
Term

The "once a gap, always a gap" rule for Feng-Doolittle method:

A) Assures that gaps will not be filled in inappropriately with inserted sequences

B) Assures that sequences that diverged early in evolution will be given priority in establishing the order in which MSA is constructed

C) Assures that gaps occurring between sequences that are most closely related in the MSA will be preserved

D) Assures that gaps occurring between sequences that are distantly related will be maintained in the MSA.

Definition
Assures that gaps occurring between sequences that are most closely related in the MSA will be preserved
Term

How can MSA programs improve performance?

A) By doing PSI-BLAST

B) By incorporating data on secondary structure

C) By incorporating data on 3D structure

D) All of the above

Definition
All of the above
Term

What is the main strength of consistency-based approaches (such as ProbCons or T-Coffee)?

A) They include info based on position-specific scoring matrices

B) They include info based on 3D protein structures, typically obtained from X-Ray crystallography

C) They perform profile-profile alignments and are extremely fast

D) They include info based on MSAs to guide the determination of pairwise alignments

Definition
They include info based on MSAs to guide the determination of pairwise alignments
Term

If you perform an MSA of a group of proteins and include a distantly related protein (a divergent member called an "orphan"):

A) The orphan is typically aligned with the group of proteins

B) The orphan is not typically aligned with the group of proteins

Definition
The orphan is typically aligned with the group of proteins
Term

The main difference between Pfam-A and Pfam-B is that:

A) Pfam-A is manually curated while Pfam-B is automatically curated

B) Pfam-A uses Hidden Markov models while Pfam-B does not

C) Pfam-A provides full length protein alignments while Pfam-B aligns protein fragments

D) Pfam-A incorporates data from SMART and PROSITE while Pfam-B does not.

Definition
 Pfam-A is manually curated while Pfam-B is automatically curated
Term

What is the feature of algorithms that align large tracts of genomic DNA, in contrast to programs such as CLUSTALW that align smaller blocks of DNA or proteins?

A) They are generally unable to align DNA from organisms that are highly divergent, such as those speciated several hundred million years ago

B) They generally use progressive alignment and so are fundamentally similar

C) They often employ anchors that help to align regions of conservation that are interspersed with less conserved regions (such as those arising in noncoding regions, deleted regions, or inverted regions)

D) They are specialized to accept very long inputs

Definition
They often employ anchors that help to align regions of conservation that are interspersed with less conserved regions (such as those arising in noncoding regions, deleted regions, or inverted regions)
Term

Applying which of the following can improve the ability of CLUSTALW to perform MSA?

A) Assign individual weights to sequences based on their divergence levels: closely related sequences, less wt, distantly related sequences, more wt.

B) Vary scoring matrices depending on the presence of conserved or divergent sequences

C) Apply residue-specific gap-penalties

D) All of the above

Definition
All of the above
Term

ProbCons:

 

a) Combines iterative and progressive approaches with unique probabilistic model.

b) Uses a hidden Markov model to calculate probability matrices for matching residues, and uses these to construct a guide tree.

c) Performs progressive alignment hierarchically along guide tree

d) Performs post-processing and iterative refinement (a little like MUSCLE)

e) All of the above

Definition
All of the above
Term

Which of the following methods allows incorporation of 3-D protein structure within a multiple sequence alignment?

A) CLUSTALW

B) MUSCLE

C) EXPRESSO

D) MAFFT

Definition
EXPRESSO
Term

Effective "sequence search space" during a BLAST search refers to:

A) The product of effective query length and effective database length

B) The ratio of effective query length and database size

C) The size of the database searched

D) The size of the query itself

Definition
The product of effective query length and effective database length
Term

Changing which of the following BLAST parameters would tend to yield fewer search results?

A) Turning off the low-complexity filter

B) Changing the expect value from 1 to 10

C) Raising the threshold value

D) Changing the scoring matrix from PAM30 to PAM70

Definition
Raising the threshold value
Term

Applying which of the following can improve the ability of CLUSTALW to perform multiple sequence alignments?

A) Assign individual weights to sequences based on their divergence levels: closely related sequences, less wt; distantly related sequences, more wt.

B) Varying scoring matrices depending on the presence of conserved or divergent sequences

C) Apply residue-specific gap-penalties

D) All of the above

Definition
All of the above
Term

A major difference between progressive alignment algorithm as implemented in T-Coffee versus as implemented in CLUSTALW is:

A) T-Coffee progressive alignment phase uses a dynamic algorithm with gap-opening penalties and gap extension penalties set to zero for aligning two sequences or two groups of pre-aligned sequences

B) T-Coffee progressive alignment phase uses a dynamic algorithm with equal gap-opening and gap extension penalties set to 1

C) Whereas T-Coffee implements a hard version of introducing gaps, whereby once introduced between a pair of aligned sequences they cannot be shifted, CLUSTALW is much more flexible in introducing and then shifting gaps

D) Both T-Coffee and CLUSTALW implement the same version of dynamic progression alignment algorithm

Definition
 T-Coffee progressive alignment phase uses a dynamic algorithm with gap-opening penalties and gap extension penalties set to zero for aligning two sequences or two groups of pre-aligned sequences
Term

If you were aligning a group of sequences that shared 80-100% sequence identity, which of the following PAM scoring matrices would you use for constructing a MSA using CLUSTALW?

A) PAM 350

B) PAM 120

C) PAM 20

D) PAM 35

Definition
PAM 20
Term

According to the molecular clock hypothesis:

A) All proteins evolve at the same, constant rate

B) All proteins evolve at a rate that matches the fossil record

C) For every given protein, the rate of molecular evolution gradually slows down like a clock that runs down

D) For every given protein, the rate of molecular evolution  is approximately constant in all evolutionary lineages

Definition
For every given protein, the rate of molecular evolution  is approximately constant in all evolutionary lineages
Term

The two main features of any phylogenetic tree are:

A) The clades and the nodes

B) The topology and the branch lengths

C) The clades and the roots

D) The alignment and the bootstrap

Definition
The topology and the branch lengths
Term

Which of the following is a character-based phylogenetic algorithm?

A) Neighbor-joining

B) Kimura

C) Maximum likelihood

D) PAUP

Definition
Maximum likelihood
Term

Two basic ways to make a phylogenetic tree are distance based and character based. A fundamental difference between them is:

A) Distance-based methods essentially summarize relatedness across the length of protein or DNA sequences while character based methods do not

B) Distance based methods are only used for DNA data while character-based methods are used for DNA or protein data

C) Distance based methods use parsimony while character-based methods do not

D) Distance based methods have branches that are proportional to time while character-based methods do not.

Definition
Distance-based methods essentially summarize relatedness across the length of protein or DNA sequences while character based methods do not
Term

An example of an operational taxonomic unit (OTU) is:

A) Multiple sequence alignment

B) Protein sequence

C) Clade

D) Node

Definition
Protein sequence
Term

For a given pair of OTUs, which of the following is true?

A) The corrected genetic distance is greater than or equal to the proportion of substitutions

B) The proportion of substitutions is greater than or equal to the corrected genetic distance.

Definition
 The corrected genetic distance is greater than or equal to the proportion of substitutions
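Note: one way to see why the corrected distance is never smaller than the observed proportion is the Jukes-Cantor correction for nucleotides, d = -(3/4) ln(1 - (4/3) p). The card does not name a specific model, so the choice of Jukes-Cantor here is an assumption made for illustration.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - (4.0 / 3.0) * p)


# The correction accounts for multiple substitutions at the same site,
# so the corrected distance is at least as large as the observed proportion.
for p in (0.0, 0.1, 0.3, 0.6):
    print(p, round(jukes_cantor_distance(p), 4))
```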
Term

Transitions are almost always weighted more heavily than transversions

A) True

B) False

Definition
False
Term

One of the most common errors in making and analyzing a phylogenetic tree is:

A) Using a bad MSA as input

B) Trying to infer the evolutionary relationships of genes (or proteins) in the tree

C) Trying to infer the age at which genes (or proteins) diverged from each other

D) Assuming that clades are monophyletic.

Definition
 Using a bad MSA as input
Term

You have 200 viral DNA sequences of 500 residues each, and you want to know if there are any pairs that are identical (or nearly identical). Which of the following is the most efficient method to use?

A) BLAST

B) Maximum-likelihood phylogenetic tree analysis

C) Neighbor-joining phylogenetic tree analysis

D) Popset

Definition
Neighbor-joining phylogenetic tree analysis
Term

Evaluate the following statement: "Even within genes that undergo rapid evolution or sequence diversification, certain regions may still be subject to intense purifying selection"

A) TRUE

B) FALSE

Definition
TRUE
Term

The selective constraints on a gene can be studied given a multiple sequence alignment. Which of the following relationships between silent and non-silent mutations is suggestive of positive Darwinian selection?

A) dN/dS<<1

B) dS/dN>>1

C) dN/ dS >> 1

D) dN - dS =1

Definition
dN/ dS >>1
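Note: a minimal Python sketch of how the dN/dS ratio is usually read. The function name and the tolerance band around 1 are arbitrary assumptions for illustration; real analyses use statistical tests rather than a fixed cutoff.

```python
def selection_regime(dn, ds, tolerance=0.1):
    """Rough interpretation of the ratio of non-silent (dN) to silent (dS) substitution rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega > 1 + tolerance:
        return "positive selection"
    if omega < 1 - tolerance:
        return "purifying selection"
    return "approximately neutral"


print(selection_regime(0.9, 0.1))   # dN/dS >> 1
print(selection_regime(0.05, 0.5))  # dN/dS << 1
```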
Term

Random fluctuation in allelic frequency that results solely from chance events is called:

 

A) Natural selection

B) Migration

C) Genetic Drift

D) Mutational load

Definition
Genetic Drift
Term

Which of the following types of population-based events would result in dramatic, but nevertheless random fluctuation in allele frequencies?

A) Founder effect

B) Population bottleneck

C) Gene flow/ migration

D) All of the above

Definition
All of the above
Term

Phylogenies are typically constructed assuming which of the following conditions?

A) Action of positive Darwinian selection

B) Mutations with Free recombination

C) Mutations with no recombination

D) Robust phylogenetic trees can be reconstructed so long as you have a robust MSA

Definition
Robust phylogenetic trees can be reconstructed so long as you have a robust MSA
Term

Evaluate the statement: "Variation + Differential reproduction + Heredity = Natural Selection!"

A) TRUE

B) FALSE

C) It is not as simple as that; parts of the statement are true, although it omits several other criteria necessary to define natural selection.

Definition
TRUE
Term

Evaluate the statement: "At the molecular level, evolution is a process of mutation and selection".

A) TRUE

B) FALSE

Definition
TRUE
Term

You have a favorite gene, and you want to determine in what tissues it is expressed. Which of the following resources is likely the most direct route to this information?

A) Unigene

B) Entrez

C) Pubmed

D) PCR

Definition
Unigene
Term

Which of the following databases is derived from mRNA information?

A) dbEST

B) PDB

C) OMIM

D) HTGS

Definition
dbEST
Term

Which of the following databases can be used to access text information about human diseases?

A) EST

B) PDB

C) OMIM

D) HTGS

Definition
OMIM
Term

What is the difference between RefSeq and GenBank?

A) RefSeq includes publicly available DNA sequences submitted from individual laboratories and sequencing projects

B) GenBank provides nonredundant curated data

C) GenBank sequences are derived from RefSeq

D) RefSeq sequences are derived from GenBank and provide nonredundant curated data

Definition
RefSeq sequences are derived from GenBank and provide nonredundant curated data
Term

If you want literature info, what is the best website to visit?

A) OMIM

B) Entrez

C) PubMed

D) PROSITE

Definition
PubMed
Term

Compare the use of Entrez and ExPASy to retrieve information about a protein sequence.

A) Entrez is likely to yield a more comprehensive search because GenBank has more data than EMBL

B) The search results are likely to be identical because the underlying raw data from GenBank and EMBL are the same.

C) The search results are likely to be comparable, but the SwissProt record from ExPASy will offer a different output format with distinct kinds of information.

D) None of the above

Definition
The search results are likely to be comparable, but the SwissProt record from ExPASy will offer a different output format with distinct kinds of information.
Term

My NCBI allows:

A) Saving search queries and results

B) Setting up automatic searches with email alerts

C) Storing and organizing NCBI database records

D) tracking recent usage history

E) All of the above.

Definition
All of the above.
Term

Which of the following databases allows for search and retrieval of collections of related sequences and alignments derived from population, phylogenetic, mutation and ecosystem studies that have been submitted to GenBank?

A) BioProject

B) BioSample

C) Popset

D) DbVar

Definition
Popset
Term

Which of the following databases serves as a data repository and retrieval system for high-throughput functional genomic data generated by microarray and next-generation sequencing technologies?

A) dbSNP

B) dbVar

C) UniGene

D) GEO

Definition
GEO
Term

Which of the following NCBI tools automatically detects homologs, including paralogs and orthologs, among the genes of 20 completely sequenced eukaryotic genomes?

A) UniGene

B) HomoloGene

C) Probe

D) BioSystems

Definition
HomoloGene
Term

PAM Matrices:

A) Are based upon global alignments of closely related proteins.

B) Were originally derived from 34 protein superfamilies

C) Define the score of two aligned residues i,j as 10 times the log of how likely it is to observe these two residues divided by the background probability of finding these AA by chance.

D) All of the above

Definition
All of the above
Term

Which of the following sentences best describes the difference between a global alignment and a local alignment between two sequences?

A) Global alignment is usually used for DNA sequences, while local alignment is usually used for protein sequences

B) Global alignment has gaps, while local alignment does not

C) Global alignment finds the global maximum, while the local alignment finds the local max

D) Global alignment aligns the whole sequence, while local alignment finds the best subsequence that aligns.

Definition
Global alignment aligns the whole sequence, while local alignment finds the best subsequence that aligns.
Term

A global algorithm (such as the Needleman-Wunsch algorithm) is guaranteed to find an optimal alignment. Such an algorithm:

A) Puts the 2 proteins being compared into a matrix and finds the optimal score by exhaustively searching every possible combination of alignments

B) Puts the two proteins being compared into a matrix and finds the optimal score by iterative recursions

C) Puts the two proteins being compared into a matrix and finds the optimal alignment by finding optimal subpaths that define the best alignment

D) Can be used for proteins but not for DNA sequences

Definition
Puts the two proteins being compared into a matrix and finds the optimal alignment by finding optimal subpaths that define the best alignment
Term

In a database search or in a pairwise alignment, sensitivity is defined as:

A) The ability of a search algorithm to find true positives (i.e., homologous sequences) and to avoid false positives (i.e., unrelated sequences having high similarity scores)

B) The ability of a search algorithm to find true positives (i.e., homologous sequences) and to avoid false positives (i.e., homologous sequences that are not reported)

C) The ability of a search algorithm to find true positives (i.e., homologous sequences) and to avoid false negatives (i.e., unrelated sequences having high similarity)

D) The ability of a search algorithm to find true positives (i.e., homologous sequences) and to avoid false negatives (i.e., homologous sequences that are not reported)

Definition
The ability of a search algorithm to find true positives (i.e., homologous sequences) and to avoid false negatives (i.e., homologous sequences that are not reported)
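Note: in counting terms, sensitivity is TP / (TP + FN), and its companion measure specificity is TN / (TN + FP). A minimal Python sketch; the counts in the example are made up for illustration.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of true homologs that the search actually reports."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of unrelated sequences that the search correctly leaves out."""
    return true_neg / (true_neg + false_pos)


# Hypothetical search: 90 homologs found, 10 missed, 950 unrelated rejected, 50 reported.
print(sensitivity(90, 10), specificity(950, 50))
```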
Term

You have a DNA sequence that is 25 kb long and you want to search against an organism-specific genome database. What should be your optimum "word-size" setting if you are performing a BlastN search?

A) 2

B) 16-256

C) 1-2

D) 2-10

Definition
16-256
Term

Changing which of the following options can help in reducing false positives during a BlastP search:

A) Increasing the E value

B) Increasing the word size

C) Applying conditional or universal compositional score adjustment

D) All of the above

Definition
Applying conditional or universal compositional score adjustment
Term

"Effective sequence search space" during a BLAST search refers to:

A) The product of effective query length and effective database length

B) The ratio of effective query length and database size

C) The size of the database searched

D) The size of the query itself.

Definition
The product of effective query length and effective database length
Term

Changing which of the following BLAST parameters would tend to yield fewer search results?

A) Turning off the low-complexity filter

B) Changing the expect value from 1 to 10

C) Raising the threshold value

D) Changing the scoring matrix from PAM 30 to PAM 70

Definition
Raising the threshold value
Term

In a BLAST search the E Value is:

A) the number of alignments with scores greater than or equal to score S that are expected to occur by chance in a database search

B) Is related to a probability value, P = 1 - e^(-E)

C) Extremely low when the alignment is very good

D) All of the above

Definition
 All of the above
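Note: the relationship between E and P can be checked numerically. A minimal Python sketch using P = 1 - e^(-E), the probability of seeing at least one chance alignment with a score that high; the function name is hypothetical.

```python
import math


def p_value_from_e(e_value):
    """Probability of at least one chance hit, given the expected number E of such hits."""
    if e_value < 0:
        raise ValueError("E value cannot be negative")
    # expm1 keeps precision for the very small E values typical of good alignments.
    return -math.expm1(-e_value)


# For small E, P and E are nearly identical; for large E, P approaches 1.
for e in (1e-10, 0.01, 1, 10):
    print(e, p_value_from_e(e))
```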
Term

A fundamental difference between the raw score and "bit" score output in BLAST search is:

A) Bit scores are comparable between different searches because they are normalized to account for the use of different scoring matrices and different database sizes

B) Raw scores are comparable between different searches because they are normalized to account for the use of different scoring matrices and different database sizes

C) Bit scores are calculated from the substitution matrix

D) Raw score can be adjusted whereas the bit score is always fixed

Definition
Bit scores are comparable between different searches because they are normalized to account for the use of different scoring matrices and different database sizes
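Note: the normalization can be sketched with the Karlin-Altschul relations, S' = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-S'), where m * n is the search space. In this Python sketch the values lambda = 0.267 and K = 0.041, the raw score, and the lengths are illustrative assumptions, not values from the course guide.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw score to a bit score using the scoring system's lambda and K."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """E value from a bit score and the search space (query length x database length)."""
    return query_len * db_len * 2.0 ** (-bits)


# Assumed parameters for illustration only.
bits = bit_score(100, lam=0.267, k=0.041)
print(round(bits, 1), expect_value(bits, query_len=250, db_len=1_000_000))
```

Once the score is in bits, the matrix-specific constants are no longer needed to compute E, which is why bit scores from different searches can be compared.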
Term

A PSI-BLAST search is most useful when you want to do the following:

A) Find the rat ortholog of a human protein

B) Extend a database search to find additional proteins

C) Extend a database search to find additional DNA sequences

D) Use a pattern or signature to extend a protein search.

Definition
Extend a database search to find additional proteins
Term

A type of phylogenetic "clade" that includes the common ancestor but not all of the ancestor's descendants is called:

A) Monophyletic

B) Polyphyletic

C) Polytomic

D) Paraphyletic

Definition
Paraphyletic
Term

Which of the following BLAST programs uses a signature of amino acids to find protein within a family?

A) PSI-BLAST

B) PHI-BLAST

C) MS_BLAST

D) WormBLAST

Definition
PHI-BLAST
Term

In a position-specific scoring matrix, the column headings can have 20 AA, and the rows can represent the residues of a query sequence. Within the matrix, the score for any given AA residue is assigned based on:

A) A PAM or BLOSUM matrix

B) Its frequency of occurrence in an MSA

C) Its background frequency of occurrence

D) The score of its neighboring amino acids

Definition
Its frequency of occurrence in an MSA
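Note: a minimal Python sketch of scoring one alignment column from residue frequencies. The pseudocount scheme, the log base, and the four-letter toy alphabet in the example are simplifying assumptions; PSI-BLAST uses a more elaborate weighting and pseudocount method.

```python
import math
from collections import Counter


def pssm_column_scores(column, background, pseudocount=1.0):
    """Log-odds scores for one MSA column: observed frequency versus background frequency."""
    counts = Counter(column)
    total = len(column) + pseudocount * len(background)
    scores = {}
    for residue, bg_freq in background.items():
        # Pseudocounts keep unobserved residues from getting a score of minus infinity.
        freq = (counts[residue] + pseudocount) / total
        scores[residue] = math.log2(freq / bg_freq)
    return scores


# Toy example: a fully conserved column over a hypothetical 4-letter alphabet.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(pssm_column_scores("AAAA", uniform))
```

A residue that dominates the column gets a positive score at that position, and residues never seen there get negative scores, regardless of what a general PAM or BLOSUM matrix would assign.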
Term

As part of a PSI-BLAST search, a score is assigned to the alignment between a query sequence and a database match over some length (such as 50 AA residues). It is possible for this pairwise alignment to receive a higher or lower score over successive PSI-BLAST iterations, even though there is no change in which amino acid residues are aligned.

A) TRUE

B) FALSE

Definition
TRUE
Term

A position-specific scoring matrix is said to be "corrupted" when it incorporates  a spurious sequence (i.e., a false positive result). Which of the following choices is the best way to reduce corruption?

A) Lower the E value

B) Remove the filtering

C) Use a Shorter query

D) Run fewer iterations

Definition
Lower the E value
Term

What is the main advantage of employing reverse position specific BLAST?

A) Reversing a query and/or a set of database sequences provides a set of null alignments from which the statistical significance of a PSI-BLAST search can be estimated

B) This method precomputes a large collection of position-specific matrices, allowing a query to be rapidly assigned to a protein family

C) This method allows critical conserved residues in the query sequence to be identified

D) This method facilitates the comparison of multiple PSSMs

Definition
This method precomputes a large collection of position-specific matrices, allowing a query to be rapidly assigned to a protein family
Term

What capability does a profile hidden Markov Model offer that PSI-BLAST does not offer for protein queries?

A) A profile HMM can model the likelihood of insertion and deletions in aligned residues

B) A profile HMM can identify distantly related homologs that are not identified by standard BLAST searches

C) A profile HMM can estimate the probability of achieving particular scores for aligned residues across the length of a multiple sequence alignment

D) A profile HMM can model both protein relationships that are neither conserved or distant

Definition
A profile HMM can model the likelihood of insertion and deletions in aligned residues
Term

Which of the following BLAST algorithms utilizes the "CD" database for performing similarity searches?

A) PSI-BLAST

B) PHI-BLAST

C) Delta-BLAST

D) All of the above

Definition
Delta-BLAST
Term

In probability theory, the probability associated with a given single event is called:

A) Marginal prob

B) Joint prob

C) Conditional prob

D) Bayesian prob

Definition
Marginal prob
Term

Which of the following databases are based on profile HMMs:

A) PFAM

B) SMART

C) INTERPRO

D) All of the above

Definition
All of the above