![BioPerl logo](images/bioperl/BioPerl-1.png "BioPerl logo")
## Principe
- Collection de module Perl
- Objectif: Faciliter le developpement de scripts Perl pour des
applications bioinformatiques
- Open-source via une organisation [GitHub](https://github.com/bioperl)
- Soutenu par Open Bioinformatics Foundation
## Histoire
- 1996 : Début
- 2002
- Premier Open Bio Hackathon
- BioPerl 1.0
- [Article](http://genome.cshlp.org/content/12/10/1611.abstract)
## Actuellement
- [GitHub](https://github.com/bioperl)
- 31 contributeurs
- Dernière release : 1.6.924 en Juillet 2014
- Orienté-objet
- \> 40 Modules Perl
## Comparaison avec les autres Bio Toolkits
## Bio Toolkits
| Release 1.0 | Dernière release | Article majeur | Citations
--- | --- | --- | --- | ---
[BioPerl](http://www.bioperl.org/) | 2002 | 07/2014 | [2002](http://genome.cshlp.org/content/12/10/1611.abstract) | 1 306
[BioPython](http://biopython.org/) | 2000 | 10/2015 | [2009](http://bioinformatics.oxfordjournals.org/content/25/11/1422.short) | 608
[BioJava](http://biojava.org/) | 2008 | 07/2015 | [2008](http://bioinformatics.oxfordjournals.org/content/24/18/2096.short) | 201
[BioRuby](http://www.bioruby.org/) | 2006 | 07/2015 | - | -
[BioPHP](http://www.biophp.org/) | 2003 | ? | - | -
[BioJS](http://biojs.net/) | 2013 | 09/2014 | [2013](http://bioinformatics.oxfordjournals.org/content/early/2013/02/23/bioinformatics.btt100.short) | 44
[Bioconductor](https://www.bioconductor.org/) | 2001 | 10/2015 | |
![](images/bioperl/google_scholar_results.svg "")
### Installation sous Linux/Mac OS
```
$ (sudo) cpan -i CPAN
$ cpan
cpan[1]> d /BioPerl/
Reading '/Users/cidam/.cpan/Metadata'
Database was generated on Thu, 14 Jan 2016 13:53:43 GMT
Distribution BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution CDRAUG/Dist-Zilla-PluginBundle-BioPerl-0.20.tar.gz
Distribution CJFIELDS/BioPerl-1.6.901.tar.gz
Distribution CJFIELDS/BioPerl-1.6.923.tar.gz
Distribution CJFIELDS/BioPerl-1.6.924.tar.gz
...
11 items found
cpan[2]> install CJFIELDS/BioPerl-1.6.924.tar.gz
```
# Manipulation de séquences
## Représentation d'une séquence
### 3 types d'objets pour une séquence
- `Bio::PrimarySeq`
- Séquence + nom
- Fichier fasta
### 3 types d'objets pour une séquence
- `Bio::SeqFeatureI`
- Caractéristique sur une séquence (séquence, localisation et annotation)
- Entrée simple d'une table de caractéristique EMBL/GenBank/DDBJ
### 3 types d'objets pour une séquence
- `Bio::Seq`
- 1 séquence et une collection de caractéristiques
- Entrée simple d'une table EMBL/GenBank/DDBJ
### Classe `Bio::Seq`
```
$ perldoc Bio::Seq
NAME
Bio::Seq - Sequence object, with features
...
DESCRIPTION
A Seq object is a sequence with sequence features placed on it.
The Seq object contains a PrimarySeq object for the actual sequence
and also implements its interface.
...
```
Note:
Montrer dans un terminal
### Créer d'un objet `Bio::Seq`
```
use Bio::Seq;
my $seqobj = Bio::Seq->new(
-seq => "ACTGTGTGTCC",
-id => "Chlorella sorokiniana",
-accession_number => "CAA41635"
);
```
Note:
Notebook
### Méthodes (1)
Méthodes renvoyant des chaines de caractères et acceptant
parfois des chaine de caractères pour modifier des
propriétés
```
$seqobj->seq(); # string of sequence
$seqobj->subseq(5,10); # part of the sequence as a string
$seqobj->accession_number(); # when there, the accession number
$seqobj->alphabet(); # one of 'dna','rna',or 'protein'
$seqobj->version() # when there, the version
$seqobj->length() # length
$seqobj->desc(); # description
$seqobj->primary_id(); # a unique id for this sequence regardless
# of its display_id or accession number
$seqobj->display_id(); # the human readable id of the sequence
```
Note:
Notebook
### Méthodes (2)
Méthodes renvoyant des nouveaux objet `Bio::Seq`
```
$seqobj->trunc(5,10) # truncation from 5 to 10 as new object
$seqobj->revcom # reverse complements sequence
$seqobj->translate # translation of the sequence
```
Note:
Notebook
### Méthodes (3)
Méthode pour déterminer si une chaine de caractère peut être
accepter par la méthode `seq()`
```
$seqobj->validate_seq($string)
```
Note:
Notebook
### Manipulation de séquences
```
$seq = $seqobj->seq();
$length = $seqobj->length();
$subseq = $seqobj->subseq($length/2, $length);
$new_seq = $seq.$subseq;
if($seqobj->validate_seq($new_seq)){
$seqobj->seq($new_seq);
}
print $seqobj->seq()," ",$seqobj->length()\n";
```
Que sera affiché?
Note:
Notebook
### Traduction
```
$translated_obj = $seqobj;
if( $seqobj->alphabet() == 'dna'){
$translated_obj = $seqobj->translate();
}
print $translated_obj->seq(),"\n";
```
Que sera affiché?
Note:
Notebook
## Récupération de statistiques sur une séquence
### Classe `Bio::Tools::SeqStats`
```
$ perldoc Bio::Tools::SeqStats
NAME
Bio::Tools::SeqStats - Object holding statistics
for one particular sequence
...
DESCRIPTION
Bio::Tools::SeqStats is a lightweight object for the calculation of
simple statistical and numerical properties of a sequence. By
"lightweight" I mean that only "primary" sequences are handled by the
object. The calling script needs to create the appropriate primary
sequence to be passed to SeqStats if statistics on a sequence feature
are required. Similarly if a codon count is desired for a frame-
shifted sequence and/or a negative strand sequence, the calling script
needs to create that sequence and pass it to the SeqStats object.
...
```
Note:
Montrer dans un terminal
### Création
```
$seq_stats = Bio::Tools::SeqStats->new(-seq => $seqobj);
```
Note:
Notebook
### Méthodes
- `count_monomers`
- Comptage du nombre de chaque type de monomère
- `get_mol_wt`
- Calcul du poids moléculaire
- `count_codons`
- Comptage du nombre de chaque type de codons
- `hydropathicity`
- Calcul l'hydrophaticité moyenne de Kyte-Doolittle
Note:
Notebook
# Manipulation de fichiers de séquences
### Classe `Bio::SeqIO`
```
$ perldoc Bio::SeqIO
NAME
Bio::SeqIO - Handler for SeqIO Formats
...
DESCRIPTION
Bio::SeqIO is a handler module for the formats in the SeqIO set (eg,
Bio::SeqIO::fasta). It is the officially sanctioned way of getting at
the format objects, which most people should use.
The Bio::SeqIO system can be thought of like biological file handles.
They are attached to filehandles with smart formatting rules (eg,
genbank format, or EMBL format, or binary trace file format) and can
either read or write sequence objects (Bio::Seq objects, or more
correctly, Bio::SeqI implementing objects, of which Bio::Seq is one
...
```
Note:
Montrer dans un terminal
### Création d'un objet `Bio::SeqIO`
Ouverture d'un flux sur le fichier ou la chaine de caractères
### Constructeur
Paramètres possibles
- `-file`
- `-string`
- `-format` : `fasta`, `nexus`, `fastq`, `quality`, `excel`, `raw`, `tab`, ...
- `-alphabet` : `dna`, `rna` ou `protein`
### Méthodes
- `next_seq`
- Lecture du prochain objet "séquence" dans le flux
- Renvoi d'un objet `Bio::Seq` ou rien si aucune séquence
disponible
- `write_seq`
- Ecriture d'un object `Bio:Seq` dans le flux
- `format`, `alphabet`, ...
### Ecrire de séquences dans un fichier
```
use Bio::SeqIO;
use Bio::Seq;
my $seqio_obj = Bio::SeqIO->new(-file => '>sequence.fasta',
-format => 'fasta' );
my $seqobj = Bio::Seq->new(
-seq => "ACTGTGTGTCC",
-id => "Chlorella sorokiniana"
);
$seqio_obj->write_seq($seqobj);
my $seqobj = Bio::Seq->new(
-seq => "ACTGTGTGTCCTGTGTCC",
-id => "Modified Chlorella sorokiniana"
);
$seqio_obj->write_seq($seqobj);
```
Que fait ce code?
Note:
Notebook
Montrer aussi le contenu du fichier de sortie
### Lecture des séquences d'un fichier
```
use Bio::SeqIO;
$seqio_obj = Bio::SeqIO->new(-file => "sequence.fasta",
-format => "fasta" );
while ($seq_obj = $seqio_obj->next_seq){
print $seq_obj->seq,"\n";
}
```
Que fait ce code?
Note:
Notebook
# Accès aux bases de données
## Récupération d'une séquence dans une base de données
### Bases de données accessibles
Base de données | Module
--- | ---
[GenBank](http://www.ncbi.nlm.nih.gov/genbank/) | `Bio::DB::GenBank`
[SwissProt](http://web.expasy.org/docs/swiss-prot_guideline.html) | `Bio::DB::SwissProt`
[GenPept](http://www.ncbi.nlm.nih.gov/protein/) | `Bio::DB::GenPept`
[EMBL](http://www.ebi.ac.uk/ena) | `Bio::DB::EMBL`
SeqHound | `Bio::DB::SeqHound`
[Entrez Gene](http://www.ncbi.nlm.nih.gov/gene) | `Bio::DB::EntrezGene`
[RefSeq](http://www.ncbi.nlm.nih.gov/refseq/) | `Bio::DB::RefSeq`
### Classe `Bio::DB::GenBank`
```
$ perldoc Bio::DB::GenBank
NAME
Bio::DB::GenBank - Database object interface to GenBank
...
DESCRIPTION
Allows the dynamic retrieval of Bio::Seq sequence objects from the
GenBank database at NCBI, via an Entrez query
...
```
Note:
Montrer dans un terminal
### Constructeur
```
use Bio::DB::GenBank;
$db_obj = Bio::DB::GenBank->new;
```
Note:
Notebook
### Récupération d'une séquence dans une base de données
```
use Bio::DB::GenBank;
use Bio::Seq;
$db_obj = Bio::DB::GenBank->new;
$seq_obj = $db_obj->get_Seq_by_id(2);
print $seq_obj->display_id(),"\n";
```
Note:
Notebook
## Récupération de plusieurs séquences avec des requêtes plus complexes
### Bases de données et modules pour les requêtes
Base de données | Module
--- | ---
[GenBank](http://www.ncbi.nlm.nih.gov/genbank/) | `Bio::DB::Query::GenBank`
[SwissProt](http://web.expasy.org/docs/swiss-prot_guideline.html) | `Bio::DB::Query::SwissProt`
[GenPept](http://www.ncbi.nlm.nih.gov/protein/) | `Bio::DB::Query::GenPept`
[EMBL](http://www.ebi.ac.uk/ena) | `Bio::DB::Query::EMBL`
SeqHound | `Bio::DB::Query::SeqHound`
[Entrez Gene](http://www.ncbi.nlm.nih.gov/gene) | `Bio::DB::Query::EntrezGene`
[RefSeq](http://www.ncbi.nlm.nih.gov/refseq/) | `Bio::DB::Query::RefSeq`
### Classe `Bio::DB::Query::GenBank`
```
$ perldoc Bio::DB::Query::GenBank
NAME
Bio::DB::Query::GenBank - Build a GenBank Entrez Query
...
DESCRIPTION
This class encapsulates NCBI Entrez queries. It can be used to
store a list of GI numbers, to translate an Entrez query expression
into a list of GI numbers, or to count the number of terms that
would be returned by a query. Once created, the query object can
be passed to a Bio::DB::GenBank object in order to retrieve the
entries corresponding to the query.
...
```
Note:
Montrer dans un terminal
### Création d'un objet `Bio::DB::Query::GenBank`
Ouverture d'un flux sur des objets `Bio::Seq`
### Constructeur
Paramètres possibles
- `-db` : `protein`, `nucleotide`, ...
- `-query`
- `-mindate`
- `-maxdate`
- `-reldate`
- `-datetype`
- `-ids`
- `-maxids`
### Méthodes
- `count`
- Renvoi du nombre de résultats de la requête
- `ids`
- Renvoi/Modifie la liste des identifiants des résultats
### Récupération de plusieurs séquences
```
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;
$query = "Arabidopsis[ORGN] AND topoisomerase[TITL] and 0:3000[SLEN]";
$query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide',
-query => $query );
$gb_obj = Bio::DB::GenBank->new;
$stream_obj = $gb_obj->get_Stream_by_query($query_obj);
while ($seq_obj = $stream_obj->next_seq) {
print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
}
```
Note:
Notebook
# Parser des rapports de recherche
### Classe `Bio::SearchIO`
```
$ perldoc Bio::SearchIO
NAME
Bio::SearchIO - Driver for parsing Sequence Database Searches (BLAST,
FASTA, ...)
...
DESCRIPTION
This is a driver for instantiating a parser for report files from
sequence database searches. This object serves as a wrapper for the
format parsers in Bio::SearchIO::* - you should not need to ever use
those format parsers directly. (For people used to the SeqIO system it,
we are deliberately using the same pattern).
...
```
Note:
Notebook
### Création d'un objet `Bio::Search`
Ouverture d'un flux sur le fichier contenant un rapport
de recherche
### Constructeur
Paramètres possibles
- `-file`
- `-format`
- `-output_format`
- `-inclusion_threshold`
- `-signif`
- `-check_all_hits`
- `-min_query_len`
- `-best`
### Formats
Name | Format
--- | ---
`blast` | BLAST (WUBLAST, NCBIBLAST,bl2seq)
`fasta` | FASTA `-m9` and `-m0`
`blasttable` | BLAST `-m9` or `-m8` output (both NCBI and WUBLAST tabular)
`megablast` | MEGABLAST
`blastxml` | NCBI BLAST XML
... | ...
### Méthodes
- `next_result`
- `write_result`
- `write_report`
- `result_count`
- `best_hit_only`
- `check_all_hits`
### Représentation des données dans `Bio::Search`
- `Bio::Search`
- `Bio::Search::Result`
- `Bio::Search::Hit`
- `Bio::Search::HSP` (high-scoring segment pair)
### Méthodes de `Bio::Search::Result`
- `algorithm`
- `query_name`
- `query_accession`
- `query_length`
- `query_description`
- `database_name`
- `available_statistics`
- `available_parameters`
- `num_hits`
- `hits`
### Méthodes de `Bio::Search::Hit`
- `name`
- `length`
- `accession`
- `description`
- `algorithm`
- `raw_score`
- `significance`
- `hsps`
- `num_hsps`
- `locus`
- `accession_number`
### Méthodes de `Bio::Search::HSP` (1)
- `algorithm`
- `evalue`
- `expect`
- `frac_identical`
- `frac_conserved`
- `gaps`
- `query_string`
- `hit_string`
- `length('total'/'hit'/'query')`
- `num_conserved`
- `num_identical`
### Méthodes de `Bio::Search::HSP` (1)
- `rank`
- `seq_inds('hit'/'query', 'identical'/ 'conserved'/ 'conserved-notidentical')`
- `score`
- `range('hit'/'query')`
- `percent_identity`
- `strand('hit'/'query')`
- `start('hit'/'query')`
- `end('hit'/'query')`
- `matches('hit'/'query')`
- `get_aln`
### Parcours d'un fichier issu d'une requête Blast
```
use Bio::SearchIO;
my $in = new Bio::SearchIO(
-format => "blast",
-file => "report.bls");
while(my $result = $in->next_result){
while(my $hit = $result->next_hit){
while(my $hsp = $hit->next_hsp){
print "Query=", $result->query_name,
" Hit=", $hit->name,
" Length=", $hsp->length('total'),
" Percent_id=", $hsp->percent_identity,
"\n";
}
}
}
```
## Références
- [BioPerl GitHub Page](http://bioperl.github.io/index.html)
- [Wiki BioPerl](http://www.bioperl.org/wiki/)