{"id":251,"date":"2020-08-11T21:08:35","date_gmt":"2020-08-11T21:08:35","guid":{"rendered":"https:\/\/molecularsciences.org\/content\/?p=251"},"modified":"2024-02-08T08:18:09","modified_gmt":"2024-02-08T13:18:09","slug":"quick-introduction-to-bioperl","status":"publish","type":"post","link":"https:\/\/molecularsciences.org\/content\/quick-introduction-to-bioperl\/","title":{"rendered":"Quick Introduction to BioPerl"},"content":{"rendered":"\n<p>BioPerl is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics. It is open source and widely used in the bioinformatics community. BioPerl provides software modules for many of the typical tasks of bioinformatics programming. These include:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Accessing sequence data from local and remote databases<\/li><li>Transforming formats of database\/file records<\/li><li>Manipulating individual sequences<\/li><li>Searching for similar sequences<\/li><li>Creating and manipulating sequence alignments<\/li><li>Searching for genes and other structures on genomic DNA<\/li><li>Developing machine-readable sequence annotations<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Retrieving sequences from Swiss-Prot<\/h3>\n\n\n\n<p>The following script retrieves a sequence from Swiss-Prot:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/perl\nuse strict;\nuse warnings;\nuse Bio::Perl;\n\n# retrieve a sequence from Swiss-Prot and write it out in FASTA format\n# this script will not work if you are not connected to the Internet\n\nmy $s = get_sequence('swiss',$ARGV&#91;0]);\nwrite_sequence(\"&gt;$ARGV&#91;1]\",'fasta',$s);<\/code><\/pre>\n\n\n\n<p>usage:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ perl get_swissprot_sequence.pl P11217 a.fasta\n$ perl get_swissprot_sequence.pl PYGM_HUMAN b.fasta<\/code><\/pre>\n\n\n\n<p>Perl command-line arguments are stored in the @ARGV array. The first argument, $ARGV[0], holds the Swiss-Prot accession number or identifier. 
The second argument, $ARGV[1], names the output file.<\/p>\n\n\n\n<p>For more on using SeqIO with files, please refer to&nbsp;<a target=\"_blank\" href=\"http:\/\/www.bioperl.org\/wiki\/Sequence_formats\" rel=\"noreferrer noopener\">http:\/\/www.bioperl.org\/wiki\/Sequence_formats<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Blasting a sequence<\/h3>\n\n\n\n<p>Aligning sequences using BLAST is one of the most common tasks in bioinformatics. Here is how you can do it using BioPerl:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/perl\nuse strict;\nuse warnings;\nuse Bio::Perl;\n\n# get sequence given an identifier or accession number\nmy $so = get_sequence('swiss',$ARGV&#91;0]);\n\n# blast the sequence\nmy $blast_result = blast_sequence($so);\n\n# write blast results into a file\nwrite_blast(\"&gt;$ARGV&#91;1]\",$blast_result);<\/code><\/pre>\n\n\n\n<p>usage:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ perl blastseq.pl PYGM_HUMAN pygm.blast\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Creating sequence objects<\/h3>\n\n\n\n<p>When working with FASTA or other sequence files, you first have to create sequence objects.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/perl\nuse strict;\nuse warnings;\nuse Bio::Perl;\nuse Bio::SeqIO;\n\n# open the FASTA file as a sequence stream\nmy $s = Bio::SeqIO-&gt;new( -file =&gt; \"pygm.fasta\", -format =&gt; \"fasta\");\n# read the first sequence object from the stream and print its sequence\nmy $st = $s-&gt;next_seq;\nprint $st-&gt;seq;<\/code><\/pre>\n\n\n\n<p>usage:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ perl create_sequence_object.pl<\/code><\/pre>\n\n\n\n<p>It takes a file named pygm.fasta as input and creates a sequence object. The last two lines read the first sequence from the file and print it.<\/p>\n\n\n\n<p>If you have multiple FASTA sequences in a file, SeqIO creates a sequence object for each of them automatically. 
To print all the sequences, loop over them with while, as in the last line of the following script.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/perl\nuse strict;\nuse warnings;\nuse Bio::Perl;\nuse Bio::SeqIO;\n\n# open the FASTA file as a sequence stream\nmy $s = Bio::SeqIO-&gt;new( -file =&gt; $ARGV&#91;0], -format =&gt; \"fasta\");\n\n# print each sequence on its own line\nwhile (my $st = $s-&gt;next_seq) { print $st-&gt;seq; print \"\\n\"; }<\/code><\/pre>\n\n\n\n<p>usage:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ perl create_sequence_objects.pl pygm.fasta<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Retrieving sequences from GenBank<\/h3>\n\n\n\n<p>The following code retrieves a GenBank sequence and creates a sequence object.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/perl\nuse strict;\nuse warnings;\nuse Bio::DB::GenBank;\nuse Data::Dumper;\n\n# create a GenBank database handle and fetch the sequence by accession\n# ($a and $b are avoided as names because Perl reserves them for sort)\nmy $gb = Bio::DB::GenBank-&gt;new;\nmy $seq = $gb-&gt;get_Seq_by_acc($ARGV&#91;0]);\n\n# dump the contents of the sequence object\nprint Dumper($seq);<\/code><\/pre>\n\n\n\n<p>usage:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>$ perl retrieve_genbank_sequence.pl EW695397<\/code><\/pre>\n\n\n\n<p>Dumper (from Data::Dumper) returns a printable representation of the contents of an object.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Installing ClustalW on Linux<\/h3>\n\n\n\n<p>ClustalW can be downloaded from ftp:\/\/ftp.ebi.ac.uk\/pub\/software\/clustalw2\/. Choose the source (src) archive, something like clustalw-2.0.10-src.tar.gz.<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>wget ftp:\/\/ftp.ebi.ac.uk\/pub\/software\/clustalw2\/2.0.10\/clustalw-2.0.10-src.tar.gz<\/li><li>tar xzvf clustalw-2.0.10-src.tar.gz<\/li><li>cd clustalw-2.0.10<\/li><li>.\/configure<\/li><li>make<\/li><li>su<\/li><li>make install<\/li><li>clustalw2<\/li><\/ol>\n\n\n\n<p>The last command tests whether ClustalW is properly installed and runs.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Perl CGI not running<\/h3>\n\n\n\n<p>If your Perl CGI script is not running, look at the ScriptAlias settings in your httpd.conf. 
It defines which directories are allowed to run CGI scripts.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">ClustalW with BioPerl<\/h3>\n\n\n\n<p>The BioPerl documentation on ClustalW is great, but I faced some problems as a beginner. The following code is taken from the ClustalW docs and modified to make life easier for beginners.<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Make sure bioperl-run is installed in addition to BioPerl.<\/li><li>Make sure clustalw is installed and executable.<\/li><li>Set the path using the following command (assuming that clustalw is installed at \/usr\/local\/bin\/clustalw2): export CLUSTALDIR=\/usr\/local\/bin\/clustalw2<\/li><\/ol>\n\n\n\n<p>code:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/perl\nuse Bio::AlignIO;\nuse Bio::Root::IO;\nuse Bio::Seq;\nuse Bio::SeqIO;\nuse Bio::SimpleAlign;\nuse Bio::TreeIO;\n\nBEGIN { $ENV{CLUSTALDIR} = '\/usr\/local\/bin\/clustalw2\/' }\nuse Bio::Tools::Run::Alignment::Clustalw;\n\n# Build a clustalw alignment factory\n@params = ('ktuple' =&gt; 2, 'matrix' =&gt; 'BLOSUM');\n$factory = Bio::Tools::Run::Alignment::Clustalw-&gt;new(@params);\n\n# Pass the factory a list of sequences to be aligned.\n$inputfilename = 'blastdump\/input.fasta';\n$aln = $factory-&gt;align($inputfilename); # $aln is a SimpleAlign object.\n# or\n$seq_array_ref = \\@seq_array;\n# where @seq_array is an array of Bio::Seq objects\n$aln = $factory-&gt;align($seq_array_ref);\n\n# Or one can pass the factory a pair of (sub)alignments\n# to be aligned against each other, e.g.:\n$aln = $factory-&gt;profile_align($aln1,$aln2);\n# where $aln1 and $aln2 are Bio::SimpleAlign objects.\n\n# Or one can pass the factory an alignment and one or more unaligned\n# sequences to be added to the alignment. 
For example:\n$aln = $factory-&gt;profile_align($aln1,$seq); # $seq is a Bio::Seq object.\n\n# Get a tree of the sequences\n$tree = $factory-&gt;tree(\\@seq_array);\n\n# Get both an alignment and a tree\n($aln, $tree) = $factory-&gt;run(\\@seq_array);\n\n# Do a footprinting analysis on the supplied sequences, getting back the\n# most conserved sub-alignments\nmy @results = $factory-&gt;footprint(\\@seq_array);\nforeach my $result (@results) {\n  print $result-&gt;consensus_string, \"\\n\";\n}\n\n# There are various additional options and input formats available.\n# See the DESCRIPTION section of the Bio::Tools::Run::Alignment::Clustalw\n# documentation for additional details.<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>BioPerl is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics. It is open source and widely used in the bioinformatics community. BioPerl provides software modules for many of the typical tasks of bioinformatics programming. These include: Accessing sequence data from local and remote databases Transforming formats of database\/file 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[512],"tags":[96,79,256],"class_list":["post-251","post","type-post","status-publish","format-standard","hentry","category-misc","tag-bioperl","tag-perl","tag-quickstart"],"_links":{"self":[{"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/posts\/251","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/comments?post=251"}],"version-history":[{"count":2,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/posts\/251\/revisions"}],"predecessor-version":[{"id":739,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/posts\/251\/revisions\/739"}],"wp:attachment":[{"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/media?parent=251"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/categories?post=251"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/molecularsciences.org\/content\/wp-json\/wp\/v2\/tags?post=251"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}