perl – How do I parse a file, create records, and perform operations on the records, including term frequency and distance calculations

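I am a student in an introductory Perl course, looking for advice and feedback on my approach to writing a small (but tricky) program that analyzes data about atoms. My professor encourages the use of forums. I am not experienced with Perl subs or modules (including Bioperl), so please keep responses at an appropriate "beginner level" so that I can understand and learn from your suggestions and/or code (please also limit the "magic").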

The requirements of the program are as follows:

  1. Read a file (containing data about Atoms) from the command line & create an array of atom records (one record/atom per newline). For each record the program will need to store:

    • The atom’s serial number (cols 7 – 11)
    • The three-letter name of the amino acid to which it belongs (cols 18 – 20)
    • The atom’s three coordinates (x,y,z) (cols 31 – 54 )
    • The atom’s one- or two-letter element name (e.g. C,O,N,Na) (cols 77-78 )

  2. Prompt for one of three commands: freq, length, density d (where d is some number):

    • freq – how many of each type of atom is in the file (for example, nitrogen, sulfur, etc. would be displayed like this: N: 918 S: 23)
    • length – The distances among coordinates
    • density d (where d is a number) – the program will prompt for the name of a file to save computations to. For each atom, it computes the distance between that atom and every other atom; whenever a distance is less than or equal to d, it increments the count of atoms within that distance. It then writes each atom's count to the file, unless that count is zero. The output will look something like:
    1: 5
    2: 3
    3: 6
    … (very big file) and will close when it finishes.

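    I am looking for feedback on what I have written (and still need to write) in the code below. I would especially appreciate any feedback on how to write my subs. I have included sample input data at the bottom.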

    The program structure and function descriptions as I see them:

    $^W = 1; # turn on warnings
    use strict; # behave!
    
    my @fields;
    my @recs;
    
    while ( <DATA> ) {
     chomp;
     @fields = split(/\s+/);
     push @recs,makeRecord(@fields);
    }
    
    for (my $i = 0; $i < @recs; $i++) {
     printRec( $recs[$i] );
    }
        my %command_table = (
     freq => \&freq,length => \&length,density => \&density,help => \&help,quit => \&quit
     );
    
    print "Enter a command: ";
    while ( <STDIN> ) {
     chomp; 
     my @line = split( /\s+/);
     my $command = shift @line;
     if ($command !~ /^freq$|^density$|length|^help$|^quit$/ ) {
        print "Command must be: freq,length,density or quit\n";
        }
      else {
        $command_table{$command}->();
        }
     print "Enter a command: ";
     }
    
    sub makeRecord 
        # Read the entire line and make records from the lines that contain the 
        # word ATOM or HETATM in the first column. Not sure how to do this:
    {
     my %record = 
     (
     serialnumber => shift,aminoacid => shift,coordinates => shift,element  => [ @_ ]
     );
     return\%record;
    }
    
    sub freq
        # take an array of atom records,return a hash whose keys are 
        # distinct atom names and whose values are the frequences of
        # these atoms in the array.  
    
    sub length
        # take an array of atom records and return the max distance 
        # between all pairs of atoms in that array. My instructor
        # advised this would be constructed as a for loop inside a for loop. 
    
    sub density
        # take an array of atom records and a number d and will return a
        # hash whose keys are atom serial numbers and whose values are 
        # the number of atoms within that distance from the atom with that
        # serial number. 
    
    sub help
    {
        print "To use this program,type either\n","freq\n","length\n","density followed by a number,d,\n","help\n","quit\n";
    }
    
    sub quit
    {
     exit 0;
    }
    
    # truncating for testing purposes. Actual data is aprox. 100 columns 
    # and starts with ATOM or HETATM.
    __DATA__
    ATOM   4743  CG  GLN A 704      19.896  32.017  54.717  1.00 66.44           C  
    ATOM   4744  CD  GLN A 704      19.589  30.757  55.525  1.00 73.28           C  
    ATOM   4745  OE1 GLN A 704      18.801  29.892  55.098  1.00 75.91           O

Solution

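It looks like your Perl skills are advancing nicely: you are already using references and complex data structures. Here are some tips and pieces of general advice.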

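>Enable warnings with use warnings rather than $^W = 1. The former is self-documenting and has the advantage of being local to the enclosing block rather than a global setting.
>Use well-named variables, which help document the program's behavior, rather than relying on Perl's special $_. For example: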

while (my $input_record = <DATA>){
}

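>In a user-input scenario, an infinite loop provides a way to avoid repeating instructions such as the "Enter a command" prompt. See below.
>Your regex can be simplified to avoid the need for repeated anchors. See below.
>As a general rule, affirmative tests are easier to understand than negative tests. See the modified if-else structure below.
>Enclose each part of the program in its own subroutine. This is good general practice for a number of reasons, so I would simply start making it a habit.
>A related good practice is to minimize the use of global variables. As an exercise, you could try writing the program so that it uses no global variables at all. Instead, any needed information would be passed between subroutines. With small programs one need not be rigid about avoiding global variables, but it is not a bad idea to keep the ideal in mind.
>Give your length subroutine a different name. That name is already used by the built-in length function.
>Regarding your question about makeRecord, one approach is to ignore the filtering issue inside makeRecord. Instead, makeRecord can include an additional hash field (for example, type, holding the record type), and the filtering logic will reside elsewhere. For example: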

my $record = makeRecord(@fields);
push @recs,$record if $record->{type} =~ /^(ATOM|HETATM)$/;
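
One way makeRecord could fill in that type field is to cut each line at the fixed column positions listed in the requirements, rather than splitting on whitespace. The following is only a minimal sketch under that assumption: the names make_record and trim are hypothetical, the coordinates are assumed to occupy three 8-character fields within columns 31 – 54, and the sample line is built with sprintf purely so that the columns line up.

```perl
use strict;
use warnings;

# Remove leading and trailing whitespace from a string.
sub trim {
    my $text = shift;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    return $text;
}

# Build one record (a hash reference) from one fixed-width line.
# substr offsets start at 0, so column 7 is offset 6, and so on.
sub make_record {
    my $line = shift;
    chomp $line;
    # Pad short lines with spaces so substr never reads past the end.
    $line = sprintf '%-80s', $line;

    my %record = (
        type         => trim( substr($line, 0, 6) ),    # cols 1 - 6
        serialnumber => trim( substr($line, 6, 5) ),    # cols 7 - 11
        aminoacid    => trim( substr($line, 17, 3) ),   # cols 18 - 20
        coordinates  => [                               # cols 31 - 54
            trim( substr($line, 30, 8) ),
            trim( substr($line, 38, 8) ),
            trim( substr($line, 46, 8) ),
        ],
        element      => trim( substr($line, 76, 2) ),   # cols 77 - 78
    );
    return \%record;
}

# Build a sample line with each value in its proper columns.
my $sample = sprintf(
    "%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s",
    'ATOM', 4743, ' CG', 'GLN', 'A', 704, 19.896, 32.017, 54.717, 1.00, 66.44, 'C'
);

# Keep only ATOM and HETATM records; the filter lives outside make_record.
my @recs;
for my $line ($sample, 'REMARK this line is not an atom') {
    my $record = make_record($line);
    push @recs, $record if $record->{type} =~ /^(ATOM|HETATM)$/;
}
print scalar(@recs), " record(s) kept\n";
```

Note that this stores the coordinates as a reference to a three-element array and the element as a plain string, which differs from the hash layout in the question's makeRecord.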

An illustration of some of the points above:

use strict;
use warnings;

run();

sub run {
    my $atom_data = load_atom_data();
    print_records($atom_data);
    interact_with_user($atom_data);
}

...

sub interact_with_user {
    my $atom_data = shift;
    my %command_table = (...);

    while (1){
        print "Enter a command: ";
        chomp(my $reply = <STDIN>);

        my ($command,@line) = split /\s+/,$reply;

        if ( $command =~ /^(freq|density|length|help|quit)$/ ) {
            # Run the command.
        }
        else {
            # Print usage message for user.
        }
    }
}
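
The "Run the command" placeholder above could be filled in by looking the command up in the dispatch table. The sketch below is one hedged possibility, not the answer's own code: run_command is a hypothetical helper, the two entries in the table are stand-ins for the real subs, and it swaps the regex test for an exists check on the table so that the list of valid commands is kept in one place.

```perl
use strict;
use warnings;

# Look up the first word of $reply in the table and call the matching sub,
# passing it the atom data and any remaining words as arguments.
# Returns 1 if a command ran, 0 if the command was not recognised.
sub run_command {
    my ($command_table, $atom_data, $reply) = @_;

    # split ' ' skips leading whitespace, unlike split /\s+/.
    my ($command, @args) = split ' ', $reply;

    if ( defined $command and exists $command_table->{$command} ) {
        $command_table->{$command}->($atom_data, @args);
        return 1;
    }
    else {
        print "Command must be: ", join(', ', sort keys %$command_table), "\n";
        return 0;
    }
}

# Hypothetical stand-in commands, only to show the calling convention.
my %command_table = (
    freq => sub {
        my $atom_data = shift;
        print "would count elements in ", scalar(@$atom_data), " records\n";
    },
    density => sub {
        my ($atom_data, $d) = @_;
        print "would count neighbours within distance $d\n";
    },
);

run_command(\%command_table, [], 'density 4.5');
run_command(\%command_table, [], 'bogus');
```

Inside the while (1) loop, the call would simply be run_command(\%command_table, $atom_data, $reply) after reading and chomping the reply.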

...
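
The computational subs described in the question's comments (freq, length and density) might be sketched along the following lines. This is an illustration under stated assumptions rather than a definitive implementation: all sub names are hypothetical (length is renamed to max_distance to avoid the built-in), each record is assumed to hold its coordinates as a reference to a three-element array and its element as a plain string, and atoms with no neighbours are assumed to be left out of the density result.

```perl
use strict;
use warnings;

# Count how many records there are for each element name.
# Returns a hash reference such as { C => 2, O => 1 }.
sub element_frequencies {
    my $atom_data = shift;
    my %count;
    for my $atom (@$atom_data) {
        $count{ $atom->{element} }++;
    }
    return \%count;
}

# Straight-line (Euclidean) distance between two atom records.
sub distance_between {
    my ($first, $second) = @_;
    my $sum_of_squares = 0;
    for my $axis (0 .. 2) {
        my $difference = $first->{coordinates}[$axis] - $second->{coordinates}[$axis];
        $sum_of_squares += $difference ** 2;
    }
    return sqrt($sum_of_squares);
}

# Largest distance over all pairs of atoms: a for loop inside a for loop.
sub max_distance {
    my $atom_data = shift;
    my $max = 0;
    for my $i (0 .. $#$atom_data) {
        for my $j ($i + 1 .. $#$atom_data) {
            my $distance = distance_between($atom_data->[$i], $atom_data->[$j]);
            $max = $distance if $distance > $max;
        }
    }
    return $max;
}

# For each atom, count the other atoms no further away than $d.
# Returns a hash reference keyed by serial number; zero counts are omitted.
sub neighbour_counts {
    my ($atom_data, $d) = @_;
    my %within;
    for my $atom (@$atom_data) {
        my $count = 0;
        for my $other (@$atom_data) {
            next if $other == $atom;    # same reference: do not count itself
            $count++ if distance_between($atom, $other) <= $d;
        }
        $within{ $atom->{serialnumber} } = $count if $count > 0;
    }
    return \%within;
}

# Example with the first two atoms from the question's sample data.
my @atoms = (
    { serialnumber => 4743, element => 'C', coordinates => [19.896, 32.017, 54.717] },
    { serialnumber => 4744, element => 'C', coordinates => [19.589, 30.757, 55.525] },
);
my $frequencies = element_frequencies(\@atoms);
print "$_: $frequencies->{$_}\n" for sort keys %$frequencies;
printf "max distance: %.3f\n", max_distance(\@atoms);
```

The density loop visits every pair twice, which is the simplest form to read; for the very large files mentioned in the question, looping over each pair once and incrementing both atoms' counts would halve the work.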

Original source: https://www.jb51.cc/Perl/241643.html
