
In-memory autocomplete for a small static dataset in Java


I'm trying to implement autocomplete for a small amount of static data (about 5K records).

I thought of using a prefix tree (trie) to support this kind of query:

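import java.util.ArrayList;
import java.util.List;

class AutocompleteSystem {
    // A sentence together with how many times it has been entered.
    class Entry {
        String sentence;
        int times;

        Entry(String sentence, int times) {
            this.sentence = sentence;
            this.times = times;
        }
    }

    // Trie node over 'a'..'z' (slots 0-25) plus ' ' (slot 26).
    class TrieNode {
        TrieNode[] children;
        int times; // > 0 only where a complete sentence ends

        TrieNode() {
            children = new TrieNode[27];
            times = 0;
        }
    }

    private TrieNode root;
    private TrieNode previous; // node matched by the current query, or null once it falls off the trie
    private String query;      // characters typed since the last '#'

    public AutocompleteSystem(String[] sentences, int[] times) {
        root = new TrieNode();
        query = "";

        for (int i = 0; i < sentences.length; i++) {
            insert(sentences[i], times[i]);
        }
    }

    // Handles one keystroke. '#' ends the current query and records it as a sentence;
    // any other character returns up to 3 completions, most frequent first, ties alphabetical.
    public List<String> input(char c) {
        List<String> result = new ArrayList<>();
        if (c == '#') {
            insert(query, 1);
            previous = null;
            query = "";
            return result;
        }

        query += c;
        List<Entry> history = lookup(c);
        history.sort((a, b) -> {
            if (a.times == b.times) {
                return a.sentence.compareTo(b.sentence);
            }
            return Integer.compare(b.times, a.times);
        });
        for (int i = 0; i < Math.min(history.size(), 3); i++) {
            result.add(history.get(i).sentence);
        }
        return result;
    }

    // Maps 'a'..'z' to 0..25 and ' ' to 26; returns -1 for unsupported characters
    // (the original c - 'a' indexing threw ArrayIndexOutOfBoundsException on them).
    private static int indexOf(char c) {
        if (c == ' ') {
            return 26;
        }
        return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
    }

    private void insert(String sentence, int times) {
        TrieNode current = root;
        for (char c : sentence.toCharArray()) {
            int index = indexOf(c);
            if (index < 0) {
                return; // sentences with unsupported characters are not indexed
            }
            if (current.children[index] == null) {
                current.children[index] = new TrieNode();
            }
            current = current.children[index];
        }
        current.times += times;
    }

    // Advances one level from the node matched so far instead of re-walking the whole query.
    private List<Entry> lookup(char c) {
        List<Entry> history = new ArrayList<>();
        // An earlier character already had no match, so nothing can match now.
        if (previous == null && query.length() > 1) {
            return history;
        }

        TrieNode current = previous == null ? root : previous;
        int index = indexOf(c);
        if (index < 0 || current.children[index] == null) {
            previous = null;
            return history;
        }

        previous = current.children[index];
        traverse(query, previous, history);
        return history;
    }

    // Collects every complete sentence in the subtree rooted at node (s is the prefix spelled so far).
    private void traverse(String s, TrieNode node, List<Entry> history) {
        if (node.times > 0) {
            history.add(new Entry(s, node.times));
        }

        for (int i = 0; i < 27; i++) {
            if (node.children[i] != null) {
                String next = i == 26 ? s + ' ' : s + (char) ('a' + i);
                traverse(next, node.children[i], history);
            }
        }
    }
}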
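Since the data is static, one option is to do the ranking once at build time instead of on every keystroke: the traverse()/sort path above visits and sorts every sentence under the prefix each time. The following is a minimal sketch, assuming no sentences are added after construction (so no '#' handling), that stores the top 3 completions at every trie node. StaticTopKTrie, complete() and K are hypothetical names, and the HashMap children (instead of a 27-slot array) are a design choice that accepts any character at some memory cost.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: a trie that precomputes the top-K completions at every node.
// Assumes the sentence set is fixed after construction (no '#' insertions).
public class StaticTopKTrie {
    private static final int K = 3;

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>(); // any char, not just a-z and ' '
        String sentence;              // non-null if a sentence ends here
        List<String> top = List.of(); // best completions in this subtree, filled by fillTop
    }

    private final Node root = new Node();
    private final Map<String, Integer> counts = new HashMap<>();

    // Same order as the question: higher count first, ties broken alphabetically.
    private final Comparator<String> byRank = (a, b) -> {
        int byCount = Integer.compare(counts.get(b), counts.get(a));
        return byCount != 0 ? byCount : a.compareTo(b);
    };

    public StaticTopKTrie(String[] sentences, int[] times) {
        for (int i = 0; i < sentences.length; i++) {
            Node cur = root;
            for (char c : sentences[i].toCharArray()) {
                cur = cur.children.computeIfAbsent(c, k -> new Node());
            }
            cur.sentence = sentences[i];
            counts.merge(sentences[i], times[i], Integer::sum);
        }
        fillTop(root);
    }

    // Post-order pass: a node's top-K is drawn from its own sentence plus each child's top-K.
    private List<String> fillTop(Node node) {
        List<String> candidates = new ArrayList<>();
        if (node.sentence != null) {
            candidates.add(node.sentence);
        }
        for (Node child : node.children.values()) {
            candidates.addAll(fillTop(child));
        }
        candidates.sort(byRank);
        node.top = List.copyOf(candidates.subList(0, Math.min(K, candidates.size())));
        return node.top;
    }

    // O(prefix length) lookup; no traversal or sorting at query time.
    public List<String> complete(String prefix) {
        Node cur = root;
        for (char c : prefix.toCharArray()) {
            cur = cur.children.get(c);
            if (cur == null) {
                return List.of();
            }
        }
        return cur.top;
    }

    public static void main(String[] args) {
        StaticTopKTrie trie = new StaticTopKTrie(
                new String[] {"i love you", "island", "iroman", "i love leetcode"},
                new int[] {5, 3, 2, 2});
        System.out.println(trie.complete("i"));  // [i love you, island, i love leetcode]
        System.out.println(trie.complete("i ")); // [i love you, i love leetcode]
    }
}
```

The trade-off is up to K stored strings per node in exchange for a lookup that only walks the prefix; at around 5K records that extra memory is likely modest, but it is worth measuring.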

The problem is extending this autocomplete solution beyond full_name to support other fields. For example, say I have records like:

Student
id:
full_name:
gender:
subject_enrolled:

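I want not only autocomplete search on the name but also filtering on subject_enrolled or other fields. My idea was to use an inverted index that maps each field value to the list of matching student IDs and then intersect those IDs with the ones returned by autocomplete, but this is less efficient than the autocomplete lookup on its own.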
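The inverted-index-plus-intersection idea can be sketched in plain Java as below. This is a minimal sketch, not a tuned implementation: StudentSearch, idsWithNamePrefix() and search() are hypothetical names, the field types (int id, List<String> subjects) are assumptions since the question only lists field names, and the record syntax needs Java 16+. For the name prefix it swaps the trie for a TreeMap range query (subMap), which returns matching IDs directly. The intersection iterates the smaller ID set and probes the larger one, so its cost tracks the smaller candidate list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch: name-prefix index + inverted index on subject, combined by set intersection.
// Field types below are assumptions; the question only lists the field names.
public class StudentSearch {
    public record Student(int id, String fullName, String gender, List<String> subjectsEnrolled) {}

    // Lowercased full name -> ids. A sorted map turns "starts with prefix" into a range query.
    private final TreeMap<String, List<Integer>> byName = new TreeMap<>();
    // Inverted index: subject -> ids of students enrolled in it.
    private final Map<String, Set<Integer>> bySubject = new HashMap<>();

    public StudentSearch(List<Student> students) {
        for (Student s : students) {
            byName.computeIfAbsent(s.fullName().toLowerCase(), k -> new ArrayList<>()).add(s.id());
            for (String subject : s.subjectsEnrolled()) {
                bySubject.computeIfAbsent(subject, k -> new HashSet<>()).add(s.id());
            }
        }
    }

    // Every key starting with p sorts in [p, p + Character.MAX_VALUE),
    // assuming names never contain the noncharacter U+FFFF.
    Set<Integer> idsWithNamePrefix(String prefix) {
        String p = prefix.toLowerCase();
        Set<Integer> ids = new HashSet<>();
        for (List<Integer> group : byName.subMap(p, true, p + Character.MAX_VALUE, false).values()) {
            ids.addAll(group);
        }
        return ids;
    }

    // Iterate the smaller set and probe the larger one: O(min(|a|, |b|)) expected hash lookups.
    public Set<Integer> search(String namePrefix, String subject) {
        Set<Integer> byPrefix = idsWithNamePrefix(namePrefix);
        Set<Integer> enrolled = bySubject.getOrDefault(subject, Set.of());
        boolean prefixSmaller = byPrefix.size() <= enrolled.size();
        Set<Integer> small = prefixSmaller ? byPrefix : enrolled;
        Set<Integer> large = prefixSmaller ? enrolled : byPrefix;
        Set<Integer> result = new HashSet<>();
        for (Integer id : small) {
            if (large.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        StudentSearch index = new StudentSearch(List.of(
                new Student(1, "Alice Smith", "F", List.of("math", "physics")),
                new Student(2, "Alan Turing", "M", List.of("math")),
                new Student(3, "Bob Jones", "M", List.of("physics"))));
        System.out.println(index.search("al", "physics")); // [1]
    }
}
```

More filters (gender, other fields) would follow the same pattern: one inverted index per field, intersected smallest-first.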

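Does anyone have suggestions for a data structure or library that fits this better? I'm looking for an in-memory alternative to MongoDB/Elasticsearch with similar query support.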
