索引库的维护

# 40.索引库的维护

索引库并不是一成不变的，需要维护，例如增删改查　　‍

# 索引库的添加

‍

有时候我们的原始文档增加了，此时就需要增加索引。过程和索引库的创建是一样的，只不过创建索引库是添加一堆文档。

‍

我们新建一个类：IndexManager，添加一个测试方法：

public class IndexManager {
    @Test
    public void addDocument() throws Exception {
        // 创建一个IndexWriter对象，需要使用IKAnalyzer作为分析器
        Directory directory = FSDirectory.open(new File("D:\\temp\\index").toPath());
        IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(new IKAnalyzer()));

        // 创建一个Document对象
        Document document = new Document();

        // 向document对象中添加域，不同的document可以有不同的域，同一个document可以有相同的域。
        document.add(new TextField("fileName", "新添加的文档", Field.Store.YES));
        document.add(new TextField("fileContent", "新添加的文档的内容", Field.Store.NO));
        document.add(new StoredField("filePath", "d:/temp/1.txt"));

        // 把文档对象写入索引库
        indexWriter.addDocument(document);

        // 关闭IndexWriter对象
        indexWriter.close();
    }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22

‍

然后我们可以在Luke中刷新下：

‍

并且可以看到第14个文档的Field，注意内容是空的：

‍

虽然内容是空的，但是搜索还是能搜到的：

‍

# 索引库删除

‍

# 删除全部

//删除全部索引
@Test
public void deleteAllDocument() throws Exception {
    IndexWriter indexWriter = new IndexWriter(FSDirectory.open(new File("D:\\temp\\index").toPath()), new IndexWriterConfig(new IKAnalyzer()));
    indexWriter.deleteAll();
    indexWriter.close();
}

1
2
3
4
5
6
7

说明：将索引目录的索引信息全部删除，直接彻底删除，无法恢复。 此方法慎用！！

‍

删除后，虽然index文件夹中还有几个文件，但是使用Luke可以看到是空的了：

‍

# 指定查询条件删除

除了删除全部，还可以指定条件。我们先重新创建索引

演示指定条件删除：例如删除文件名中包含 apache 的文档。

我们可以用searchIndex方法来查询符合条件的文档有多少个：

public void searchIndex() throws Exception {
        // 省略其他代码
        // 4. 第四步：创建一个TermQuery对象，指定查询的域和查询的关键词。
        Query query = new TermQuery(new Term("name", "apache"));
        // 省略其他代码
}

1
2
3
4
5
6

‍

运行结果：

查询出来的总记录数：2
name: apache lucene.txt
path: D:\temp\searchsource\apache lucene.txt
size: 725
-------------分割线-----------------
name: Welcome to the Apache Solr project.txt
path: D:\temp\searchsource\Welcome to the Apache Solr project.txt
size: 5465
-------------分割线-----------------

1
2
3
4
5
6
7
8
9

‍

或使用luke看到是包含2个的

‍

然后我们写方法删除：

//根据查询条件删除索引
@Test
public void deleteDocumentByQuery() throws Exception {
    IndexWriter indexWriter = new IndexWriter(FSDirectory.open(new File("D:\\temp\\index").toPath()), new IndexWriterConfig(new IKAnalyzer()));

    // 删除文件名中包含apache的文档
    indexWriter.deleteDocuments(new Term("name", "apache"));
    indexWriter.close();
}

1
2
3
4
5
6
7
8
9

‍

刷新下Luke，可以看到删除前后，document的数量是不一样的。

‍

# 索引库的修改

假设我们要修改name中包含 spring 的文档。老规矩，先重建索引（这里我们使用IKAnalyzer的方法来重建）

‍

我们可以用searchIndex方法来查询符合条件的文档有多少个：

public void searchIndex() throws Exception {
        // 省略其他代码
        // 4. 第四步：创建一个TermQuery对象，指定查询的域和查询的关键词。
        Query query = new TermQuery(new Term("name", "spring"));
        // 省略其他代码
}

1
2
3
4
5
6

‍

运行结果：

查询出来的总记录数：2
name: spring.txt
path: D:\temp\searchsource\spring.txt
size: 82
-------------分割线-----------------
name: spring_README.txt
path: D:\temp\searchsource\spring_README.txt
size: 3257
-------------分割线-----------------

1
2
3
4
5
6
7
8
9

‍

然后写方法更新。更新的原理：先删除后添加

@Test
public void updateDocument() throws Exception {
    IndexWriter indexWriter = new IndexWriter(FSDirectory.open(new File("D:\\temp\\index").toPath()), new IndexWriterConfig(new IKAnalyzer()));

    Document document = new Document();
    document.add(new TextField("name", "更新之后的文档", Field.Store.YES));
    document.add(new TextField("name1", "更新之后的文档1", Field.Store.YES));
    document.add(new TextField("name2", "更新之后的文档2", Field.Store.YES));
    indexWriter.updateDocument(new Term("name", "spring"), document);

    indexWriter.close();
}

1
2
3
4
5
6
7
8
9
10
11
12

‍

运行结果：先删除2个文档，然后添加一个，因此是13个文档

‍

并且此时查询名称带spring的文档，就没有了

‍

# 源码

已将源码上传到Gitee (opens new window)或GitHub (opens new window)上。并且创建了分支demo4，读者可以通过切换分支来查看本文的示例代码

在 GitHub 上编辑此页

上次更新: 2024/1/23 08:20:41

← 常见的Field Lucene索引库查询→