第五章：字符串匹配

1. KMP算法

KMP算法（Knuth-Morris-Pratt）通过预处理模式串，构建部分匹配表（Partial Match Table），利用已匹配的信息跳过不必要的比较，从而提高匹配效率。

算法步骤：

构建部分匹配表（next数组）。
使用next数组进行匹配，当匹配失败时，根据next数组跳过部分字符。

代码实现：

public class KMP {
    // 构建部分匹配表（next数组）
    private int[] buildNext(String pattern) {
        int[] next = new int[pattern.length()];
        next[0] = -1; // 初始值
        int i = 0, j = -1;

        while (i < pattern.length() - 1) {
            if (j == -1 || pattern.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
                next[i] = j;
            } else {
                j = next[j];
            }
        }
        return next;
    }

    // KMP匹配算法
    public int kmpSearch(String text, String pattern) {
        int[] next = buildNext(pattern);
        int i = 0, j = 0;

        while (i < text.length() && j < pattern.length()) {
            if (j == -1 || text.charAt(i) == pattern.charAt(j)) {
                i++;
                j++;
            } else {
                j = next[j]; // 根据next数组跳过部分字符
            }
        }

        if (j == pattern.length()) {
            return i - j; // 返回匹配的起始位置
        }
        return -1; // 未找到匹配
    }

    public static void main(String[] args) {
        KMP kmp = new KMP();
        String text = "ABABDABACDABABCABAB";
        String pattern = "ABABCABAB";
        int index = kmp.kmpSearch(text, pattern);
        System.out.println("KMP Match at index: " + index);
    }
}

2. BM算法

BM算法（Boyer-Moore）是一种高效的字符串匹配算法，通过从右向左比较模式串，并结合坏字符规则和好后缀规则跳过大量不必要的比较。

算法步骤：

预处理模式串，生成坏字符表和好后缀表。
从右向左匹配模式串，根据坏字符规则和好后缀规则跳过字符。

代码实现：

public class BM {
    // 坏字符规则
    private int[] buildBadCharTable(String pattern) {
        int[] badChar = new int[256]; // ASCII字符表
        Arrays.fill(badChar, -1);

        for (int i = 0; i < pattern.length(); i++) {
            badChar[pattern.charAt(i)] = i; // 记录字符最后出现的位置
        }
        return badChar;
    }

    // BM匹配算法
    public int bmSearch(String text, String pattern) {
        int[] badChar = buildBadCharTable(pattern);
        int n = text.length();
        int m = pattern.length();
        int shift = 0; // 模式串的移动距离

        while (shift <= n - m) {
            int j = m - 1;

            // 从右向左匹配
            while (j >= 0 && pattern.charAt(j) == text.charAt(shift + j)) {
                j--;
            }

            if (j < 0) {
                return shift; // 匹配成功
            } else {
                // 根据坏字符规则计算移动距离
                shift += Math.max(1, j - badChar[text.charAt(shift + j)]);
            }
        }
        return -1; // 未找到匹配
    }

    public static void main(String[] args) {
        BM bm = new BM();
        String text = "HERE IS A SIMPLE EXAMPLE";
        String pattern = "EXAMPLE";
        int index = bm.bmSearch(text, pattern);
        System.out.println("BM Match at index: " + index);
    }
}

3. Sunday算法

Sunday算法是一种简单的字符串匹配算法，通过预处理模式串，利用匹配失败时文本串中下一个字符的信息跳过不必要的比较。

算法步骤：

预处理模式串，生成偏移表（记录每个字符在模式串中最后出现的位置）。
从左向右匹配模式串，匹配失败时根据偏移表跳过字符。

代码实现：

import java.util.HashMap;
import java.util.Map;

public class Sunday {
    // 构建偏移表
    private Map<Character, Integer> buildShiftTable(String pattern) {
        Map<Character, Integer> shiftTable = new HashMap<>();
        int length = pattern.length();

        for (int i = 0; i < length; i++) {
            shiftTable.put(pattern.charAt(i), length - i); // 记录字符的偏移量
        }
        return shiftTable;
    }

    // Sunday匹配算法
    public int sundaySearch(String text, String pattern) {
        Map<Character, Integer> shiftTable = buildShiftTable(pattern);
        int n = text.length();
        int m = pattern.length();
        int i = 0; // 文本串的起始位置

        while (i <= n - m) {
            int j = 0;

            // 匹配模式串
            while (j < m && text.charAt(i + j) == pattern.charAt(j)) {
                j++;
            }

            if (j == m) {
                return i; // 匹配成功
            } else {
                // 根据偏移表跳过字符
                if (i + m >= n) {
                    break; // 超出文本串范围
                }
                char nextChar = text.charAt(i + m);
                i += shiftTable.getOrDefault(nextChar, m + 1);
            }
        }
        return -1; // 未找到匹配
    }

    public static void main(String[] args) {
        Sunday sunday = new Sunday();
        String text = "HERE IS A SIMPLE EXAMPLE";
        String pattern = "EXAMPLE";
        int index = sunday.sundaySearch(text, pattern);
        System.out.println("Sunday Match at index: " + index);
    }
}

总结

KMP算法：适合模式串中有较多重复字符的情况，时间复杂度为 O(n + m)。
BM算法：适合模式串较长的情况，平均时间复杂度为 O(n/m)。
Sunday算法：实现简单，适合短模式串，平均时间复杂度为 O(n/m)。

根据具体场景选择合适的算法可以提高字符串匹配的效率。

冬月小站

第五章：字符串匹配

1. KMP算法

算法步骤：

代码实现：

2. BM算法

算法步骤：

代码实现：

3. Sunday算法

算法步骤：

代码实现：

总结