统计重新排列后包含另一个字符串的子字符串数目I

2025-01-09

leetcode

题目：

给你两个字符串 word1 和 word2 。如果一个字符串 x 重新排列后，word2 是重排字符串的

前缀，那么我们称字符串 x 是 合法的 。请你返回 word1 中合法子字符串的数目。

示例 1：

输入：word1 = “bcca”, word2 = “abc”

输出：1

解释：唯一合法的子字符串是 "bcca" ，可以重新排列得到 "abcc" ，"abc" 是它的前缀。

示例 2：

输入：word1 = “abcabc”, word2 = “abc”

输出：10

解释：除了长度为 1 和 2 的所有子字符串都是合法的。

示例 3：

输入：word1 = “abcabc”, word2 = “aaabc”

输出：0

解释：

1 <= word1.length <= 105
1 <= word2.length <= 104
word1 和 word2 都只包含小写英文字母。

暴力破解

思路：

利用滑动窗口枚举 word1 的所有子串，利用哈希表记录所有子串中字符出现的次数，跟 word2 比较即可

代码：

class Solution {
public:
    bool isValid(unordered_map<char, int>& word1Hash, unordered_map<char, int> word2Hash) {
        for (auto it : word2Hash) {
            if (word1Hash.find(it.first) == word1Hash.end()) return false;
            if (word1Hash[it.first] < it.second) return false;
        }
        return true;
    }
    long long validSubstringCount(string word1, string word2) {
        long long ans = 0;
        if (word1.size() < word2.size()) return 0;
        unordered_map<char, int> word2Hash;
        for (int i = 0; i < word2.size(); i++) {
            word2Hash[word2[i]]++;
        }
        for (int i = word2.size(); i <= word1.size(); i++) {
            unordered_map<char, int> word1Hash;
            for (int j = 0; j < i; j++) {
                word1Hash[word1[j]]++;
            }
            if (isValid(word1Hash, word2Hash)) ans++;
            for (int startIdx = 1; startIdx <= word1.size() - i; startIdx++) {
                word1Hash[word1[startIdx - 1]]--;
                word1Hash[word1[startIdx + i - 1]]++;
                if (isValid(word1Hash, word2Hash)) ans++;
            }
        }
        return ans;
    }
};

二分法

思路：

我们的目标是求解 word1 中有多少子串经过重新排列后存在一个前缀是 word2，也就是说要求解有多少子串包含 word2 中的全部字符。

对于每个 l(1≤l≤n)，找到最小的 r(l≤r≤n)，使得 word1 区间 [l,r] 内包含 word2 的全部字符，可以发现子串 [l,r+1],[l,r+2],⋯,[l,n] 都是满足要求的，计数 n−r+1 。将所有的计数都加起来就是答案。

而找到每个 l 对应的最小的 r 可以使用二分算法，我们提前预处理出 word2 中所有字符的出现次数，再预处理 word1 每个前缀中每种字符的出现次数。因此在二分查找 r 的过程中，可以 O(C) 时间判断是否满足要求（C 是字符数量，此处等于 26），而那个最小的那个满足要求的下标就是我们要找的 r。

代码：

class Solution {
public:
    long long validSubstringCount(string word1, string word2) {
        vector<int> count(26, 0);
        for (auto c : word2) {
            count[c - 'a']++;
        }

        int n = word1.size();
        // 记录word1长度为i的子串中每个字符出现的次数，这样pre_count[j] - pre_count[i]就是从i - j 的子串中每个字符出现的次数
        vector<vector<int>> pre_count(n + 1, vector<int>(26, 0));  
        for (int i = 1; i <= n; i++) {
            // 赋值长度为 i - 1的word1中所有字符出现的次数
            pre_count[i].assign(pre_count[i - 1].begin(), pre_count[i - 1].end());  
            pre_count[i][word1[i - 1] - 'a']++;   // 新字符数量+1
        }

        auto get = [&](int l, int r) {
            int border = l;
            while (l < r) {
                int m = l + r >> 1;
                bool f = true;
                for (int i = 0; i < 26; i++) {
                    if (pre_count[m][i] - pre_count[border - 1][i] < count[i]) {
                        f = false;
                        break;
                    }
                }
                if (f) {
                    r = m;
                } else {
                    l = m + 1;
                }
            }
            return l;
        };

        long long res = 0;
        for (int l = 1; l <= n; l++) {
            int r = get(l, n + 1);
            res += n - r + 1;
        }
        return res;
    }
};