Time complexity of string.substr()

According to https://cplusplus.com/reference/string/string/substr/, the complexity of substr() is unspecified, but generally linear in the length of the returned object.

However, I believe that in practice it is much faster, especially for repeated calls with the same start position.

Example problem: https://leetcode.com/contest/weekly-contest-377/problems/minimum-cost-to-convert-string-ii/

Solution from contest winner below

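#include <bits/stdc++.h>  // LeetCode pre-includes the standard headers; needed to compile elsewhere (GCC)
using namespace std;

const int N = 300;
const long long INF = 0x3F3F3F3F3F3F3FLL;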

long long dis[N][N];

class Solution {
public:
    long long minimumCost(string source, string target, vector<string>& original, vector<string>& changed, vector<int>& cost) {
        // Assign a dense integer id to every distinct string in original/changed.
        map<string, int> label;
        for (auto& v : original) {
            label[v];
        }
        for (auto& v : changed) {
            label[v];
        }
        int total = 0;
        for (auto& it : label) {
            it.second = total++;
        }
        
        for (int i = 0; i < total; ++i) {
            for (int j = 0; j < total; ++j) {
                dis[i][j] = INF;
            }
            dis[i][i] = 0;
        }
        
        for (int i = 0; i < original.size(); ++i) {
            int u = label[original[i]];
            int v = label[changed[i]];
            
            dis[u][v] = min(dis[u][v], (long long) cost[i]);
        }
        // Floyd-Warshall: cheapest conversion cost between every pair of labeled strings.
        for (int k = 0; k < total; ++k) {
            for (int i = 0;i < total; ++i) {
                for (int j = 0; j < total; ++j) {
                    if (i != j && j != k && k != i) {
                        dis[i][j] = min(dis[i][j], dis[i][k] + dis[k][j]);
                    }
                }
            }
        }
        
        // Distinct lengths of the convertible strings, sorted ascending.
        vector<int> lens;
        for (auto& v : original) {
            lens.push_back(v.size());
        }
        sort(lens.begin(), lens.end());
        lens.erase(unique(lens.begin(), lens.end()), lens.end());
        
        int n = source.size();
        // dp[i] = minimum cost to convert the first i characters of source into target.
        vector<long long> dp(n + 1, INF);
        dp[0] = 0;
        
        for (int i = 0; i < n; ++i) {
            if (source[i] == target[i]) {
                dp[i + 1] = min(dp[i + 1], dp[i]);
            }
            for (int l : lens) {
                if (i - l + 1 >= 0) {
                    // Each substr() call copies l characters into a new string.
                    string s = source.substr(i - l + 1, l);
                    string t = target.substr(i - l + 1, l);
                    auto u = label.find(s);
                    auto v = label.find(t);
                    if (u != label.end() && v != label.end()) {
                        dp[i + 1] = min(dp[i + 1], dp[i - l + 1] + dis[u->second][v->second]);
                    }
                }
            }
        }
        if (dp[n] >= INF) {
            return -1;
        }
        return dp[n];
    }
};

My analysis of the time complexity of the code above: I think the substr() calls should cause a timeout. The reference gives the complexity of substr(pos, len) as linear in len. The dp loop therefore costs n * lens.size() * max_len, where n = source.size() and max_len is the largest value in lens.

E.g. take n = 1000 and lens = [900, 901, ..., 999]. Then:

Outer loop: for (int i = 0; i < n; ++i), with n = 1000
Inner loop: for (int l : lens), with lens = [900, 901, ..., 999]
Inside the inner loop we call substr(st, l), which is O(l), and max(l) is close to n

Thus, since max(l) = max_len = 999,

Time Complexity = n * lens.size() * max_len
Time Complexity ≈ n * lens.size() * n
Time Complexity ≈ 1000 * 100 * 1000 = 10^8, which should TLE
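
As a sanity check on this estimate, here is a minimal sketch that counts how many characters the two substr() calls would copy in the dp loop. The helper names countCopiedChars and makeLens are made up for illustration and are not part of the original solution. The sketch only mirrors the loop structure and the i - l + 1 >= 0 guard, and it ignores the cost of the map lookups.

```cpp
#include <vector>

// Hypothetical helper (not from the original solution): counts the characters
// copied by the two substr() calls over the whole dp loop.
long long countCopiedChars(int n, const std::vector<int>& lens) {
    long long copied = 0;
    for (int i = 0; i < n; ++i) {
        for (int l : lens) {
            if (i - l + 1 >= 0) {      // same guard as in the dp loop
                copied += 2LL * l;     // one copy from source, one from target
            }
        }
    }
    return copied;
}

// Hypothetical helper: builds lens = [lo, lo + 1, ..., hi].
std::vector<int> makeLens(int lo, int hi) {
    std::vector<int> lens;
    for (int l = lo; l <= hi; ++l) {
        lens.push_back(l);
    }
    return lens;
}
```

Calling countCopiedChars(1000, makeLens(900, 999)) gives 9,613,200, roughly a tenth of the 10^8 estimate, because the guard skips every length larger than i + 1. So n * lens.size() * max_len is an upper bound that this particular input does not reach.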

There must be something making substr() more efficient. My guess is that substr() calls are cached, so that substr(i, x+d) reuses the previously queried substr(i, x).

I would love to understand more about the optimization going on in substr(). Or would this solution always TLE on this test case, meaning it could be hacked (even though LeetCode does not support hacking)?

The only related thing I found is https://stackoverflow.com/questions/4679746/time-complexity-of-javas-substring, which is about Java's substring().

Comments (4)

vgtcross

$1000 \cdot 100 \cdot 1000 = 10^8$, which doesn't necessarily TLE.
substr(i, len) creates a new copy of length len, and that is impossible to do in $o(\mathrm{len})$.
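
A minimal sketch of the copy behaviour, assuming C++17. The function names are made up for illustration. It checks that std::string::substr() returns an independent copy, and contrasts it with std::string_view::substr(), which only adjusts a pointer and a length. The last helper uses a map with the transparent comparator std::less<> so that a string_view can be looked up without building a std::string first. This is a swapped-in alternative, not what the solution in the post does, and the lookup still compares characters.

```cpp
#include <functional>
#include <map>
#include <string>
#include <string_view>

// True if substr() produced its own copy: changing the original afterwards
// does not affect the piece, and the piece lives in a different buffer.
bool substrMakesIndependentCopy() {
    std::string source = "abcdefgh";
    std::string piece = source.substr(2, 4);   // copies "cdef"
    source[2] = 'X';
    return piece == "cdef" && piece.data() != source.data() + 2;
}

// True if string_view::substr() points into the original buffer (no copy).
bool viewSubstrSharesBuffer() {
    std::string source = "abcdefgh";
    std::string_view view = source;
    std::string_view piece = view.substr(2, 4);
    return piece.data() == source.data() + 2 && piece.size() == 4;
}

// Looks up a string_view key without constructing a std::string.
// Returns -1 when the key is absent.
int findLabel(const std::map<std::string, int, std::less<>>& label,
              std::string_view key) {
    auto it = label.find(key);
    return it == label.end() ? -1 : it->second;
}
```

With a view-based lookup the copies go away, but each comparison inside find() can still inspect up to the full key length, so the worst-case bound does not drop to O(1) per query.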

Reply, 6 months ago (Rev. 2):

What about replacing

for (int len : lens) {
    if (i + len > n) break;
    string cur = source.substr(i, len);
    string need = target.substr(i, len);
    // ... rest of the loop body
}

with the following

string s, t;
for (int len = 1; len <= lens.back(); len++) {
    if (i + len > n) break;
    s += source[i + len - 1];
    t += target[i + len - 1];
    // ... rest of the loop body, using s and t
}

Previously the inner loop cost lens.size() * n = 100 * 1000.

Now the inner loop runs at most n times, since max(len) <= n.

Inside the inner loop, building the substring is amortized O(1) per step (one appended character), so the dominant term is the O(log(200)) call to label.find(s) on the map<string, int>.

So the inner loop costs n * log(200).

Final answer = n * n * log(200) = 1000 * 1000 * 8 = 8 * 10^6

But that does TLE.

Maybe label.find(s) has time complexity log(label.size()) plus some cost related to s.size()?
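
One way to probe this question is to count the characters that the map's comparator inspects during a single find(). The sketch below assumes C++17 and uses a made-up comparator, CountingLess, which orders these lowercase ASCII keys the same way as the default comparator but increments a counter for every character it looks at. The keys are chosen (as an assumption, to show the worst case) to share a long common prefix and differ only in the last character.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical comparator: lexicographic order, counting inspected characters.
struct CountingLess {
    static inline long long charsInspected = 0;  // C++17 inline variable
    bool operator()(const std::string& a, const std::string& b) const {
        std::size_t m = std::min(a.size(), b.size());
        for (std::size_t i = 0; i < m; ++i) {
            ++charsInspected;
            if (a[i] != b[i]) return a[i] < b[i];
        }
        return a.size() < b.size();
    }
};

// Characters inspected by one find() on a map of 8 keys of length keyLen
// (keyLen >= 1) that differ only in their last character.
long long charsInspectedByFind(std::size_t keyLen) {
    std::map<std::string, int, CountingLess> label;
    const std::string prefix(keyLen - 1, 'x');
    for (char c = 'a'; c <= 'h'; ++c) {
        label[prefix + c] = 0;
    }
    CountingLess::charsInspected = 0;   // ignore the cost of the insertions
    label.find(prefix + 'e');
    return CountingLess::charsInspected;
}
```

In this setup every comparison walks the whole key, so the count grows linearly with keyLen. That suggests find() costs about log(label.size()) comparisons, each of which can take time proportional to the key length, rather than log(label.size()) plus a length term. Keys that differ early are much cheaper to compare.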

I actually tried replacing substr() with a manual implementation and, while slower, it still got Accepted:

Before:

string cur = source.substr(i, cur_size);
string need = target.substr(i, cur_size);

After (forn(j, k) is presumably the usual contest macro for a loop over j = 0..k-1):

forn(j,cur_size) cur += source[i+j];
forn(j,cur_size) need += target[i+j];

So I guess the test cases must be weak.

Jomax100's blog