4 views (last 30 days)

Show older comments

I have a vector with integer or floating point values and want to count the number of occurences.

histcounts() uses the lase element of the input Edge as upper limit for the last bin, so I have to append a number larger than the greatest element to count correctly. This slows down the processing by a factor of > 2, when the values are integers:

% Integer values:

x = round((1:1e6) / 1); % Equivalent for "/ 10" also

x = x(randperm(numel(x)));

ux = unique(x);

ux2 = [ux, Inf];

ux3 = [ux, 1e8]; % Any huge number

ux4 = [ux, ux(end) + 1];

timeit(@() histcounts(x, ux)) % 0.045

timeit(@() histcounts(x, ux2)) % 0.171 % Much slower!!!

timeit(@() histcounts(x, ux3)) % 0.199 % Slower than with Inf?

timeit(@() histcounts(x, ux4)) % 0.0787 % Fast again

% BinMethod=Integers is fast, but the number of bins is limited:

timeit(@() histcounts(x, 'BinMethod', 'Integers')) % 0.0059 Fast!

numel(histcounts(x, 'BinMethod', 'Integers')) % But just 10001 instead of 1e6

% According to the documentation, this should be 65536 ?!?

% Floating point values:

y = rand(1, 1e6);

uy = unique(y);

uy2 = [uy, Inf];

uy3 = [uy, uy(end) + 1];

timeit(@() histcounts(y, uy)) % 0.11148

timeit(@() histcounts(y, uy2)) % 0.2066

timeit(@() histcounts(y, uy3)) % 0.2100

% Local R2018b: similar behavior for integers, floating point values processed faster:

% 0.073 % Integer, ux

% 0.199 % [ux, Inf]

% 0.189 % [ux, 1e8]

% 0.061 % [ux, ux(end + 1)]

% 0.162 % Floating point, uy

% 0.167 % [uy, Inf]

% 0.315 % [uy, uy(end) + 1] % half speed!!!

I'd expect nearly the same speed, because it is almost the same amount of work.

By the way, a simple counting function is fast also:

timeit(@() simpleCount(x)) % 0.0602

timeit(@() simpleCount(y)) % 0.0628

function N = simpleCount(x)

S = sort(x);

q = [true, diff(S) ~= 0];

N = diff([find(q), numel(x) + 1]);

end

histcounts() with counting the last element to the previous bin is not useful, but attaching a final edge slows down the processing massively. I've sent an enhancement request to TWM already.

Bruno Luong
on 22 Aug 2021

Bruno Luong
on 22 Aug 2021

"The BinMethod 'Integers' is useful for the case, when it can be guaranteed, that 2^16 bins are enough"

Just take your hand over and use accumarray instead.

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!