Alternatives to PERCENTILE_CONT in MySQL/MariaDB: Best Practices for Data Scientists
Where PERCENTILE_CONT is not available, the following approaches can be used instead.
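Window functions
MySQL 8.0 and later and MariaDB 10.2 and later support window functions, which make it possible to look up a percentile efficiently without self-joins. MariaDB 10.3.3 and later also provide PERCENTILE_CONT natively as a window function. The query below returns the largest matrix_value whose PERCENT_RANK is at most 0.05. This is a discrete approximation: it does not interpolate between neighboring rows the way PERCENTILE_CONT does.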
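SELECT DISTINCT
  -- Rows with p <= 0.05 sort first (largest p first). All other rows get NULL,
  -- which MySQL and MariaDB place last under ORDER BY ... DESC.
  first_value(matrix_value) OVER (
    ORDER BY CASE WHEN p <= 0.05 THEN p END DESC
  ) AS x
FROM (
  SELECT
    matrix_value,
    -- Relative rank of each row, between 0 and 1
    percent_rank() OVER (ORDER BY matrix_value) AS p
  FROM some_table
) AS t;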
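PERCENTILE_CONT itself interpolates linearly between the two rows that surround the fractional position 1 + p * (N - 1). A minimal sketch of that calculation for p = 0.05 follows, assuming MySQL 8.0+ or MariaDB 10.2+ (CTEs and window functions). The names ranked, pos, rn and percentile_cont_05 are illustrative, not from the original article.

```sql
WITH ranked AS (
  SELECT
    matrix_value,
    ROW_NUMBER() OVER (ORDER BY matrix_value) AS row_num,
    COUNT(*) OVER () AS total_count
  FROM some_table
),
pos AS (
  -- Fractional target position: 1 + p * (N - 1), with p = 0.05
  SELECT 1 + 0.05 * (MAX(total_count) - 1) AS rn
  FROM ranked
)
SELECT
  SUM(
    CASE
      -- Position falls exactly on one row: take its value unchanged
      WHEN r.row_num = FLOOR(p.rn) AND r.row_num = CEILING(p.rn)
        THEN r.matrix_value
      -- Lower neighbor, weighted by its closeness to the target position
      WHEN r.row_num = FLOOR(p.rn)
        THEN r.matrix_value * (CEILING(p.rn) - p.rn)
      -- Upper neighbor, weighted likewise
      WHEN r.row_num = CEILING(p.rn)
        THEN r.matrix_value * (p.rn - FLOOR(p.rn))
    END
  ) AS percentile_cont_05
FROM ranked AS r
CROSS JOIN pos AS p
WHERE r.row_num IN (FLOOR(p.rn), CEILING(p.rn));
```

For a five-row table holding 12.34, 23.45, 34.56, 56.78 and 98.76, the position is 1.2, so the result should be 12.34 × 0.8 + 23.45 × 0.2 = 14.562. On an empty table the query returns NULL.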
Subquery
A subquery can locate the value at the 5% position in sorted order, and the outer query can then count the rows at or below that value.
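SELECT
  COUNT(*) AS count_below_p
FROM some_table AS t
WHERE t.matrix_value <= (
  SELECT
    s.matrix_value
  FROM (
    SELECT
      matrix_value,
      ROW_NUMBER() OVER (ORDER BY matrix_value) AS row_num,
      COUNT(*) OVER () AS total_count
    FROM some_table
  ) AS s
  -- LIMIT/OFFSET accept only integer constants, so the position is matched
  -- by row number instead: skipping k rows means taking row k + 1.
  WHERE s.row_num = FLOOR(0.05 * s.total_count) + 1
);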
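MySQL and MariaDB accept only integer literals or prepared-statement placeholders in LIMIT and OFFSET, so an offset computed from the row count cannot be written inline. One possible way around this is a prepared statement. This is a sketch only: the variable @pct_offset and the statement name pct_stmt are made up for illustration.

```sql
-- 0-based offset of the row at the 5% position, computed up front
SET @pct_offset = (
  SELECT CAST(FLOOR(0.05 * COUNT(*)) AS UNSIGNED)
  FROM some_table
);

-- The placeholder is allowed in OFFSET because this is a prepared statement
PREPARE pct_stmt FROM
  'SELECT matrix_value
   FROM some_table
   ORDER BY matrix_value
   LIMIT 1 OFFSET ?';

EXECUTE pct_stmt USING @pct_offset;
DEALLOCATE PREPARE pct_stmt;
```

This returns the value at the position rather than a row count. It can be saved in a user variable or read by the application and then used in a follow-up query.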
External libraries
External libraries that offer faster and more accurate implementations of PERCENTILE_CONT also exist.
Percentile approximation
Notes
- Oracle provides PERCENTILE_CONT as a built-in function, so no emulation is needed there.
- The alternatives above apply only to MySQL and MariaDB.
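On MariaDB 10.3.3 and later, emulation may be unnecessary, because PERCENTILE_CONT is available there as a window function. A hedged sketch of a native call against the same some_table follows. It assumes an empty OVER () is accepted, so the whole table forms one partition, and the alias percentile_cont_05 is illustrative.

```sql
-- MariaDB 10.3.3+ only; MySQL 8.0 has no PERCENTILE_CONT.
-- The window function yields the same value on every row,
-- so DISTINCT collapses the output to a single row.
SELECT DISTINCT
  PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY matrix_value)
    OVER () AS percentile_cont_05
FROM some_table;
```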
-- Sample table and data
CREATE TABLE some_table (
  id INT PRIMARY KEY AUTO_INCREMENT,
  matrix_value DECIMAL(10,2)
);
INSERT INTO some_table (matrix_value) VALUES
  (12.34),
  (56.78),
  (34.56),
  (23.45),
  (98.76);
-- Largest value whose percent rank is at most 0.05 (discrete, no interpolation)
SELECT DISTINCT
  first_value(matrix_value) OVER (
    -- Non-qualifying rows become NULL, which sorts last under DESC
    ORDER BY CASE WHEN p <= 0.05 THEN p END DESC
  ) AS x
FROM (
  SELECT
    matrix_value,
    percent_rank() OVER (ORDER BY matrix_value) AS p
  FROM some_table
) AS t;
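To see why the query above returns the smallest value on this five-row sample, it can help to list the intermediate PERCENT_RANK for each row. A small inspection query, offered only as an illustration:

```sql
-- PERCENT_RANK is (rank - 1) / (N - 1), so with N = 5 the steps are 0.25 apart.
SELECT
  matrix_value,
  PERCENT_RANK() OVER (ORDER BY matrix_value) AS p
FROM some_table
ORDER BY matrix_value;
-- matrix_value | p
--        12.34 | 0
--        23.45 | 0.25
--        34.56 | 0.5
--        56.78 | 0.75
--        98.76 | 1
```

Only the first row satisfies p <= 0.05, so the discrete lookup yields 12.34.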
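-- Count the rows at or below the value found at the 5% position
SELECT
  COUNT(*) AS count_below_p
FROM some_table AS t
WHERE t.matrix_value <= (
  SELECT
    s.matrix_value
  FROM (
    SELECT
      matrix_value,
      ROW_NUMBER() OVER (ORDER BY matrix_value) AS row_num,
      COUNT(*) OVER () AS total_count
    FROM some_table
  ) AS s
  -- LIMIT/OFFSET accept only integer constants, so match by row number:
  -- skipping k rows means taking row k + 1.
  WHERE s.row_num = FLOOR(0.05 * s.total_count) + 1
);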
The sample code is provided for reference only.
Other approaches
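-- Join every row to the single row at the 5% position and count the matches
SELECT
  COUNT(*) AS count_below_p
FROM some_table AS t
JOIN (
  SELECT
    matrix_value,
    ROW_NUMBER() OVER (ORDER BY matrix_value) AS row_num,
    COUNT(*) OVER () AS total_count
  FROM some_table
) AS t2
  ON t.matrix_value <= t2.matrix_value
-- Window functions are not allowed in WHERE, so the total comes from t2
WHERE t2.row_num = FLOOR(0.05 * t2.total_count) + 1;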
CTE (Common Table Expression)
WITH percentile_cte AS (
  -- Number the rows in sorted order and carry the total row count along
  SELECT
    matrix_value,
    ROW_NUMBER() OVER (ORDER BY matrix_value) AS row_num,
    COUNT(*) OVER () AS total_count
  FROM some_table
)
SELECT
  COUNT(*) AS count_below_p
FROM some_table AS t
JOIN percentile_cte AS cte
  ON t.matrix_value <= cte.matrix_value
-- Keep only the row at the 5% position
WHERE cte.row_num = FLOOR(0.05 * cte.total_count) + 1;
Views
-- The view exposes each value with its sorted position and the total row count
CREATE VIEW percentile_view AS
SELECT
  matrix_value,
  ROW_NUMBER() OVER (ORDER BY matrix_value) AS row_num,
  COUNT(*) OVER () AS total_count
FROM some_table;
SELECT
  COUNT(*) AS count_below_p
FROM some_table AS t
JOIN percentile_view AS v
  ON t.matrix_value <= v.matrix_value
-- Keep only the row at the 5% position
WHERE v.row_num = FLOOR(0.05 * v.total_count) + 1;