微信公众号搜"智元新知"关注
微信扫一扫可直接关注哦!

sql-server – 通过在T-SQL中设置种子,从均匀分布生成随机值

我想从T-sql中给定数​​据表的每行的均值分布生成一个随机值,其中mean = 0,标准devation = 1.另外,我想设置一个种子,以确保分析的重现性.以下是不起作用的想法:

>使用具有声明数字的函数RAND()不符合此目标:为数据集的每一行生成相同的随机值.
>这样的解决方案:

SELECT ABS(CAST(CAST(NEWID()AS VARBINARY)AS INT))AS [RandomNumber]

由于不可重复,因此也不能解决问题.

编辑:

表现很重要,因为我的桌子有数亿条记录.

解决方法

这里的主要问题IMHO是怎么看到“可重复性”?或者不同的问题:什么驱动着随机性?只要数据不变,我可以设想一个解决方案,每个运行的每个记录都附加相同的随机数.但是,如果数据发生变化,预计会发生什么?

为了乐趣,我对一个(不是非常有代表性的)100万行的测试表进行了以下测试:

-- seed
SELECT Rand(0)

-- will show the same random number for EVERY record
SELECT Number,blah = Convert(varchar(100),NewID()),random = Rand()
  INTO #test
  FROM master.dbo.fn_int_list(1,1000000)

CREATE UNIQUE CLUSTERED INDEX uq0_test ON #test (Number)

SET NOCOUNT ON

GO
DECLARE @start_time datetime = CURRENT_TIMESTAMP,@c_number int

-- update each record (one by one) and set the random number based on 'the next Rand()' value
-- => the order of the records drives the distribution of the Rand() value !

-- seed
SELECT @c_number = Rand(0) 

-- update 1 by 1 
DECLARE cursor_no_transaction CURSOR LOCAL STATIC
    FOR SELECT Number
          FROM #test
         ORDER BY Number
OPEN cursor_no_transaction 
FETCH NEXT FROM cursor_no_transaction INTO @c_number
WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE #test 
           SET random = Rand()
         WHERE Number = @c_number

        FETCH NEXT FROM cursor_no_transaction INTO @c_number
    END
CLOSE cursor_no_transaction 
DEALLOCATE cursor_no_transaction 

PRINT 'Time needed (no transaction) : ' + Convert(nvarchar(100),DateDiff(ms,@start_time,CURRENT_TIMESTAMP)) + ' ms.'

SELECT _avg = AVG(random),_stdev = STDEV(random) FROM #test

GO

DECLARE @start_time datetime = CURRENT_TIMESTAMP,@c_number int

BEGIN TRANSACTION

-- update each record (one by one) and set the random number based on 'the next Rand()' value
-- => the order of the records drives the distribution of the Rand() value !

-- seed
SELECT @c_number = Rand(0) 

-- update 1 by 1 but all of it inside 1 single transaction
DECLARE cursor_single_transaction CURSOR LOCAL STATIC
    FOR SELECT Number
          FROM #test
         ORDER BY Number
OPEN cursor_single_transaction 
FETCH NEXT FROM cursor_single_transaction INTO @c_number
WHILE @@FETCH_STATUS = 0
    BEGIN
        UPDATE #test 
           SET random = Rand()
         WHERE Number = @c_number

        FETCH NEXT FROM cursor_single_transaction INTO @c_number
    END
CLOSE cursor_single_transaction 
DEALLOCATE cursor_single_transaction 

COMMIT TRANSACTION

PRINT 'Time needed (single transaction) : ' + Convert(nvarchar(100),_stdev = STDEV(random) FROM #test

GO

DECLARE @start_time datetime = CURRENT_TIMESTAMP

-- update each record (single operation),use the Number column to reseed the Rand() function for every record
UPDATE #test 
    SET random = Rand(Number)

PRINT 'Time needed Rand(Number) : ' + Convert(nvarchar(100),use 'a bunch of fields' to reseed the Rand() function for every record
UPDATE #test 
    SET random = Rand(BINARY_CHECKSUM(Number,blah))

PRINT 'Time needed Rand(BINARY_CHECKSUM(Number,blah)) : ' + Convert(nvarchar(100),_stdev = STDEV(random) FROM #test

结果或多或少是如预期的:

Time needed (no transaction) : 24570 ms.
_avg                   _stdev
---------------------- ----------------------
0.499630943538644      0.288686960086461

Time needed (single transaction) : 14813 ms.
_avg                   _stdev
---------------------- ----------------------
0.499630943538646      0.288686960086461

Time needed Rand(Number) : 1203 ms.
_avg                   _stdev
---------------------- ----------------------
0.499407423620328      0.291093824839539

Time needed Rand(BINARY_CHECKSUM(Number,blah)) : 1250 ms.
_avg                   _stdev
---------------------- ----------------------
0.499715398881586      0.288579510523627

所有这些都是“可重复的”,问题是“可重复”是指你想要的意思.我已经坚持使用AVG()和STDEV()来获取一个粗略的分布概念,我会留给你看看他们是否真的适合账单(如果不是,如何改进它)

1百万行的1.2秒对于100万行IMHO来说听起来不太好.也就是说,如果您的表格包含额外的列,它将占用更多的空间,因此需要更多的时间!

希望这能让你开始…

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐


SELECT a.*,b.dp_name,c.pa_name,fm_name=(CASE WHEN a.fm_no='LJCG001H' THEN dbo.ELTPNAME(a.fw_nu) ELSE d.fm_name END),e.fw_state_nm,f.fw_rmk_nm
if not exists(select name from syscolumns where name='tod_no' and id=object_id('iebo09d12')) alter table iebo09d12 add tod_no varchar(
select a.*,pano=a.pa_no,b.pa_name,f.dp_name,e.fw_state_nm,g.fa_name from LJSS007H a (nolock) Left join LJPA002H b (nolock) On a.pa_no =b.pa_no Left jo
要在 SQL Server 2019 中设置定时自动重启,可以使用 Windows 任务计划程序。下面是详细的步骤: 步骤一:创建批处理文件 打开记事本。 输入以下内容: net stop "SQL Server (MSSQLSERVER)" net start "SQ
您收到的错误消息表明数据库 'EastRiver' 的事务日志已满,导致数据库操作失败。要解决这个问题,可以按照以下步骤操作: 1. 备份事务日志首先,备份事务日志以释放空间: BACKUP LOG [EastRiver] TO DISK = N'C:\Backup\East
首先我需要查询出需要使用SQL Server Profiler跟踪的数据库标识ID,若不知道怎么查询数据库的标识ID, 打开SQL Server management studio,点击工具。选择SQL Server Profiler。 登录,登录成功后,如果有个默认弹窗,先取消 新建追踪 命名
--最新的解决方法 --先创建用户帐户,不进行授权,然后通过下面的SQL语句将该用户帐户关联至对应的数据库用户。优点是避免了重新授权的操作。 USE tempdbEXEC sp_change_users_login 'Update_One', 'iemis', &#3
命令: ALTER TABLE 表名 add 列名 数据类型 default 默认值 not null 例如: ALTER TABLE LJEL005H add el_req int default 15 not null
declare @i int set @i=340 while @i<415 begin set @i=@iʱ insert into LJWK007H select '2024','28','9110','3PTSD621000000