How to parse name components from the form Last,First Middle,Suffix
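Right now I have a full-name column in which the names follow no normalized form. They generally look like Last,First Middle,Suffix, but not every row follows that pattern. Some of the forms that occur appear in the sample rows below.
The desired result is each component in its own column, with NULL for any component that is absent. Here are some examples of the desired result.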
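+-------------------------+-----------+------------+----------+--------+
| FULLNAME                | FIRSTNAME | MIDDLENAME | LASTNAME | SUFFIX |
+-------------------------+-----------+------------+----------+--------+
| Johnson,John Johnny,Jr. | John      | Johnny     | Johnson  | Jr     |
| Anderson,Andrew A,Sr.   | Andrew    | A          | Anderson | Sr     |
| Smith,Smitty Jr.        | Smitty    | NULL       | Smith    | Jr     |
| Abegnale,Frank          | Frank     | NULL       | Abegnale | NULL   |
| Henry,King III          | King      | NULL       | Henry    | III    |
| Garcia,Jerome John      | Jerome    | John       | Garcia   | NULL   |
+-------------------------+-----------+------------+----------+--------+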
My current solution looks like this:
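SELECT
    FullNM AS FullName,
    -- Last name: everything before the first comma
    SUBSTRING(FullNM, 1, CHARINDEX(',', FullNM) - 1) AS LastName,
    -- Middle name: the last space-delimited token, if the text after the comma contains a space
    CASE
        WHEN LEN(SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, 99)) - LEN(REPLACE(SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, 99), ' ', '')) > 0
        THEN REPLACE(SUBSTRING(FullNM, LEN(FullNM) - CHARINDEX(' ', REVERSE(FullNM)) + 1, 99), '.', '')
        ELSE NULL
    END AS MiddleName,
    -- First name: the text after the comma, minus the last token when there is one
    CASE
        WHEN LEN(SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, 99)) - LEN(REPLACE(SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, 99), ' ', '')) > 0
        THEN SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, (LEN(SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, 99)) - LEN(SUBSTRING(FullNM, LEN(FullNM) - CHARINDEX(' ', REVERSE(FullNM)) + 1, 99))))
        ELSE SUBSTRING(FullNM, CHARINDEX(',', FullNM) + 1, 99)
    END AS FirstNM
FROM MyTable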
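Unfortunately, I can't think of a good way to handle the suffix, especially when there is no middle name. With the current code, a suffix, if present, ends up in MiddleName.
Any help or suggestions would be much appreciated!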
Solution
Please try the following solution. It uses XQuery to tokenize the FullName column.
As Gordon mentioned, you may need to extend the list of legal suffixes.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, FullName VARCHAR(100));
INSERT INTO @tbl (FullName) VALUES
('Johnson,John Johnny,Jr.'),
('Anderson,Andrew A,Sr.'),
('Smith,Smitty Jr.'),
('Abegnale,Frank'),
('Henry,King III'),
('Garcia,Jerome John');
DECLARE @suffix TABLE (suffix VARCHAR(10));
INSERT INTO @suffix (suffix) VALUES
('Jr.'), ('III'), ('Sr.');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = ',';
;WITH rs AS
(
SELECT *,TRY_CAST('<root><x><![CDATA[' +
REPLACE(FullName + SPACE(1) COLLATE Czech_BIN2,@separator,']]></x><x><![CDATA[') +
']]></x></root>' AS XML) AS xmldata
FROM @tbl
),cte AS
(
SELECT rs.*, x.pos, x.size
    , LEFT(c.value('(x[2]/text())[1]', 'VARCHAR(30)'), pos - 1) AS FirstName
    , c.value('(x[1]/text())[1]', 'VARCHAR(30)') AS LastName
    -- the text after the first space in the second token (empty when there is none)
    , TRIM(RIGHT(c.value('(x[2]/text())[1]', 'VARCHAR(30)'), IIF((size - pos) < 0, 0, size - pos + 1))) AS MiddleName
    -- the third comma-separated token if present, otherwise the tail of the second token
    , TRIM(COALESCE(c.value('(x[3]/text())[1]', 'VARCHAR(30)'), RIGHT(c.value('(x[2]/text())[1]', 'VARCHAR(30)'), size - pos + 1))) AS Suffix
FROM rs CROSS APPLY xmldata.nodes('/root') AS t(c)
CROSS APPLY (SELECT CHARINDEX(SPACE(1), c.value('(x[2]/text())[1]', 'VARCHAR(30)'))
                  , LEN(c.value('(x[2]/text())[1]', 'VARCHAR(30)'))) AS x(pos, size)
)
-- a trailing token counts as a suffix only if it is in @suffix; otherwise it stays a middle name
SELECT ID, FullName
    , cte.FirstName
    , IIF(MiddleName IN (SELECT suffix FROM @suffix), '', cte.MiddleName) AS MiddleName
    , cte.LastName
    , IIF(cte.Suffix NOT IN (SELECT suffix FROM @suffix), '', cte.Suffix) AS Suffix
FROM cte;
Output
+----+-------------------------+-----------+------------+----------+--------+
| ID | FullName                | FirstName | MiddleName | LastName | Suffix |
+----+-------------------------+-----------+------------+----------+--------+
|  1 | Johnson,John Johnny,Jr. | John      | Johnny     | Johnson  | Jr.    |
|  2 | Anderson,Andrew A,Sr.   | Andrew    | A          | Anderson | Sr.    |
|  3 | Smith,Smitty Jr.        | Smitty    |            | Smith    | Jr.    |
|  4 | Abegnale,Frank          | Frank     |            | Abegnale |        |
|  5 | Henry,King III          | King      |            | Henry    | III    |
|  6 | Garcia,Jerome John      | Jerome    | John       | Garcia   |        |
+----+-------------------------+-----------+------------+----------+--------+
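The question asks for suffixes without the trailing period (Jr, Sr), while the output above keeps it (Jr., Sr.). One possible way to reconcile the two, sketched here under assumptions, is to store the suffix list without periods and strip periods from each candidate token before the lookup. The extra values II and IV and the inline token list are illustrative only and are not part of the original answer.

```sql
-- Suffix list stored without periods (II and IV are illustrative additions).
DECLARE @suffix TABLE (suffix VARCHAR(10) PRIMARY KEY);
INSERT INTO @suffix (suffix) VALUES ('Jr'), ('Sr'), ('II'), ('III'), ('IV');

-- Strip periods from each candidate token before the lookup, so that
-- 'Jr.' and 'Jr' both match and the matched value carries no period.
SELECT v.token,
       s.suffix AS matched_suffix   -- NULL when the token is not a known suffix
FROM (VALUES ('Jr.'), ('Sr'), ('III'), ('Johnny')) AS v(token)
LEFT JOIN @suffix AS s
       ON s.suffix = REPLACE(v.token, '.', '');
```

The same REPLACE could be applied to the parsed MiddleName and Suffix values before they are compared with the list, at the cost of also removing periods from any legitimate token that contains one.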
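After a bit of a headache and a SQL Server bug along the way, here is another possible solution you can try.
It is somewhat verbose and, as already suggested, it does require you to know in advance what the possible suffixes are.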
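The query reads from two tables, Names and Suffixes, which this answer does not define. A minimal setup sketch follows. Only the table and column names are taken from the query; the data types and lengths are assumptions. The rows reuse the sample names from the question and the suffix values from the previous answer.

```sql
-- Assumed schema: names come from the query, types and lengths are guesses.
CREATE TABLE Names (Fullname VARCHAR(100) NOT NULL);
CREATE TABLE Suffixes (Suffix VARCHAR(10) NOT NULL PRIMARY KEY);

-- Sample rows from the question.
INSERT INTO Names (Fullname) VALUES
('Johnson,John Johnny,Jr.'),
('Anderson,Andrew A,Sr.'),
('Smith,Smitty Jr.'),
('Abegnale,Frank'),
('Henry,King III'),
('Garcia,Jerome John');

-- Known suffixes, spelled exactly as they appear in the data,
-- because the query joins on an exact match with the parsed part.
INSERT INTO Suffixes (Suffix) VALUES ('Jr.'), ('Sr.'), ('III');
```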
with
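-- numbers 1..30, used as candidate start positions of each name part
num as (select top(30) Row_Number() over(order by (select null)) n from sys.messages),
parts as (
  select top(1000) n.Fullname, w.n, /*sql server optimizer issue workaround*/
    -- keep whichever candidate contains no delimiter
    IsNull(Iif(CharIndex(',',name1)=0 and CharIndex(' ',name1)=0, name1, null),
           Iif(CharIndex(',',name2)=0 and CharIndex(' ',name2)=0, name2, null)) part
  from Names n
  cross apply (
    select num.n,
      Substring(n.Fullname, num.n, CharIndex(',', n.Fullname + ',', num.n) - num.n) name1, -- up to the next comma
      Substring(n.Fullname, num.n, CharIndex(' ', n.Fullname + ' ', num.n) - num.n) name2  -- up to the next space
    from num
    -- a part starts at position 1 or right after a comma or space
    where num.n<DataLength(n.Fullname) and Substring(',' + n.Fullname, num.n, 1) in (',',' ')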
)w
)
select Fullname,
  Max(FirstName) firstName,
  Max(Iif(MiddleName=FirstName or MiddleName=Lastname, null, MiddleName)) MiddleName,
  Max(Lastname) LastName,
  Max(Suffix) Suffix
from (
  select Fullname,
    case when Lag(n) over(partition by Fullname order by n)=Min(n) over(partition by Fullname) then part end FirstName,
    case when Lead(n) over(partition by Fullname order by n)=Max(n) over(partition by Fullname) and Suffix is null
           or n=Max(n) over(partition by Fullname) and Suffix is null then part end MiddleName,
    case when n=Min(n) over(partition by Fullname) then part end Lastname,
    case when n=Max(n) over(partition by Fullname) and Suffix is not null then part end Suffix
from parts p
left join Suffixes s on s.Suffix=p.part
where p.part !=''
)x
group by Fullname
order by Fullname