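OK, so I've been working on extracting CAS numbers from uploaded SDS files (using docx first, before moving on to pdf). I've successfully converted the docx into a string, but I need to pull several substrings out of it when they are present. Here is the code I'm using; I suspect I'm not using preg_match_all correctly at all.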
$docObj = new DocxConversion($_FILES["sdsFile"]["tmp_name"]);
$docText = $docObj->convertToText();
preg_match_all("/[0-9]{2,7}-[0-9]{2}-[0-9]{1}/", $docText, $matches);
print_r($matches);
This gives me Array([0] => Array()). Not very helpful when I'm looking for:
> 64742-47-8
> 64742-65-0
> 9003-29-6
The output of $docText is:
IDENTIFICATION PRODUCT IDENTIFIER USED ON LABEL: Finished Product Item Number Customer Item Number LABEL DESCRIPTION ACTUAL BRAND SM5802EE ECHO POWERBLEND X EXTENDED LIFE OIL ECHO SMGR33EC 6450005 ECHO POWER BLEND X ECHO SMGR01EC 6450025 ECHO POWER BLEND X ECHO SMGR07EC 6450002 ECHO POWER BLEND X ECHO SM5101EC X6972270101/99988800086 ECHO POWER BLEND X ECHO SM5905EC 6450250 ECHO BAR & CHAIN OIL ECHO SM5818ER 6450114 ECHO POWER BLEND X HIGH PERFORMANCE 2 stroke ENGINE ECHO SM5818EG 6450103 ECHO POWER BLEND X ECHO SM5238EC 99988800088 ECHO POWER BLEND X ECHO SM5218EC X6972270201/99988800085 ECHO POWER BLEND X ECHO SMGR25EC X6974100202 ECHO POWER BLEND X ECHO SMGR02EC 6450001 ECHO POWER BLEND X ECHO SMGR29EC 6450000 ECHO POWERBLEND X ECHO SM5818EE 6450102 ECHO POWER BLEND X LOW SMOKE ECHO SM5818EC 6450100/6450099 ECHO POWER BLEND X ECHO SM5818EM 6450060 ECHO POWER BLEND X ECHO SMGR34EE ECHO POWERBLEND X ECHO SM5906EC 6450050 ECHO POWER BLEND X ECHO SM5906EM 6450062 ECHO POWER BLEND X ECHO SM5943EE 6450116 ECHO POWER BLEND X ECHO SMGR33EK 6450118 ECHO POWERBLEND X ECHO SMGR34ER 6450109 ECHO POWER BLEND X ECHO SM5926EC 6450006 ECHO POWERBLEND X XTENDED LIFE OIL ECHO SMGR34EE ECHO POWER BLEND X ECHO SMGR34EC 6450108 ECHO POWER BLEND X ECHO SMGR12EC 99988800089 ECHO POWER BLEND X ECHO SMGR34EK 6450119 ECHO POWERBLEND X ECHO SM5834EM 6450061 ECHO POWER BLEND X ECHO Finished Product Item Number Customer Item Number LABEL DESCRIPTION ACTUAL BRAND SMGR34EG 6450115 ECHO POWER BLEND X ECHO SM5955EC 6452750 ECHO POWER BLEND X ECHO RECOMMENDED USE OF THE CHEMICAL AND RESTRICTIONS ON USE; PETROLEUM LUBRICATING OIL NO OTHER USES RECOMMENDED NAME, ADDRESS, AND TELEPHONE NUMBER OF THE CHEMICAL MANUFACTURER, IMPORTER, OR OTHER RESPONSIBLE PARTY: 1.3.1. 
Spectrum Lubricants Corporation 500 Industrial Park Drive Selmer, TN 38375‐3276 United States of America Product information MSDS Requests: (800) 264‐6457 or +17316454972 Technical information: (800) 264‐6457 or +17316454972 General information: vswedley@spectrumcorporation.comEMERGENCY PHONE NUMBER: 1.4.1. Emergency Response north America: CHEMTREC (800) 424‐9300 after 5:00pm CST Or +17035273887 Health Emergency USA: (800) 264‐6457 or +17316454972 HAZARD(S) IDENTIFICATION CLASSIFICATION OF THE CHEMICAL IN ACCORDANCE WITH ParaGRAPH (d) of §1910.1200: Acute Inhalation Category 4 Eye Irritant Category 2 Skin Corrosion/Irritation Category 2 Flammable Liquid Category 4 Signal Word: Warning Symbol: Hazard Statements: Harmful if Inhaled Causes serIoUs eye irritation Causes skin irritation Combustible Liquid Precautionary Statements: Prevention: Avoid breathing mist or spray. Use only outdoors or in a well‐ventilated area. Wear eye/face protection Wear protective gloves Keep away from heat, hot surfaces, sparks, open flames and other ignition sources. No smoking. Response: If inhaled: Remove person to fresh air and keep comfortable for breathing. If in eyes: Rinse cautIoUsly with water for several minutes. Remove contact lenses, if present and easy to do. Continue rinsing. If eye irritation persists get medical advice/attention. If on skin: wash with plenty of water, if irritation or rash occurs get medical advice/attention. Take off contaminated clothing and wash it before reuse. Call a poison center/doctor if you feel unwell. In case of fire: Use water fog, foam, dry chemical or carbon dioxide (CO2) to extinguish flames. Storage: Store in well‐ventilated place. disposal: dispose of contents/container in accordance with local/regional/national/international regulations. 
Composition/ information on ingredients The chemical name and concentration (exact percentage) or concentration ranges of all ingredients which are classified as health hazards in accordance with paragraph (d) of §1910.1200 3.1.1. COMPONENTS CAS Number EU Number Concentration (%) Hazard Statements (see Section 16) distillates (petroleum), hydrotreated light 64742‐47‐8 265‐149‐8 10‐30 H226, H304, H315, Solvent‐dewaxed heavy paraffinic distillates 64742‐65‐0 265‐169‐7 40‐50 H315, H332 polyiosbutylene 9003‐29‐6 Not available 40‐70 H315, H319, H332 FirsT AID MEASURES
There's more, but I'll spare you…
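Solution:
You need to allow for other kinds of hyphens. The dashes in the extracted text are Unicode hyphen characters (‐), not the ASCII hyphen-minus (-) that your pattern looks for, so the pattern never matches. Match any dash character with \p{Pd} instead: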
~\d{2,7}\p{Pd}\d{2}\p{Pd}\d~u
Breakdown:
~ # pattern delimiter
\d{2,7} # digits, 2-7 times
\p{Pd} # matches any kind of hyphen or dash (including unicode characters)
\d{2} # 2 digits
\p{Pd} # same as above
\d # one digit
~ # pattern delimiter
u # unicode flag (pattern modifier)
In PHP:
preg_match_all('~\d{2,7}\p{Pd}\d{2}\p{Pd}\d~u', $docText, $matches);
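To see the difference in isolation, here is a minimal sketch that runs both patterns against a short excerpt of the text above. It assumes the dashes in the SDS output are U+2010 (HYPHEN), which is what they appear to be in the pasted output, and it writes them with the "\u{2010}" escape (PHP 7.0+). The final normalization step to ASCII hyphens is an optional addition, not part of the original answer; the variable names $asciiCount and $casNumbers are only illustrative.

```php
<?php
// Excerpt of the question's $docText; every dash here is U+2010 (HYPHEN),
// written with the "\u{2010}" escape so the difference from ASCII "-" is explicit.
$docText = "distillates (petroleum), hydrotreated light 64742\u{2010}47\u{2010}8 "
         . "265\u{2010}149\u{2010}8 10\u{2010}30 H226 "
         . "Solvent\u{2010}dewaxed heavy paraffinic distillates 64742\u{2010}65\u{2010}0 "
         . "265\u{2010}169\u{2010}7 40\u{2010}50 "
         . "polyiosbutylene 9003\u{2010}29\u{2010}6 Not available";

// The original ASCII-only pattern: no ASCII hyphen exists in the text, so nothing matches.
$asciiCount = preg_match_all('/[0-9]{2,7}-[0-9]{2}-[0-9]/', $docText, $asciiMatches);

// \p{Pd} matches any dash punctuation; the u modifier makes PCRE read the subject as UTF-8.
preg_match_all('~\d{2,7}\p{Pd}\d{2}\p{Pd}\d~u', $docText, $matches);

// Optional: rewrite whichever dash was found as an ASCII hyphen so the
// extracted values compare equal to conventionally written CAS numbers.
// preg_replace accepts an array subject and returns an array.
$casNumbers = preg_replace('~\p{Pd}~u', '-', $matches[0]);

print_r($casNumbers);
// Array
// (
//     [0] => 64742-47-8
//     [1] => 64742-65-0
//     [2] => 9003-29-6
// )
```

The EU numbers (such as 265‐149‐8) and concentration ranges (such as 10‐30) in the same excerpt are not picked up, because the pattern requires exactly two digits between the dashes and a second dash after them.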