How do I remove extra spaces and line breaks from a string?

This is what I have:

S = """Missing Since 06/01/1976

Missing From 
                                Napa,California                          
Classification Endangered Missing
Sex Female
Race 
                                    White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2,130 pounds
Distinguishing Characteristics Caucasian female. Brown hair,hazel eyes."""

This is what I want to get:

S = """Missing Since 06/01/1976
Missing From Napa,California                            
Classification Endangered Missing
Sex Female
Race White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2,130 pounds
Distinguishing Characteristics Caucasian female. Brown hair,hazel eyes."""

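I tried S.strip(), but it only removes the whitespace at the beginning and end of the string.

I would like to know whether there is a workable way to do this (I could not find one).

I also tried S.replace(" ","") for the larger gaps, but that got me nowhere either.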

Answers

To remove repeated spaces (but not line breaks):

print( ' '.join([s for s in S.split(' ') if s.strip()]) )
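
The one-liner above leaves line breaks attached to the neighbouring words. A minimal sketch of a per-line variant, assuming the goal is to collapse runs of spaces and tabs inside each line while leaving the line structure alone (collapse_spaces and sample are names made up for this illustration):

```python
def collapse_spaces(text):
    """Collapse runs of whitespace inside each line; keep the line breaks."""
    # str.split() with no argument splits on any run of whitespace and
    # ignores leading/trailing whitespace, so join() rebuilds a tidy line.
    return "\n".join(" ".join(line.split()) for line in text.split("\n"))


sample = "Missing From \n      Napa,California      \nSex   Female"
print(collapse_spaces(sample))
```

This does not merge "Napa,California" into the previous line; it only tidies the spacing within each line.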

Try this:

import re


def normalize_text(get_text):
    saved_new_lines = []
    counter = 0
    for each_line in get_text.split("\n"):
        if not each_line == "":
            normalize_each_line = re.sub(r'\s+',' ',each_line.strip())
            if each_line.startswith(" "):
                saved_new_lines[counter-1] += " " + normalize_each_line
            else:
                saved_new_lines.append(normalize_each_line)
                counter += 1
    return "\n".join(saved_new_lines)


print(normalize_text(S))

Output:

Missing Since 06/01/1976
Missing From Napa,California
Classification Endangered Missing
Sex Female
Race White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2,130 pounds
Distinguishing Characteristics Caucasian female. Brown hair,hazel eyes.

@FedericoBaù gave me a hint, so I updated my code (this version has no empty-line check, so it should be much faster than the version above).

Update:

import re


def normalize_text(get_text):
    saved_new_lines = []
    counter = 0
    for each_line in re.sub(r'\n+','\n',get_text.strip()).splitlines():
        normalize_each_line = re.sub(r'\s+',' ',each_line.strip())
        if each_line.startswith(" "):
            saved_new_lines[counter-1] += " {}".format(normalize_each_line)
        else:
            saved_new_lines.append(normalize_each_line)
            counter += 1
    return "\n".join(saved_new_lines)


print(normalize_text(S))
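
As an alternative to the loop, the same idea can probably be expressed with regular-expression substitutions alone. This is only a sketch under one assumption taken from the sample data: an indented line is a continuation of the line before it. The names merge_continuations and sample are invented for this illustration.

```python
import re


def merge_continuations(text):
    # Assumption: an indented line continues the previous line, so the
    # line break plus the surrounding spaces/tabs becomes a single space.
    text = re.sub(r"[ \t]*\n[ \t]+", " ", text.strip())
    # Drop spaces/tabs left at the end of every line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Collapse blank lines, then any remaining runs of spaces/tabs.
    text = re.sub(r"\n+", "\n", text)
    return re.sub(r"[ \t]{2,}", " ", text)


sample = "Missing From \n        Napa,California     \n\nRace \n      White\nSex Female"
print(merge_continuations(sample))
```

Whether this is faster than the loop depends on the input; it mainly trades the explicit bookkeeping for four substitutions.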

Here is a way to produce the requested string-literal output using only the string-literal input, as (originally) requested:

from re import sub

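print(sub('Race\n','Race ',sub('Missing From\n','Missing From ','\n'.join(
              [sub(r' \s','',line) for line in [line.strip() for line in """
"Missing Since 06/01/1976

Missing From
                                Napa,California
Classification Endangered Missing
Sex Female
Race
                                    White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2,130 pounds
Distinguishing Characteristics Caucasian female. Brown hair,hazel eyes."
""".split('\n') if line.strip()]]))))

Output:

"Missing Since 06/01/1976
Missing From Napa,California
Classification Endangered Missing
Sex Female
Race White
Date of Birth 02/06/1957 (63)
Age 19 years old
Height and Weight 5'2,130 pounds
Distinguishing Characteristics Caucasian female. Brown hair,hazel eyes."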

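I'm not a Python person, but as far as I know from PHP, these gaps appear because of \n and \r. To fix it, you can simply do string.replace("<double-space-here>","") and then take the result and repeat the process, this time replacing "\n" and then "\r". It would be even better if you can find a built-in way to filter HTML characters in Python.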
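
In Python, the \n and \r handling suggested above could look roughly like the sketch below. It only normalizes line endings so that later steps have a single kind of line break to deal with; normalize_newlines and raw are hypothetical names used for illustration.

```python
def normalize_newlines(text):
    # Convert Windows (\r\n) and bare \r line endings to \n.
    # \r\n is handled first so that it does not turn into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = "Sex Female\r\nRace \r      White\n"
print(repr(normalize_newlines(raw)))
```

Note that str.replace returns a new string; the original is not modified, so the result has to be assigned or passed on.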

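@Random Although very useful answers have already been given, I would like to add some insight into the problem you are facing. It is a common one, mostly because of what people expect the strip function to do.

Logically, one might think it should remove all the whitespace in the string, but it only removes it from the two ends of the string, not from the "inside". But why?

You should first understand how a string is stored in Python. Conceptually, it behaves like an "array" (a kind of list) of characters: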

string = "  Hello world "

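is conceptually:

string = [" ", " ", "H", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d", " "]

So strip examines the characters starting from the left end and from the right end of the string, and stops on each side as soon as it meets a character that is not whitespace.

It therefore does not loop over the whole string (array); it only works inward from index 0 and from index -1!

For more on this I suggest looking at array-of-strings-in-c. It is not Python, but the reference implementation of Python (CPython) is written in C and stores strings in a similar array-like way.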
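
The behaviour described above can be sketched as a small hand-written function. This is not CPython's actual source, only an illustration of "walk inward from both ends and stop at the first non-whitespace character"; manual_strip is a made-up name.

```python
def manual_strip(text):
    """Rough re-implementation of str.strip() for illustration only."""
    start, end = 0, len(text)
    # Walk inward from the left until a non-whitespace character is found.
    while start < end and text[start].isspace():
        start += 1
    # Walk inward from the right in the same way.
    while end > start and text[end - 1].isspace():
        end -= 1
    # Everything between the two positions is returned untouched,
    # including any whitespace in the middle.
    return text[start:end]


s = "  Hello   world "
print(repr(manual_strip(s)))
print(manual_strip(s) == s.strip())
```

The three spaces between "Hello" and "world" survive, which is exactly why strip alone cannot solve the problem in the question.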

Some solutions (not yet given)

Using string.whitespace (test_string below stands for the string from the question)

import string

def string_cleaner(text):
    cleaned_string = []
    string_separated = text.split(' ')
    for word in string_separated:
        # Skip empty pieces (produced by consecutive spaces) and pieces
        # that are a substring of string.whitespace, such as "\n".
        if word and word not in string.whitespace:
            cleaned_string.append(word)

    return ' '.join(cleaned_string)


result = string_cleaner(test_string)

--> Short form using a list comprehension

print(' '.join([s for s in test_string.split(' ') if s and s not in string.whitespace]))

Using the isspace method (no import needed, purely built-in)

def string_cleaner(text):
    cleaned_string = []
    string_separated = text.split(' ')
    for word in string_separated:
        # Skip empty pieces and pieces made up only of whitespace characters.
        if word and not word.isspace():
            cleaned_string.append(word)

    return ' '.join(cleaned_string)


result = string_cleaner(test_string)

print(result)

--> Short form using a list comprehension

print(' '.join([string for string in test_string.split(' ') if string and not string.isspace()]))
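
The two checks used above are not quite interchangeable. `word in string.whitespace` is a substring test against the constant ' \t\n\r\x0b\x0c', whereas `word.isspace()` asks whether every character is whitespace. A small sketch of the difference (classify is a helper invented for this illustration):

```python
import string


def classify(piece):
    # First value: is the piece a substring of string.whitespace?
    # Second value: does the piece consist only of whitespace characters?
    return piece in string.whitespace, piece.isspace()


for piece in ["\n", "\n\n", "x"]:
    print(repr(piece), classify(piece))
```

A piece such as "\n\n" is all whitespace but is not a substring of string.whitespace, so only the isspace version filters it out.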

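Revisiting the re.sub function

Here is a rough hand-written equivalent of what the re.sub approach does. Note that I added many "useless" variables to make the code more explicit: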

def string_cleaner(text):

    cleaned_string = []
    string_separated = text.split(' ')
    for word in string_separated:
        if word:  # Skip empty pieces produced by consecutive spaces
            # Replace newlines with spaces, because some pieces contain several "\n" in a row
            remove_whitespace_from_side = word.strip().replace('\n', ' ')
            separate_each_string = remove_whitespace_from_side.split()
            if separate_each_string:  # Empty means the piece was only whitespace
                rejoined_sub_string = ' '.join(separate_each_string)
                ready_string = rejoined_sub_string.replace(' ', '\n')  # Put the "\n" back
                cleaned_string.append(ready_string)
    ready_to_go = ' '.join(cleaned_string)
    return ready_to_go

result = string_cleaner(test_string)

print(result)

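Documentation

Python re.sub function

"Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl."

It can play a role similar to the C function scanf; the re module documentation has more information.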
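
To make the quoted sentence concrete, here is a short sketch of re.sub on made-up inputs. It shows the default "replace every match" behaviour, the optional count argument, and what "leftmost non-overlapping" means in practice.

```python
import re

# Every run of whitespace is replaced by one space.
collapsed = re.sub(r"\s+", " ", "a  b \n  c")

# count=1 limits the substitution to the leftmost match only.
first_only = re.sub(r"\s+", " ", "a  b   c", count=1)

# Matches are found left to right and never overlap, so five "a"
# characters contain two matches of "aa" and one leftover "a".
non_overlapping = re.sub("aa", "x", "aaaaa")

print(repr(collapsed), repr(first_only), repr(non_overlapping))
```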
