Printing the lines common to multiple files

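I have 3 files, each containing an arbitrary number of lines (the count is given on the first line). I want to get all the lines that these files have in common. Each file has many lines, and each line contains four space-separated coordinates. For example: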

file1.txt:

5    
820.3  262.48  637.815  232.503  
657.666  773.366  466.608  754.035  
341.845  245.408  163.417  212.897  
667.378  687.189  474.277  666.181  
518.451  899.594  343.431  881.08

file2.txt

3  
1.52 6.878 9.5485  
341.845  245.408  163.417  212.897  
667.378  687.189  474.277  666.181

file3.txt

4  
657.666  773.366  466.608  754.035  
341.845  245.408  163.417  212.897  
667.378  687.189  474.277  666.181  
518.451  899.594  343.431  881.08

My output file res.txt should be:

res.txt

2  
341.845  245.408  163.417  212.897  
667.378  687.189  474.277  666.181

Here there are 2 common lines, so 2 should be printed on the first line. How can this be scaled to more files?

I tried writing a Python script that handles two files, but I don't think it is efficient. The code I tried is:

import numpy as np

l1 = []
l2 = []

with open('matchings1_2.txt','r') as f1:
    for line in f1:
        line = line.split()
        l1.append(line)

with open('matchings2_3.txt','r') as f2:
    for line in f2:
        line = line.split()
        l2.append(line)


l1 = np.array(l1[1:]).astype(float)
l2 = np.array(l2[1:]).astype(float)
l = []

for r in l1:
    if r in l2:
        l.append(list(r))

l.insert(0,[len(l)])

with open('Result.txt','w') as f:
    for item in l:
        s = ""
        for i in range(len(item)):
            if (i != len(item) - 1):
                s += str(item[i]) + " "
            else:
                s += str(item[i])
        f.write("%s\n" % s)

Solution

I wrote a shorter version. Hopefully it is not too complicated, and I think it does the job.

rows = []  # one set of data rows per file
file_names = ["file1", "file2", "file3"]  # names of the text files, without extension
for file in file_names:
    with open(file + ".txt", "r") as f1:
        next(f1, None)  # skip the first line, which holds the row count
        # strip the trailing newline and whitespace; ignore blank lines
        temp = {line.rstrip() for line in f1 if line.strip()}
    rows.append(temp)  # stored as a set so that we can use intersection

final_rows = rows[0]  # start with the rows of the first file
for i in range(1, len(rows)):
    final_rows = final_rows.intersection(rows[i])  # repeated intersection

with open("res.txt", "w") as out:
    out.write(str(len(final_rows)) + "\n")  # number of common rows
    for row in final_rows:
        out.write(row + "\n")  # the common rows themselves

If all the files are in the same directory and share the same format, you can make a few changes:

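import os
# os.listdir() lists the current directory; use os.listdir("xyz/abc") if the files are elsewhere.
# Keep only the .txt inputs so the script itself and res.txt are not read as data.
file_names = [name for name in os.listdir() if name.endswith(".txt") and name != "res.txt"]
for file in file_names:
    with open(file, "r") as f1:  # use file instead of file + ".txt"
        ...  # the rest of the loop body is unchanged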
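
One thing to keep in mind with the set-based approach above is that sets have no defined order, so the rows in res.txt may come out in a different order than in the input files. The following is a minimal sketch, not part of the original answer, that keeps the order of the first file; the function names `read_rows`, `common_rows` and `write_result` are made up for illustration, and it assumes rows only need to match after trailing whitespace is stripped.

```python
def read_rows(path):
    """Return the data rows of one file, skipping the count on the first line."""
    with open(path) as f:
        next(f, None)  # the first line holds the number of rows
        return [line.rstrip() for line in f if line.strip()]


def common_rows(paths):
    """Rows present in every file, in the order they appear in the first file."""
    all_rows = [read_rows(p) for p in paths]
    if not all_rows:
        return []
    # set.intersection accepts any number of iterables
    shared = set(all_rows[0]).intersection(*all_rows[1:])
    seen = set()
    result = []
    for row in all_rows[0]:
        if row in shared and row not in seen:  # keep each common row once
            seen.add(row)
            result.append(row)
    return result


def write_result(rows, out_path="res.txt"):
    """Write the row count followed by the rows, one per line."""
    with open(out_path, "w") as out:
        out.write(f"{len(rows)}\n")
        out.writelines(row + "\n" for row in rows)
```

A call such as `write_result(common_rows(["file1.txt", "file2.txt", "file3.txt"]))` would then produce res.txt for the three example files.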

As suggested in @Aryman's answer, set intersection is probably the way to go. To apply an operation to a sequence of undefined length, you can use functools.reduce.

from functools import reduce
from pathlib import Path

def lines(text_file):
    with open(text_file) as f:
        result = f.read().splitlines()
    return result

unique_lines = (set(lines(file)[1:])  # exclude the first line
                for file in Path('folder').glob('file*.txt'))
common_lines = reduce(lambda x, y: x & y, unique_lines)
print(list(common_lines))

Here x & y is equivalent to x.intersection(y). You can also use operator.and_ instead of the lambda.
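
To make the equivalence concrete, here is a small sketch with made-up data (the `sets` list is purely illustrative) showing three ways to intersect an arbitrary number of sets: `reduce` with a lambda, `reduce` with `operator.and_`, and `set.intersection` with argument unpacking.

```python
from functools import reduce
from operator import and_

# Illustrative data: each set stands for the unique rows of one file.
sets = [
    {"a", "b", "c"},
    {"b", "c", "d"},
    {"c", "b", "e"},
]

with_lambda = reduce(lambda x, y: x & y, sets)  # pairwise x & y
with_operator = reduce(and_, sets)              # same, without the lambda
with_method = set.intersection(*sets)           # unpack all sets at once

print(sorted(with_operator))  # ['b', 'c']
```

All three forms need at least one set to work with, so an empty list of files has to be handled separately.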

I was able to write a solution to this problem, pasted below. Everything is commented, so I hope it is easy to read :)

import os  # a library for accessing the os

all_rows = []  # to load all lines into
res = []  # to load result into
number_files = 0
path_to_files = "."  # you can use "." if your files are in the same directory as the .py file

for file in sorted(os.listdir(path_to_files)):  # lists all files in that directory, in a stable order
    if file.startswith("file") and file.endswith(".txt"):
        number_files += 1  # keep a count of number of files for later
        with open(os.path.join(path_to_files, file), "r") as f:  # the with block closes the file
            content = [x.strip() for x in f]  # read all lines and remove \n and surrounding whitespace
            all_rows.extend(content)  # add all items of content to all_rows without creating a 2d list
# all_rows[0] is the row count of the first file read, so rows 1..count belong to that file
for i in range(1, int(all_rows[0]) + 1):
    # a row is common if it occurs once per file (assumes no row repeats within a file)
    if all_rows.count(all_rows[i]) == number_files:
        res.append(all_rows[i])  # append to res
res.insert(0, str(len(res)))  # insert number of rows into res
with open(os.path.join(path_to_files, "res.txt"), "w") as r:  # create res.txt in that directory
    for row in res:  # for every row which all files have in common
        r.write(row + "\n")  # add newline character
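
The counting idea above calls `list.count` once per row, which rescans the whole list each time. As a hedged alternative, not from the original answer, `collections.Counter` can tally in a single pass how many files each row appears in. The function name `rows_in_all` is hypothetical, and the sketch assumes the rows have already been read into one list per file with the count line removed.

```python
from collections import Counter


def rows_in_all(files_rows):
    """Return the rows found in every file, in first-file order.

    files_rows is a list of lists of rows, one inner list per file.
    """
    if not files_rows:
        return []
    counts = Counter()
    for rows in files_rows:
        counts.update(set(rows))  # count each row at most once per file
    n = len(files_rows)
    # dict.fromkeys drops repeats while keeping the original order
    return list(dict.fromkeys(r for r in files_rows[0] if counts[r] == n))
```

Converting each file's rows to a set before counting means a row repeated inside one file is not mistaken for a row shared between files.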