Print common lines from multiple files

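I have 3 files, each containing an arbitrary number of lines (the count is given on the first line). I want to get all the lines that are common to these files. For example, each file has many lines, and each line contains four space-separated coordinates.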

file1.txt:

5    
820.3  262.48  637.815  232.503  
657.666  773.366  466.608  754.035  
341.845  245.408  163.417  212.897  
667.378  687.189  474.277  666.181  
518.451  899.594  343.431  881.08  

file2.txt

3  
1.52 6.878 9.5485  
341.845  245.408  163.417  212.897  
667.378  687.189  474.277  666.181  

file3.txt

4  
657.666  773.366  466.608  754.035  
341.845  245.408  163.417  212.897  
667.378  687.189  474.277  666.181  
518.451  899.594  343.431  881.08    

My output file res.txt should be:

res.txt

2  
341.845  245.408  163.417  212.897  
667.378  687.189  474.277  666.181    

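Here there are 2 common lines, so 2 should be printed on the first line. How can this be scaled to many files?

I tried writing a Python script that handles two files, but I don't think it is efficient. The code I tried is: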

import numpy as np

l1 = []
l2 = []

with open('matchings1_2.txt','r') as f1:
    for line in f1:
        line = line.split()
        l1.append(line)

with open('matchings2_3.txt','r') as f2:
    for line in f2:
        line = line.split()
        l2.append(line)


l1 = np.array(l1[1:]).astype(float)
l2 = np.array(l2[1:]).astype(float)
l = []

for r in l1:
    if r in l2:
        l.append(list(r))

l.insert(0,[len(l)])

with open('Result.txt','w') as f:
    for item in l:
        s = ""
        for i in range(len(item)):
            if (i != len(item) - 1):
                s += str(item[i]) + " "
            else:
                s += str(item[i])
        f.write("%s\n" % s)

Solution

I wrote a shorter piece of code; hopefully it isn't too complicated, and I think it does the job.

rows = []  # to store the rows of all files, one set per file
file_names = ["f1", "f2", "f3"]  # names of the text files, without the .txt extension
for file in file_names:
    f1 = open(file + ".txt", "r")
    temp = []  # to store the rows of each file separately
    for n, i in enumerate(f1):
        s = i.rstrip()  # removes trailing whitespace, including the newline
        if n != 0 and s:  # skip the first line of each file (the row count) and blank lines
            temp.append(s)
    rows.append(set(temp))  # storing as a set so that we can use intersection
    f1.close()

final_rows = rows[0]  # initializing with the rows of the first file
for i in range(1, len(rows)):
    final_rows = final_rows.intersection(rows[i])  # repeated intersection

f1 = open("res.txt", "w")
f1.write(str(len(final_rows)) + "\n")  # storing the number of common rows
for i in final_rows:
    f1.write(i + "\n")  # storing the common rows
f1.close()
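
The same idea can be wrapped in small functions that accept any number of files. This is only a sketch under stated assumptions: the names read_rows, common_lines and write_result are hypothetical, the first line of each file is taken to be the row count, and rows are compared as text after collapsing runs of whitespace, so 1.50 and 1.5 would still count as different rows.

```python
def read_rows(path):
    """Return the data rows of one file as a set of normalized strings."""
    with open(path) as f:
        next(f, None)  # skip the first line (the row count)
        # " ".join(line.split()) collapses runs of whitespace and trims the ends
        return {" ".join(line.split()) for line in f if line.strip()}


def common_lines(paths):
    """Return the rows present in every file, in the order of the first file."""
    paths = list(paths)
    if not paths:
        return []
    common = set.intersection(*(read_rows(p) for p in paths))
    result = []
    seen = set()
    with open(paths[0]) as f:
        next(f, None)  # skip the count line again
        for line in f:
            row = " ".join(line.split())
            if row in common and row not in seen:
                seen.add(row)
                result.append(row)
    return result


def write_result(rows, out_path="res.txt"):
    """Write the number of rows on the first line, then one row per line."""
    with open(out_path, "w") as out:
        out.write(f"{len(rows)}\n")
        for row in rows:
            out.write(row + "\n")
```

It could be called as write_result(common_lines(["file1.txt", "file2.txt", "file3.txt"])). Unlike iterating over a set, this keeps the common rows in the order in which they appear in the first file.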

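If all the files are in the same directory and share the same format, you can make a few changes:

import os
# os.listdir() lists the current directory; use os.listdir("xyz/abc") if the files are in another
# directory (in that case open os.path.join("xyz/abc", file) below)
# keep only the .txt inputs so the script itself and a previous res.txt are not read
file_names = [name for name in os.listdir() if name.endswith(".txt") and name != "res.txt"]
for file in file_names:
    f1 = open(file, "r")  # use file instead of file + ".txt"
    # the rest of the loop body stays the same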

As suggested in @Aryman's answer, set intersection is probably the way to go. To apply an operation to a sequence of unknown length, you can use functools.reduce:

from functools import reduce
from pathlib import Path

def lines(text_file):
    with open(text_file) as f:
        result = f.read().splitlines()
    return result

unique_lines = (set(lines(file)[1:])  # exclude the first line
                for file in Path('folder').glob('file*.txt'))
common_lines = reduce(lambda x, y: x & y, unique_lines)
print(list(common_lines))

Here x & y is equivalent to x.intersection(y). You can also use operator.and_ in place of the lambda. The script prints the common lines as a list.
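
For reference, the operator.and_ variant might look like the sketch below. The function name common_lines_reduce is hypothetical, and the folder and pattern arguments are placeholders rather than anything prescribed by the question. Lines are compared exactly as read, so two rows that differ only in trailing whitespace would not match.

```python
from functools import reduce
from operator import and_
from pathlib import Path


def common_lines_reduce(folder, pattern="file*.txt"):
    """Intersect the data lines of every file in folder that matches pattern."""
    line_sets = [
        set(path.read_text().splitlines()[1:])  # drop the count line
        for path in sorted(Path(folder).glob(pattern))
    ]
    if not line_sets:
        # reduce() raises TypeError on an empty sequence without an initial value
        return set()
    return reduce(and_, line_sets)  # same as folding with lambda x, y: x & y
```

Building a list instead of a generator makes it possible to check for the empty case before calling reduce.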

I was able to write a solution to the problem, which I have pasted below. It is fully commented, so I hope it is easy to read :)

import os  # a library for accessing the os

all_rows = []  # to load all lines into
res = []  # to load result into
number_files = 0
path_to_files = "."  # you can use "." if your files are in the same directory as the .py file

for file in os.listdir(path_to_files):  # put your path to files here, lists all files in that directory
    if file.startswith("file") and file.endswith(".txt"):
        number_files += 1  # keep a count of number of files for later
        with open(os.path.join(path_to_files, file), "r") as f:  # join the path so other directories work too
            content = f.readlines()  # read all lines
        content = [x.strip() for x in content]  # remove \n and surrounding whitespace from lines
        all_rows.extend(content)  # add all items of content to all_rows without creating a 2d list
# note: counting occurrences assumes that a row never repeats within a single file
for i in range(1, int(all_rows[0]) + 1):  # all rows in the first file that was read
    if all_rows.count(all_rows[i]) == number_files:  # if row occurs in all files
        res.append(all_rows[i])  # append to res
res.insert(0, str(len(res)))  # insert number of rows into res
with open(os.path.join(path_to_files, "res.txt"), "w") as r:  # create new file in directory called res.txt
    for row in res:  # for every row which all files have in common
        r.write(row + "\n")  # add newline character
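
Counting with list.count rescans the whole list for every row and relies on no row being repeated inside a single file. A possible alternative, sketched here with the hypothetical name rows_in_all_files, counts each row at most once per file using collections.Counter. It assumes, as above, that the first line of each file is the row count.

```python
from collections import Counter


def rows_in_all_files(paths):
    """Return the rows that occur in every file, keeping the first file's order."""
    paths = list(paths)
    seen_in = Counter()  # row -> number of files that contain it
    first_rows = []
    for index, path in enumerate(paths):
        with open(path) as f:
            # skip the count line, strip whitespace, ignore blank lines
            rows = [line.strip() for line in f.readlines()[1:] if line.strip()]
        if index == 0:
            first_rows = list(dict.fromkeys(rows))  # de-duplicate, keep order
        seen_in.update(set(rows))  # set() so repeats within one file count once
    return [row for row in first_rows if seen_in[row] == len(paths)]
```

A row is kept only when its count equals the number of files, which is the same test as in the code above but done in a single pass over each file.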
