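I have two for loops, one inside the other, and I would like to get rid of them somehow to speed up my code. The DataFrame I get from pandas looks like this (the column headers are the companies, the rows are the users; 1 means the user visited that company, 0 otherwise):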
   100  200  300  400
0    1    1    0    1
1    1    1    1    0
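I want to compare every pair of companies in the dataset. To do that, I created a list containing all the company IDs. The code takes the first company in the list (the base) and pairs it with every other company (the peer), hence the second for loop. My code is as follows: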
def calculate_scores():
    df_matrix = create_the_matrix(df)
    print(df_matrix)
    for base in list_of_companies:
        counter = 0
        for peer in list_of_companies:
            counter += 1
            if base == peer:
                "do nothing"
            else:
                # Calculate first the denominator since we slice the big matrix
                # In dataframes that only have accessed the base firm
                denominator_df = df_matrix.loc[(df_matrix[base] == 1)]
                denominator = denominator_df.sum(axis=1).values.tolist()
                denominator = sum(denominator) - len(denominator)
                # Calculate the numerator. This is done later because
                # We slice up more the dataframe above by
                # Filtering records which have been accessed by both the base and the peer firm
                numerator_df = denominator_df.loc[(denominator_df[base] == 1) & (denominator_df[peer] == 1)]
                numerator = len(numerator_df.index)
                annual_search_fraction = numerator/denominator
                print("Base: {} and Peer: {} ==> {}".format(base, peer, annual_search_fraction))
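The metric works as follows:
1) The metric I am trying to compute tells me how often two companies are searched together, relative to all other searches.
2) The code first selects all users (rows) who visited the base company (denominator_df = df_matrix.loc[(df_matrix[base] == 1)]). It then computes the denominator, which counts the unique combinations between the base company and any other company those users searched. Since I can count the number of companies each user visited, subtracting 1 gives the number of unique links between the base company and the other companies.
3) Next, the code filters denominator_df down to the rows that visited both the base and the peer company. Since I need the number of users who visited both, I use numerator = len(numerator_df.index) to count the rows, which gives me the numerator.
The expected output for the DataFrame at the top is: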
Base: 100 and Peer: 200 ==> 0.5
Base: 100 and Peer: 300 ==> 0.25
Base: 100 and Peer: 400 ==> 0.25
Base: 200 and Peer: 100 ==> 0.5
Base: 200 and Peer: 300 ==> 0.25
Base: 200 and Peer: 400 ==> 0.25
Base: 300 and Peer: 100 ==> 0.5
Base: 300 and Peer: 200 ==> 0.5
Base: 300 and Peer: 400 ==> 0.0
Base: 400 and Peer: 100 ==> 0.5
Base: 400 and Peer: 200 ==> 0.5
Base: 400 and Peer: 300 ==> 0.0
4) As a sanity check that the code gives the right answer: all the metrics between one base company and all of its peer companies must sum to 1.
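The metric described in points 1) to 4) can also be sketched without explicit loops. The snippet below is only an illustration on the sample frame above, not the asker's code: the names `co_visits`, `other_visits`, `denominators` and `scores` are made up here, and it assumes every company has a non-zero denominator (a company whose visitors searched nothing else would produce a division by zero).

```python
import numpy as np
import pandas as pd

# Sample matrix from the question: rows are users, columns are company IDs.
df_matrix = pd.DataFrame(
    [[1, 1, 0, 1],
     [1, 1, 1, 0]],
    columns=[100, 200, 300, 400],
)
m = df_matrix.to_numpy()

# co_visits[i, j] = number of users who visited both company i and company j.
co_visits = m.T @ m

# Per user: how many companies they visited besides any given base company.
other_visits = m.sum(axis=1) - 1

# Per base company: total of other_visits over the users who visited it.
denominators = m.T @ other_visits

# Divide each base row by its denominator; base == peer is not a valid pair.
values = co_visits / denominators[:, None]
np.fill_diagonal(values, np.nan)

scores = pd.DataFrame(values, index=df_matrix.columns, columns=df_matrix.columns)
print(scores)

# Sanity check from point 4): each base row sums to 1 (NaN is skipped).
print(scores.sum(axis=1))
```

Here `scores.loc[base, peer]` holds the value that the nested loops print for that pair.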
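Any suggestions or pointers in the right direction would be greatly appreciated!
Solution:
You may be looking for itertools.product(). Here is an example similar to what you seem to want to do: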
import itertools

a = ['one', 'two', 'three']
for b in itertools.product(a, a):
    print(b)
('one', 'one')
('one', 'two')
('one', 'three')
('two', 'one')
('two', 'two')
('two', 'three')
('three', 'one')
('three', 'two')
('three', 'three')
Or you can do this:
for u, v in itertools.product(a, a):
    print("%s %s" % (u, v))
The output is:
one one
one two
one three
two one
two two
two three
three one
three two
three three
If you want a list, you can do this:
alist = list(itertools.product(a, a))
print(alist)
The output is:
[('one', 'one'), ('one', 'two'), ('one', 'three'), ('two', 'one'), ('two', 'two'), ('two', 'three'), ('three', 'one'), ('three', 'two'), ('three', 'three')]
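Applied to the company list from the question, the same idea might look like the sketch below. The sample frame and the `results` dictionary are assumptions added for illustration. Note that itertools.product() only flattens the two loops into one; it still visits every (base, peer) pair, so on its own it is unlikely to make the code much faster.

```python
import itertools

import pandas as pd

# Sample matrix from the question: rows are users, columns are company IDs.
df_matrix = pd.DataFrame(
    [[1, 1, 0, 1],
     [1, 1, 1, 0]],
    columns=[100, 200, 300, 400],
)
list_of_companies = [100, 200, 300, 400]

results = {}  # hypothetical container: (base, peer) -> score
for base, peer in itertools.product(list_of_companies, repeat=2):
    if base == peer:
        continue  # a company is not paired with itself
    # Users who visited the base company.
    base_rows = df_matrix.loc[df_matrix[base] == 1]
    # Links from the base to every other company those users visited.
    denominator = int(base_rows.sum(axis=1).sum()) - len(base_rows)
    # Users who visited both the base and the peer.
    numerator = int((base_rows[peer] == 1).sum())
    results[(base, peer)] = numerator / denominator
    print("Base: {} and Peer: {} ==> {}".format(base, peer, results[(base, peer)]))
```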